I think we need to step back and ask why we would want to use dm-integrity in the first place. You didn’t mention your reasons in your original post, and I didn’t mention mine in my reply either; I only replied because I was researching this topic and found your post.
I think, if you are just trying to achieve data integrity on a single disk, or without Raid, you might be better off using ZFS or Btrfs. I’m not sure in what situations dm-integrity would be the better choice there.
As for my reasons: I was looking for a way to have redundancy and integrity and, at the same time, the ability to expand and grow the Raid array, i.e. to add disks to an existing Raid-5 or Raid-6 array. This is not possible in ZFS, because you can’t expand vdevs: you can’t improve disk-space utilisation once a vdev has been created. (I haven’t looked at Btrfs Raid-5 or 6, because that’s unstable according to the official documentation.)
By using md-raid on top of dm-integrity, I could have everything: integrity, redundancy, and the ability to expand the array. Sounds perfect. The catch is that it’s going to be slow. So I was looking at ways to make it faster, and came across the idea of disabling journaling. Also note that I’m mainly interested in HDD storage, not SSDs. That’s the story in short.
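For anyone curious, here is a minimal sketch of that stack. The device names (/dev/sdb, /dev/sdc, /dev/sdd) and mapping names are hypothetical, and dm-integrity journaling is left at its default (enabled):

```shell
# Format each disk as a standalone dm-integrity device (crc32c tags,
# journal enabled by default), then open it as /dev/mapper/int-sdX.
# WARNING: "format" wipes the device. Device names are placeholders.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
    integritysetup format "$dev" --integrity crc32c
    integritysetup open "$dev" "int-${dev##*/}" --integrity crc32c
done

# Build the Raid-5 array on top of the integrity mappings.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
    /dev/mapper/int-sdb /dev/mapper/int-sdc /dev/mapper/int-sdd
```

The point of this layering is that a checksum mismatch on one disk surfaces to md-raid as a plain read error, and md-raid can then reconstruct that block from parity.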
But in the meantime, I have also discovered that disabling journaling is not a good idea. It’s unsafe: if there is a power outage while dm-integrity is writing, the torn, corrupted blocks will land on all disks at once, because Raid writes to all disks simultaneously. So Raid won’t be able to recover.
One other option I can think of to speed things up is to use dm-cache or dm-writecache, like this: disk → dm-integrity → md-raid → dm-cache.
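One way to realize that last dm-cache layer is through LVM’s cache support. A sketch, assuming /dev/md0 is the integrity-backed array and /dev/nvme0n1 is a spare SSD (names and sizes are made up, untested):

```shell
# Put both the slow array and the fast SSD into one volume group.
pvcreate /dev/md0 /dev/nvme0n1
vgcreate vg0 /dev/md0 /dev/nvme0n1

# Origin LV on the array, cache LV on the SSD.
lvcreate -n data  -L 500G vg0 /dev/md0
lvcreate -n cache -L  50G vg0 /dev/nvme0n1

# Attach the SSD LV as a dm-cache. Writethrough keeps the array
# authoritative; dm-writecache (--type writecache) would absorb writes
# instead, leaving in-flight data on the SSD until written back.
lvconvert --type cache --cachevol cache --cachemode writethrough vg0/data
```

With writethrough, only reads are accelerated; for write speed you would need writeback or dm-writecache, which reintroduces the question of what happens to data sitting on the SSD during a power loss.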
Another option would be to use dm-integrity with the --data-device option, and put the integrity metadata and journal on a separate device, perhaps an SSD. But I have not found any info or tutorials on it. What are the space requirements? How does this setup handle power loss? Also, I can’t imagine using this approach with a multi-disk Raid array. It could get out of hand really quickly as the number of disks grows, since each journal would need its own device, I suppose.
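For the record, this is what the invocation looks like as far as I can tell from the integritysetup man page (devices are hypothetical; I have not tested this):

```shell
# /dev/sdc1 (an SSD partition) holds the integrity metadata and journal;
# the actual data blocks stay on the HDD passed via --data-device.
integritysetup format /dev/sdc1 --data-device /dev/sdb --integrity crc32c
integritysetup open   /dev/sdc1 int-sdb --data-device /dev/sdb --integrity crc32c
```

As for space: crc32c costs 4 tag bytes per data block, i.e. roughly 0.8 % of the data size with 512-byte blocks (about 0.1 % with 4K blocks), plus the journal, so a small SSD partition should suffice. Treat that as my back-of-the-envelope estimate, not a verified figure.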
As for “protection information” (T10-PI), I haven’t looked at it in detail, but based on a quick search, it doesn’t feel like it’s actively used today. (I might be wrong!) First, it’s based on 512-byte sectors (formatted to 520 bytes to hold the extra PI bytes), while the general direction nowadays is 4K sectors; 512 is legacy… Second, you need a SAS controller that supports it…
But after reading your comment, as I was looking it up, I discovered some very interesting things:
- SAS controllers have data integrity features built in at the controller level (or maybe driver level? not sure), while SATA controllers are more prone to communication errors. The difference is significant. A good article: Google for “Davide Piumetti Data integrity in the Enterprise environment”.
- The price difference between SATA and SAS HDDs is not as big as I thought. The situation with SSDs is different, though.
- Some HDDs have persistent-cache technology to prevent data loss in a power-loss situation.
Well, some more rabbit holes to explore, I guess… Which I’m not going to do now. But I’ll add a few more notes that come to mind.
SAS controllers might be the reason why simple Raid without any fancy dm-integrity, ZFS or Btrfs works well in enterprise environments. Or why a simple journaling filesystem (ext4 or XFS) works well, even without extra checksums. It’s because SAS controllers provide data integrity.
I also want to add that I don’t believe in ‘bitrot’ (data corruption at rest). I think it’s a myth. Disks have ECC written on the platter after each block, which they verify on each read. Based on the articles and comments I’ve read, my conclusion is that ‘silent’ data corruption is almost always the result of
- A) a bad cable connection
- B) a faulty controller
- C) bad, non-ECC system memory, or
- D) the wrong data written out in the first place, e.g. due to buggy software.
Now, it’s true that even if bitrot is a myth, we still need protection against the other causes of corruption (points A and B above). But I wonder whether software integrity checks (be it dm-integrity, ZFS or Btrfs) have any added benefit if we have a SAS controller. They may be totally unnecessary. And regarding points C and D, nothing protects against those, not even software integrity checks or a SAS controller.
Maybe dm-integrity’s main benefit (with journaling) is not preventing silent data corruption, at least with SAS drives, but protecting against a crash or power loss? Journaling filesystems are quite reliable on their own, and Raid is quite safe against UREs (Unrecoverable Read Errors) and disk failures. But writes are still not atomic at the block level, which is a risk, and that’s where dm-integrity with journaling comes in. In other words: dm-integrity without journaling on SAS drives makes no sense to me, because integrity in itself is not an issue with SAS.
It feels like dm-integrity without journaling makes sense only for SATA drives. It would add data integrity (but still without protection against power loss).
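For completeness, this is the variant I mean; the journal is switched off when opening the device (hypothetical device, untested):

```shell
# "Direct" mode: checksums are still computed and verified on read, but
# data and tags are written without journaling, so a crash can leave them
# inconsistent (the mismatch then shows up later as a read error).
integritysetup format /dev/sdb --integrity crc32c
integritysetup open /dev/sdb int-sdb --integrity crc32c --integrity-no-journal
```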
If a drive itself has protection against power-loss events via persistent cache, would disabling journaling for dm-integrity be safe? And apart from a power loss, in what other situations would dm-integrity without journaling write out corrupted blocks?
To complicate things even further, there is also --integrity-bitmap-mode in dm-integrity. This is another thing I don’t get! How is it better than no journaling at all? What is the intended purpose of this feature? What’s the use case? Yes, we get data integrity, but we get that without journaling or bitmap mode too, as long as there is no power loss. And if there is a power loss, bitmap mode can still corrupt the data. So it doesn’t seem any better than not having it at all. I don’t understand in what situations it is useful, or how.
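For reference, this is how bitmap mode is enabled (a sketch, untested). My current understanding from the kernel documentation is that after a crash, only the regions marked dirty in the bitmap get their tags recalculated, so you avoid spurious checksum errors on every subsequent read, but as I said above, the data in those regions may still be torn:

```shell
# Bitmap mode: instead of a journal, a dirty-region bitmap is kept and
# flushed periodically (--bitmap-flush-time, in milliseconds). After a
# crash, dirty regions have their tags recalculated rather than being
# reported as mismatches on read.
integritysetup format /dev/sdb --integrity crc32c --integrity-bitmap-mode
integritysetup open /dev/sdb int-sdb --integrity crc32c --integrity-bitmap-mode \
    --bitmap-flush-time 10000
```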
If anyone has further thoughts, advice or opinion on any of the above, please comment!
Further reading I found useful (Google for these, because apparently I can’t include links in my post):
- “Battle testing ZFS, Btrfs and mdadm+dm-integrity”
- “github khimaros raid-explorations”
- The comment section here: “github MawKKe cryptsetup-with-luks2-and-integrity-demo.sh”