So I see a lot of misinformation and misunderstanding around RAID and “softraid” filesystems like zfs and btrfs. I am no great authority – my largest arrays are only about 30 spindles – and we ourselves have been accused of not understanding either, though I don’t believe that’s really the case. So I have been experimenting in the background for the last few months.
It has been tedious and laborious.
This type of basic test is what I’ve come up with, and the results are getting interesting. I could probably turn it into a video, but these kinds of things are unlikely to get much traffic; still, it’s interesting for my own learning.
The scientific method requires documenting one’s steps and producing a reproducible result, so I will begin again, going even slower, to document the experiments.
So I thought I’d outline what I plan to do, to see if anyone spots any trouble with my experiments. I want the simplest experiments that provide the most concrete, direct results.
What’s being tested?
For me, the priorities of a RAID system are:
1. Don’t allow the data to experience bitrot and become corrupt. The data you put in should be the data you get back, even years later, even after a hardware failure.
2. Replacing a failed drive should work, and not corrupt data.
3. Corrupt data, ahead of outright failure, should be detected, reported, and corrected.
4. Perform reasonably for the number of spindles involved.
Hardware Permutations:
For each setup, we will use five 750 GB Seagate drives. Why these? It’s what I have on hand. They will be set up in the following configurations:
- RAID-Z1 on FreeNAS
- LSI hardware RAID controller w/ BBU (RAID 5)
- btrfs (RAID 5 for both metadata and data)
- Linux md RAID 5 with an ext4 filesystem
The Tests:
- Pull the plug – in this test, a copy to the array is in progress, and the power is pulled. What happens?
- Pull the RAM – a DIMM is removed while the system is on (I may not do this one). What happens?
- Pull a drive – a drive is removed while the system is on. What happens?
- A drive has gone “evil” – the systems are shut down, and one disk in the array has random “bad” data sprinkled throughout it: approximately 1 MB worth of 4 KB blocks of zeros, written at random offsets across the evil drive. This hopefully simulates the particular failure mode where a drive can’t read, or returns corrupt information, but otherwise works. (A sketch of this corruption step follows the list.)
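For concreteness, here is roughly how that corruption step could be scripted. This is a minimal Python sketch under my own assumptions – the device path /dev/sdX, the block size, and the block count are placeholders, not the exact script used in these tests – and it should only ever be pointed at a disk you are willing to destroy, with the array offline.

```python
#!/usr/bin/env python3
# "Evil drive" sketch: sprinkle ~1 MB of 4 KB zero blocks at random
# block-aligned offsets across one member disk of an offline array.
# DEVICE, BLOCK_SIZE, and BLOCK_COUNT are placeholder assumptions.
import os
import random

DEVICE = "/dev/sdX"     # hypothetical: the array member to corrupt
BLOCK_SIZE = 4 * 1024   # 4 KB blocks of zeros
BLOCK_COUNT = 256       # 256 * 4 KB ≈ 1 MB of corruption in total

def corrupt(device: str) -> None:
    zeros = b"\x00" * BLOCK_SIZE
    with open(device, "r+b", buffering=0) as disk:
        disk_size = disk.seek(0, os.SEEK_END)  # size of the device in bytes
        for _ in range(BLOCK_COUNT):
            # Pick a random block-aligned offset and overwrite it with zeros.
            offset = random.randrange(disk_size // BLOCK_SIZE) * BLOCK_SIZE
            disk.seek(offset)
            disk.write(zeros)

if __name__ == "__main__":
    corrupt(DEVICE)
```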
Aftermath of each test:
A large collection of files is copied to each array. The file count, filenames, and directory layout are documented off the array, and a database of md5 hashes of every file on the array is stored on another system. After each event, an automated script walks the filesystem using ordinary filesystem utilities, compares each file’s md5 sum to the md5 sum stored in the database, and flags any mismatch. A file not found is also flagged. (A sketch of such a check appears below.)
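As a sketch of what that automated check could look like – the mount point, the manifest file, and its tab-separated path/md5 line format are my own placeholders, not necessarily the exact tooling we run – something like this, with the manifest built beforehand by the same kind of walk over the pristine copy:

```python
#!/usr/bin/env python3
# Verification sketch: recompute the md5 of every file listed in a
# known-good manifest and flag mismatches and missing files.
# ARRAY_ROOT and MANIFEST are placeholder assumptions.
import hashlib
import os
import sys

ARRAY_ROOT = "/mnt/array"            # hypothetical mount point of the array
MANIFEST = "/root/md5-manifest.txt"  # hypothetical: "relative/path<TAB>md5" per line

def md5sum(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def main() -> int:
    # Load the known-good hashes that were recorded off the array.
    expected = {}
    with open(MANIFEST) as f:
        for line in f:
            rel_path, digest = line.rstrip("\n").split("\t")
            expected[rel_path] = digest

    failures = 0
    for rel_path, digest in expected.items():
        full_path = os.path.join(ARRAY_ROOT, rel_path)
        if not os.path.exists(full_path):
            print(f"MISSING  {rel_path}")
            failures += 1
        elif md5sum(full_path) != digest:
            print(f"MISMATCH {rel_path}")
            failures += 1

    print(f"{failures} problem file(s) out of {len(expected)}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```

A non-zero exit status on any problem makes it easy to chain this into whatever drives the rest of the test run.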
After the script is complete, the system logs are manually scanned by an operator for any indication of errors, which are noted for our review video.
Some notes so far:
- btrfs does not seem to have a ‘scrub’ command yet the way zfs does, though it seems like simply reading every file on the filesystem will cause it to repair corrupt data.
- The LSI hardware controller and md on Linux are hilariously bad at test #4 (for data integrity). I am surprised by this for the hardware controller.
- It doesn’t seem fair to test btrfs RAID 5. Btrfs RAID 5 barely works, and seems to be a thorny quagmire of not-ready-yet code when you actually experience some sort of failure. Please, someone, tell me you have experienced a drive failure on btrfs and recovered easily.
- The zfs machine sometimes hangs when a live disk is pulled.
If you have any commentary, or ideas for adjusting the tests (or doing new ones) without creating ludicrous amounts of work for us, let us know.