Raspberry Pi 3 NAS with RAID

Oh, I meant to say: use spinning rust (hard drives), or proper flash drives, but not thumb sticks (for storage).

Thanks, Risk.

Because flash chips wear out with use, and proper drives are much better at moving writes around to prolong their life. They also deal with heat better.

Thumb sticks are not designed to be written to heavily, and even just log files/metadata can shorten their lifetime.

I personally use the GitHub project log2ram to extend the life of the SD cards in the Pis that run off them.
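
For anyone curious, installing it goes roughly like this (a sketch from memory, assuming the azlux/log2ram repo; check its README for the current instructions):

    # grab the project and run its installer (it moves /var/log onto a RAM disk,
    # syncing back to the SD card periodically and at shutdown)
    git clone https://github.com/azlux/log2ram.git
    cd log2ram
    sudo ./install.sh
    sudo reboot

    # after the reboot, /var/log should show up as a log2ram mount
    df -h /var/log
    systemctl status log2ram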

Spinning rust would potentially last longer, at the expense of seek times and random read/write performance.

Bear in mind the Pi 3 might not write fast enough to flood even a single SSD: its Ethernet hangs off the same single USB 2.0 bus as the USB ports, and IIRC one interferes with the other, so the limits of spinning rust have much less effect on a device like a Pi 3.
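
If you want to sanity-check that on your own Pi 3, a quick and dirty way (the /mnt/nas mount point and the 192.168.1.10 iperf3 server are just placeholder assumptions):

    # rough sequential write speed to the drive, bypassing the page cache
    dd if=/dev/zero of=/mnt/nas/testfile bs=1M count=512 oflag=direct status=progress
    rm /mnt/nas/testfile

    # rough network throughput to another box on the LAN (run "iperf3 -s" there first)
    iperf3 -c 192.168.1.10

    # running both at the same time shows how much the shared USB bus hurts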

2 Likes

Exactly.

Wear leveling, which is what gets you reliability and performance with NAND, is not done on the NAND chips themselves in flash sticks; it’s done by a crappy controller. USB flash controllers really suck at it, and they don’t usually come with enough RAM to store fine-grained FTL maps, which means that writes are hugely amplified. They will also never relocate data. Even microSD cards are more reliable these days, as long as you don’t fill them up.

For context, with QLC flash (in QLC mode, not SLC emulation) you get between 300 and 1000 PE cycles per cell - typically 300 per FTL block, given that there’s no internal error correction with consumer controllers. This means that, depending on fullness, with a typical desktop app workload they might last between a month and 3 years before they start spewing nonsense - you’d have to monitor the drive and do the math based on the free space you have to figure it out exactly.
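
To make that “do the math” bit concrete, a back-of-envelope sketch (every number below is an assumption for illustration, not a measurement):

    # lifetime_days ~ (free_space_GB * PE_cycles) / (daily_writes_GB * write_amplification)
    free_gb=16; pe_cycles=300; daily_writes_gb=2; write_amp=10   # all assumed values
    echo $(( free_gb * pe_cycles / (daily_writes_gb * write_amp) ))   # prints 240, i.e. roughly 8 months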

Typically, the area of the storage that gets remapped most often has the highest probability of being affected; that’s usually the filesystem metadata, or parts of the FTL on crappy drives.

Avoid large-capacity 2.5" spinning rust as well – it’s mostly SMR, and support for SMR zones is experimental in most filesystems today (without it, random write performance sucks; if you ever need to rebuild the raid, or do anything database-ish, it’ll be a sad day). SMR works like the FTL on flash drives, except the zone sizes are typically 256 MB instead of the ~1 MB typical of flash. The drives won’t wear out, especially if helium-filled; they’ll just be slow in certain workloads.

Typical “best bang for the buck” 3.5" spinning rust drives these days get you around 250 MB/s of sequential reads/writes per drive. With only two of them you’d already be pushing the limits of what the Pi’s USB controller can do, and it would be more than the Pi can encrypt/decrypt - but you don’t have to worry about any durability/performance degradation stuff, just the physical space and form factor.

3 Likes

I use this for my primary Pi 4 NAS.
https://wiki.radxa.com/Dual_Quad_SATA_HAT

You can run OMV with it as well. I just use mine as an NFS NAS.

1 Like

Modify your title from “Pi NAS” to “Pi 3 NAS” to make it more obvious. I would do it, but I lost my regular badge (not salty about it; it’s just that this is one of those very rare times I would edit someone else’s post - I usually don’t bother with that).

1 Like

Interesting, and it works on a Pi 3 as well. Currently not listed on Amazon though :frowning:

:+1:

@Trooper_ish and @risk: Thanks for the USB thumb drive explanation, it’s appreciated :slight_smile:

2 Likes

Yeah, it is sold directly by Radxa. They mostly do stuff for the RockPis.

So, I’ve assembled my RPi3 NAS with RAID and, against all better judgment, I’m actually using thumb drives. Why, you might ask? Because I had them at hand and it was easy enough.

With that being said, however, I know not to trust this NAS, and I’m using it more for educational purposes than as something to actually store and save my data. Without the sound advice received here I would have put more trust in it than I should have. So, thanks again :slight_smile:

In about 1-1.5 years I will build a real NAS with a RAID array using solid drives, and by then I will know more about my specific needs than I do now :slight_smile:

2 Likes

It’s your toy, you get to play with it as you like!

Also, if the thumb drives die, it will be experience. (All devices die; it’s just a matter of time.)

It is our opinion that they don’t hold up well under sustained activity, but if it’s just a media NAS, with perhaps not a lot of writes, you can let us know if it does last for years.

I know the 8-Bit Guy did a RAID of thumb drives, and it worked okay for him as a practice run.

3 Likes

Another thing you could do, “for practice and experimentation” and testing: make sparse files and use losetup to turn them into block devices, or just build a zpool or a btrfs volume out of these fake disks…

… it’s useful if you’re wondering whether a command will work, or whether a certain kind of transformation is supported.
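
Something like this, as a rough sketch (the /tmp paths, sizes and raid1 profile are just assumptions for illustration):

    # create two 1 GiB sparse files to act as fake disks
    truncate -s 1G /tmp/fake1.img /tmp/fake2.img

    # turn them into block devices (prints the names, e.g. /dev/loop0 and /dev/loop1)
    sudo losetup -f --show /tmp/fake1.img
    sudo losetup -f --show /tmp/fake2.img

    # build a btrfs raid1 volume out of the fake disks...
    sudo mkfs.btrfs -d raid1 -m raid1 /dev/loop0 /dev/loop1

    # ...or a mirrored zpool instead (ZFS will even take the files directly)
    # sudo zpool create testpool mirror /tmp/fake1.img /tmp/fake2.img

    # clean up when done
    sudo losetup -d /dev/loop0 /dev/loop1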

1 Like

Yeah, you guys helped me a lot to manage my expectations, and that’s also important. :slight_smile:

One NAS I would really like to have is this one:
https://kobol.io/

It’s really awesome…but I might also purchase another one or build one. I will see.

@risk Thanks, I will keep that in mind!

2 Likes

The Kobol thing looks good; there have been a couple of threads about it here already.

2 Likes

Thanks for the link :slight_smile:

Yeah, it’s definitely neat. In a way, I do like extremes. Either low-power ARM stuff or an Epyc server :laughing: (just kidding, I wouldn’t want to spend that much)

1 Like

Quick follow up (@Trooper_ish, @risk): While the USB device is not reliable and might just die without further notice, BTRFS should still work as expected, meaning that if there is a read error it will correct it if possible, right?

What happens on BTRFS if it writes garbage? Does it report it somehow or is running sudo btrfs device stats <device> the only way to find issues? (And also the regular scrub…)

1 Like

From what I know about computers, it’s garbage in, garbage out, and this is true even if you have ECC. So if a program or network application sends already-corrupted data to be stored, both ZFS and BTRFS will happily write that junk to the storage.

I can’t speak for BTRFS, as I haven’t looked into it, so I’ll speak about ZFS; they are similar enough and use similar techniques. ZFS computes a checksum to see whether what was written to the array is the same as what was in memory, so it validates and stores the checksum. Both ZFS and BTRFS will detect if data is corrupted when reading from the storage pool.

Traditionally, the advantage of parity RAID is that if one drive is faulty or lying and giving bad data, a parity calculation across all the drives (so, basically a “consensus”, for lack of a better term) will identify whether and which data is bad so it can be corrected. I’m not sure what happens (again, traditionally) in a mirror configuration of only 2 drives, since either one could be lying; you’d basically need a 3-way mirror to correct errors in the data. If 1 drive is lying and the other 2 are correct, then the data gets corrected; otherwise, if you’ve got only 2 drives in a mirror, or if all 3 drives report something different, a data error will be reported and the data marked as corrupted, but not repaired. Even if you can’t correct the error, though, the error is still detected.

ZFS gets around this in both mirror and parity RAID by validating and storing block checksums. If you’ve got a 2-way mirror and 1 drive is lying, it checksums the data from both drives and determines which copy matches the stored checksum, so it detects the error and corrects it. The same applies to parity RAID. I believe BTRFS does similar stuff, but I’m not knowledgeable enough in BTRFS to speak about it.
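
You can watch that self-healing happen with throwaway file-backed vdevs (a sketch only; the pool name, /tmp paths and sizes are made up, and you should obviously not do this to a pool you care about):

    # a mirror made of two 256 MiB file vdevs
    truncate -s 256M /tmp/zd1 /tmp/zd2
    sudo zpool create demopool mirror /tmp/zd1 /tmp/zd2

    # put some data in, then deliberately scribble over part of one vdev
    sudo dd if=/dev/urandom of=/demopool/testfile bs=1M count=100
    sudo dd if=/dev/urandom of=/tmp/zd1 bs=1M count=64 seek=32 conv=notrunc

    # the scrub compares every block against its checksum and repairs from the good side
    sudo zpool scrub demopool
    sudo zpool status demopool   # should show CKSUM errors on /tmp/zd1, with the data repaired

    sudo zpool destroy demopool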

2 Likes

This, kinda. If there is an error or corruption in the data while it’s in memory, before being written, ZFS (and BTRFS) will faithfully save the incorrect data along with a checksum of it, such that if the on-disk data ever differs later, it will only keep the “correct” (junk) copy. It will try to fetch another copy of the data if there is a parity/other copy in the pool, else it will just report the file as junk.
This is the reason ECC RAM is lauded.

I was a big fan of the idea of ECC reducing junk data, as close to the source as possible.

But it really doesn’t seem to be that common. And for just media / personal files, a junk sector from the start is not enough for me to worry about.

If you are saving stuff for posterity, like pics of family or deeds or something, make sure you can open the files after writing, and they look okay.

Jim Salter did a bit about data corruption where a file can be corrupt but still open and copy fine, while looking visibly corrupted. But as long as it looks good enough, I would not lose sleep over it.

Once written to disk, corruption can still happen later; that would be detected, and if there are mirrors / RAID, it’ll be corrected automatically (the repair does get counted in the device stats / scrub output, so it isn’t completely silent).
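
To tie this back to the BTRFS question above: the routine checks look roughly like this (a sketch, assuming the filesystem is mounted at /mnt/nas):

    # kick off a scrub: it reads everything, verifies checksums, and repairs
    # from the other copy where the raid profile allows it
    sudo btrfs scrub start /mnt/nas
    sudo btrfs scrub status /mnt/nas

    # per-device error counters (read/write/flush errors, corruption, generation mismatches)
    sudo btrfs device stats /mnt/nas

    # corrected and uncorrectable checksum errors also end up in the kernel log
    sudo dmesg | grep -i btrfs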

2 Likes

These concerns fall under the topic of “durability”.
In short, all systems have some probability of data loss and corruption, for various reasons. The toolkit for mitigating some of that risk is the same regardless of scale: basically replication and error correction. What changes is the replication domain. On a single disk you can keep multiple full copies of data, or use an erasure code like zfec. With multiple disks you can do mirroring or a parity raid5/6-type setup. At the cluster level you can spread copies of data or error-correcting codes over multiple machines and/or multiple racks, and the next step up is multi-cluster stuff within a cloud availability zone, across availability zones, or across clouds.

You can also have checksumming at different levels: between CPU and RAM, within RAM, and within disks, which usually have an ARM chip with some RAM for buffering; there’s also checksumming on the platters themselves that you can’t really control or see.

Then you have backups and snapshots - hot, cold, warm, write-once, locked HDD regions, tapes and so on. Think of it as making more copies of your data across the time dimension.

Ultimately, you have an application (some software), and you’re just playing musical chairs, moving errors around so that your workflow/use case/application doesn’t stumble into a problem for as long as you can.

In general, you pour in more money and effort and get more reliability, and there are usually hot debates between fanboys of one particular approach that works well for one use case and fanboys of another approach that works well for another.

For example, typical cloud hyperscalers lose data at the cluster level, irrecoverably and irreconstructibly, all the time, but at a very, very low rate and at huge effort and cost (have you seen market rates for cloud storage? It’s ridiculous). Is that approach bad? Maybe; some people still pay for it. Just stick the data into 2 clusters or clouds if the future of civilization depends on it.

2 Likes

Thanks again! :slight_smile:

1 Like

This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.