First NAS Build: Boot Drive Help <3

Hello everyone, hope you are all well!

I require some assistance with setting up my NAS.
First things first, I have most of it sorted out already: drives, system, etc.
I will be:

  • using Debian as I’m familiar with it.
  • sharing stuff over SMB, so no need for fancier solutions.
  • configuring a RAIDZ1 pool for the storage (4 drives), roughly along the lines of the sketch below.
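
For reference, this is what I'm picturing for the pool; a minimal sketch where the pool name "tank", the dataset name and the disk IDs are placeholders, nothing I've actually run yet:

  # 4-drive RAIDZ1 pool; ashift=12 for 4K-sector disks
  zpool create -o ashift=12 tank raidz1 \
      /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 \
      /dev/disk/by-id/DISK3 /dev/disk/by-id/DISK4

  # dataset to be shared over SMB, with lz4 compression
  zfs create -o compression=lz4 tank/share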

To the point: I’m seeking some assistance when it comes to boot drives.
I have 2 major hurdles to overcome:
a) choice of boot drives
b) configuring 2 boot drives in a redundant manner

a) I would like to get 2 small & robust SSDs, ideally SLC, but they are hard to find
(if they are still being made). I’m kind of lost on what to purchase.

b) I’m looking to configure them à la ZFS mirror, where we get the niceties of
checksums & self-healing. But ZFS on the boot drives for Debian doesn’t seem like
the way to go.

To sum it up, I would like some guidance on which boot drives to purchase
and how to set up good redundancy for them.

Thank you in advance <3

PS: As this is my first NAS build, feel free to hurl any tips my way.

PS2: Any tips on managing ZFS (alerting me to problems as they arise, the automatic
healing it performs, disk degradation, etc.) are very much welcome.
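
The kind of alerting I have in mind is something like this (a sketch assuming the OpenZFS ZED daemon from zfsutils-linux, a pool named "tank", and a placeholder e-mail address):

  # /etc/zfs/zed.d/zed.rc -- have ZED e-mail on pool events
  ZED_EMAIL_ADDR="admin@example.com"
  ZED_NOTIFY_VERBOSE=1

  # periodic scrub plus a quick health check (e.g. from cron)
  zpool scrub tank
  zpool status -x    # prints "all pools are healthy" when nothing is wrong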

https://openzfs.github.io/openzfs-docs/Getting%20Started/Debian/index.html#root-on-zfs

Hey, thanks for taking the time!

I came across root-on-zfs a couple of years ago.
It solves the issue, but it seems to add a lot of complexity.
That’s actually why I said:

Oh, you definitely could do that, but most NAS software runs from a RAM disk these days. Most small Linux systems are only like 200-300 MB, so having the entire boot system on a 512 MB RAMdisk makes the most sense. I know some people just use a USB key to bootstrap the system and then leave it running.

Second, TLC is good enough for what you want to do, without a doubt. Normal use will last for years and years, and we’re talking $20 for 120 GB / 256 GB drives.

Third, TLC drives deliver 3x the capacity of SLC (3 bits per cell instead of 1), but they are the same cells, so any TLC drive can theoretically be run as SLC. I do not know whether the drive firmware exposes this, however.

So, you’re looking for redundant drives for booting a NAS.
The choice really depends on which connectors you want to use and how reliable you want it to be.

  1. SATA: I’d pick a Crucial MX, e.g. 2x 500 GB.
  2. M.2: I’d pick a budget NAND drive from a premium brand here, e.g. a Samsung EVO.
  3. U.2: Optane, for ultimate reliability and overkill?

Costs obviously increase steeply as you go down the list. Maybe I shouldn’t answer, as I’m the guy who runs boot off a single small SATA drive. NAS software gets mostly loaded into RAM, and there is very little further access to the boot drive afterwards.

Now, this is interesting. The drive config needs to be redundant before the OS is active (booted). I experimented with a software RAID1 setup, but this doesn’t extend to the (U)EFI system partition, and keeping those partitions in sync is a pain.
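
Roughly what I mean, as a sketch (device names and mount points here are made up, not a copy of my actual setup): mirror only the root partitions, give each drive its own ESP, and copy the first ESP onto the second after bootloader updates.

  # mirror the root partitions with mdadm; the ESPs stay outside the array
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p2 /dev/nvme1n1p2

  # after every bootloader/kernel update, copy ESP #1 onto ESP #2
  mount /dev/nvme1n1p1 /mnt/efi2
  rsync -a --delete /boot/efi/ /mnt/efi2/

  # register the second ESP with the firmware so either drive can boot
  efibootmgr --create --disk /dev/nvme1n1 --part 1 \
      --label "debian (fallback)" --loader '\EFI\debian\grubx64.efi'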

So, I think the most reliable solution for redundant boot drives may be the motherboard’s software RAID. (Please note that I don’t advocate using motherboard RAID for data arrays.)

Thanks for the input!

I guess it slipped my mind that TLC only comes into effect as you use up the first layer of all the cells. So, in that regard, speccing a drive at 2x-4x what my OS (Debian) requires could solve that outright!

I should have clarified that I’m looking at M.2; I have 2 slots to spare, and I’m dedicating all the SATA ports to storage.
So maybe a pair of Samsung EVO Plus drives could do the trick, given:

I follow. Although (from my limited understanding) this means giving up checksums & healing, at which point I might as well configure drive mirroring through LVM during OS installation.
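
What I mean by that, as a rough sketch (the volume group name, size and device names are placeholders; the installer’s partitioner can do the equivalent from its menus):

  # both NVMe partitions into one volume group
  pvcreate /dev/nvme0n1p2 /dev/nvme1n1p2
  vgcreate vg0 /dev/nvme0n1p2 /dev/nvme1n1p2

  # root as a mirrored (raid1) logical volume
  lvcreate --type raid1 -m 1 -L 30G -n root vg0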

I’m also for non-redundant boot drives. I advocate simplicity when you’re starting out, and ‘waiting for the perfect solution is the enemy of a good fix right now’ is the phrase on my mind at the moment.

Both the array and the system drives have a finite lifetime and are scanned periodically with SMART tests to see how they’re doing; on top of that, the system drives are backed up and can be recreated easily via a Debian unattended install. I can’t see a benefit from the complexity of redundant boot drives when recreating the OS is this easy, especially since my arrays also don’t have complicated layouts of pools, mirrors and vdevs: they’re LVM2, which is easy to migrate to nearly any Linux host. (I did, however, have some firewall changes not stick around when I recently shut down an array with a 460-day uptime to clean 4 years of dust out of it.)
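
For what it’s worth, the periodic SMART testing is just smartd doing the work; a minimal /etc/smartd.conf sketch (device name, schedule and address are placeholders):

  # monitor attributes, short self-test daily at 02:00,
  # long self-test Saturdays at 03:00, e-mail on trouble
  /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com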

K3n.

Appreciate the input.

Personally, I like to set & forget. Redundancy gives me a buffer when something goes wrong: I can carry on using the machine and schedule a fix when I feel like it.

I’m probably entering the paranoid zone, but since I learned a few years ago about all the silent ways disks can error out, checksums & healing have been my only antidote.

With that said, I do agree that simplicity of setup is important; that’s why I steered away from root-on-zfs.

But alas, I might be asking for too much here; there might not be a solution that fits my needs.

Well, there is one, but you are probably not going to like it. We use it at work to deliver almost-maintenance-free systems to offshore wind farms.

  1. Keep the Linux root system as a file image on the EFI partition, as an immutable distro. There are plenty of distros that support this, but for your use case learning Yocto or NixOS is probably the easiest step forward.

  2. Use checksum fingerprinting to ensure that the image file is good; I’d recommend md5sum, sha1sum and sha256sum, or something like that. If all three match, the chance that your image is corrupt is something like one in a trillion or less (see the sketch after this list).

  3. Boot this image onto a ramdisk. This can be configured as part of your Yocto image.

  4. Enjoy.
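
A minimal sketch of step 2, assuming the image and three pre-generated digest files live next to each other on the ESP (all names here are placeholders):

  cd /boot/efi/images
  md5sum -c rootfs.img.md5 \
    && sha1sum -c rootfs.img.sha1 \
    && sha256sum -c rootfs.img.sha256 \
    || { echo "image failed verification, not booting it"; exit 1; }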

Unfortunately this is not a good fit for homelabbers, since it will not easily let you install and reinstall things (great once you have figured out a setup, horrible while you are still discovering one), but yes, this is what we use to keep power lines running to unmanned plants, and it is quite a common setup in the embedded world.


@wertigon just gave you THE set and forget

There’s a hierarchy to this for sure:

  1. mission critical: a checksum-validated image with rolling updates, loaded into RAM
  2. redundant enterprise drives with checksumming
  3. single enterprise drive with checksumming
  4. redundant enterprise drives
  5. single enterprise drive
  6. options 2-5 with consumer drives

The reality is: how quickly do you want this up and running, and what kind of uptime do you need?

Your NAS will need security updates, as threats are ever evolving, so schedule updates and disk scrubs. Configure alerts for when something goes wrong, and plan on replacing the drives in 3 years, when this NAS is relegated to being your backup NAS.

Nothing will beat the availability of simply having 2 NAS systems in redundancy.

@wertigon, @TryTwiceMedia

It seems my original plan cannot be realized without complicating the set-up phase substantially. I will take your input and look into the alternative of simplifying and streamlining the setup/disaster-recovery process.

I will put some hours in the coming days to research a bit more on this and see what the best path is for me.

Thank you all <3


I’ve spent some time on this, and I settled on 2 things:

  1. Going for a pair of Samsung 970 EVO Plus drives as boot drives, at 3x-4x the capacity I’ll need, to keep wear to a minimum.
  2. Setting them up in a basic RAID-1 through the installer.

Though not ideal, it keeps things simple and covers the bases (for around 50 EUR for the additional drive), should a drive fail.

While searching I came across an old find that I’d forgotten about: dm-integrity.
I failed to find any decent guides or digestible information on setting it up.
Does anyone have any input on how hard it would be to set up for the boot drives, and whether it’s a good idea?


Yeah, there’s definitely a gap for a guide. A quick test at home with crc32 and hmac-sha256 on a 1 GB file* liked big blocks for both reads and writes, but couldn’t exceed 250 MB/s writes with either crc32 or hmac-sha256 (the latter with a 4096-bit key). Reads with crc32 hashing came out suspiciously big, at 3900 MB/s for 1 MB, 4 MB, 16 MB and 64 MB blocks, and peaked at 4900 MB/s at a 256 KB block size in fio when using hmac-sha256.

I will need to test on raw hardware, not a loop-mounted file, and then on an LVM2 array. There are also options to keep the integrity map as a bitmap and to use a separate device for the integrity metadata.

For setting it up at boot, there’s info in the man pages: integritytab and systemd-integritysetup.

*: on ext4 backed by a Gen3 NVMe drive (which may also be boosted by the dentry/LRU cache in Linux) with a Ryzen 5600 and 32GB of DDR4-3200 RAM.
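
In case you want to reproduce it, a loop-file test like that can be set up roughly as follows (a sketch; the sizes, names and the keyed-variant options here are illustrative, and integritysetup ships with the cryptsetup package):

  # 1 GB backing file on the NVMe-backed filesystem, attached to a loop device
  truncate -s 1G /tmp/integrity.img
  losetup /dev/loop0 /tmp/integrity.img

  # format and open with crc32; swap in --integrity hmac-sha256 plus
  # --integrity-key-file/--integrity-key-size for the keyed run
  integritysetup format /dev/loop0 --integrity crc32
  integritysetup open /dev/loop0 inttest --integrity crc32

  # sequential reads at a 1M block size against the integrity-mapped device
  fio --name=inttest --filename=/dev/mapper/inttest --rw=read --bs=1M \
      --direct=1 --ioengine=libaio --runtime=30 --time_based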

Damn!

First of all thanks for going through the trouble, much appreciated!

I read it three times to get what you did (due to my lack of experience), but I still have a gap.
Was the difference in speed between the two scenarios you mentioned due to block size, or to the use of a 4096-bit key?

I’ll be on a 3800XT with 32 GB of DDR4-3200 ECC RAM and 2x Samsung 970 EVO Plus 500 GB (NVMe), so we might not be far off in terms of performance.

I’ll try to look deeper into the subject over the weekend, using the man pages you suggested.

Again, thank you.

This is the hashing speed. (I’m trying to find where the kernel logs its hashing speed calculations; there also used to be a command to compare the speeds for your CPU/RAM combination.) For now, integritysetup format ... -I $hashtype prints the rate at which the integrity data for the device was read, calculated and stored.

K3n.

Ah, got it!
Thanks for the clarification.

I did a bit of searching, and I found something potentially useful.
Documented here are instructions on how one can use LVM and its lvconvert command to add integrity to an existing RAID volume. This is done with the --raidintegrity argument.

I also found it documented in Debian’s manpages.

I am a bit confused, though, because it is called LVMRAID. Is it something different from LVM, or is it functionality provided by it?

If so, could this whole thing be solved by simply following the Debian installer, setting up a RAID-1 for my boot drives using LVM, and then, on first boot, using the lvconvert --raidintegrity y LV command to add dm-integrity?
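
For the record, the sequence I’m picturing is something like this (a sketch based on those docs; the vg0/root names are made up, and it assumes the installer actually created an LVM raid1-type LV rather than an mdadm array):

  # confirm the LV uses LVM's raid1 segment type
  lvs -a -o name,segtype,devices vg0

  # add a dm-integrity layer under each raid1 leg
  lvconvert --raidintegrity y vg0/root

  # later: check for detected/corrected mismatches
  lvs -o name,integritymismatches vg0/root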