
Storage upgrade on main workstation / home server

Hi guys,

My main workstation is fitted with 5 spinning-rust drives.
Thing is, they were fitted more than 10 years ago.
For a while now I've been living in fear of losing my data, so I decided to do something about it.

Old drives:

  • 2x 500 GB Samsung Spinpoint F1
  • 1 TB WD Green
  • 1 TB Samsung Spinpoint
  • 320 GB WD ancient

Bought new hardware with ZFS in mind:

  • 4x 8TB Iron Wolf Pro
  • 1 TB XPG Gammix S11 Pro NVMe (for OS: Windows 10 and Ubuntu)

My rig is:

  • i5 4670K
  • Asrock Z97 Extreme 4
  • 16 GB DDR3 1600MHz CL11

What I want to achieve is:

  1. Data safety (family photos/videos, documents)
  2. Serve multimedia with Plex, music with Subsonic-like software and host OwnCloud

I need your advice on ZFS vdev configuration: RAIDZ2 vs. 2x mirrored vdevs (effectively RAID10).

Do I need SLOG / L2ARC?
I can buy another NVMe drive with an Akasa PCIe-to-M.2 adapter, but will my mobo handle that? Most NVMe drives require 4 PCIe lanes, right?

Data access patterns / use cases:

  • Photos / videos - will be copied there for archiving and accessed rarely.
  • Plex will be used daily by 1 local client, and very rarely by up to 3 remote clients (some relatives)
  • Music will be accessed remotely by 1 phone (Spotify replacement)
  • OwnCloud: regular backups from 2 phones, online docs editing by up to 2 simultaneous users
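
For concreteness, the two candidate layouts would be created roughly like this - a sketch only; the pool name and device names are placeholders, and `/dev/disk/by-id/` paths are preferable to `/dev/sdX` for a real pool:

```shell
# Option 1: 4-wide RAIDZ2 - ANY 2 of the 4 drives can fail
zpool create tank raidz2 \
    /dev/disk/by-id/ata-IRONWOLF_1 /dev/disk/by-id/ata-IRONWOLF_2 \
    /dev/disk/by-id/ata-IRONWOLF_3 /dev/disk/by-id/ata-IRONWOLF_4

# Option 2: 2x mirrored vdevs (RAID10-style) - faster random IO, but a
# double failure is only survivable if it hits two different mirrors
zpool create tank \
    mirror /dev/disk/by-id/ata-IRONWOLF_1 /dev/disk/by-id/ata-IRONWOLF_2 \
    mirror /dev/disk/by-id/ata-IRONWOLF_3 /dev/disk/by-id/ata-IRONWOLF_4
```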

To boost creativity I would also like to start a contest for the best word describing data-loss phobia. May the odds be ever in your favour :wink:

Regards,
bootch

RAIDZ vs. multiple mirrors is a trade off

  • mirror VDEVs are faster
  • you can add mirror VDEVs 2 drives at a time
  • RAIDZ VDEVs cost you less capacity once you go beyond 4 drives (but going beyond 4 drives means a bigger enclosure, etc., which adds cost and physical size).
  • RAIDZ VDEVs can be slightly more resilient in that it doesn’t matter WHICH drives fail. e.g. if you have multiple 2-drive mirrors, you’re screwed if 2 drives fail in the same mirror. With RAIDZ2, ANY 2 drives can fail and you’re still OK. But you pay the penalty in performance.
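
At 4 drives the capacity trade-off is actually a wash - both layouts lose half the raw space. A quick back-of-the-envelope check (shell arithmetic, assuming 8 TB drives like the OP's):

```shell
drives=4
size_tb=8

# 2-drive mirrors: half the drives hold copies of the other half
mirror_usable=$(( drives / 2 * size_tb ))

# RAIDZ2: two drives' worth of parity, the rest holds data
raidz2_usable=$(( (drives - 2) * size_tb ))

echo "mirrors: ${mirror_usable} TB, raidz2: ${raidz2_usable} TB"
# prints "mirrors: 16 TB, raidz2: 16 TB"
# At 6 drives the gap appears: 3x mirror = 24 TB, 6-wide RAIDZ2 = 32 TB
```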

For a ZFS pool, your IOPS scale with the number of VDEVs you have. With mirrors, you get more VDEVs from fewer drives, which means more (random IO) performance for the same number of drives.

That said…

For a single/small number of users? I’d say “no”. I don’t have L2ARC or SLOG on my ZFS NAS and I’ve been happy with the 4-drive dual mirrors it has had for the past 8 years, performance wise. It’s normally limited (or close to limited) by its gigabit ethernet connection.

All of your listed access patterns are fairly sequential/data archival type with a low number of users. Same sort of stuff I do on my home 2x mirror setup.

Also… I have a 6-drive RAIDZ2 at work for my personal workstation’s NFS-based KVM data store. Performance running VMs on that is adequate for lab use over gigabit ethernet. Much more intense workload than you’re talking about… no L2ARC or SLOG there either. That’s 6x SAS in RAIDZ2 with 16 GB of RAM running FreeNAS with no tuning - over gigabit ethernet to my workstation.

I would suggest that any SSD based cache beyond the memory cache is probably not worth it in your case and will just add additional complexity. If you had tens or hundreds of users (or much more intense data access workloads) it would be different. But you don’t.

What I’d do… use the ZFS pool with rust only, rely on RAM cache, and use the SSD as the boot/main drive for your system. Get a terabyte or two of SSD and use that for OS, games, short term current projects storage etc. Stuff you can afford to lose.

Back the “current projects” up to the ZFS pool daily/weekly/whatever.
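
A minimal sketch of that backup as a cron job - the paths and username here are made up for illustration, adjust to your own layout:

```shell
# /etc/cron.d/projects-backup (illustrative paths)
# Nightly at 02:00: mirror the SSD "current projects" dir into the pool.
# --delete keeps the copy exact; drop it if you'd rather keep deleted files.
0 2 * * *  bootch  rsync -a --delete /home/bootch/projects/ /tank/backups/projects/
```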

In your situation I would (and did) get 4 drives (at whatever the current price/capacity sweet spot is - which you did), set them up as dual mirror VDEVs, plus one or more 1-2 TB SSDs for the OS/current-data drive in the main box.

If you are concerned about “2 drives failing in the same mirror”… then go RAIDZ2 or RAIDZ3, but to make it “worth it” IMHO you need to go to 6+ drives, which as above means more initial outlay, more power, and a larger enclosure/more drive bays.

Along that line of thinking, my “really important” stuff is backed up to more places than my NAS. If your super important stuff is in Nextcloud (for example) and kept on-device, it will survive a NAS failure because it is synced to multiple locations.

So for me, I consider 2x mirrors to be “good enough” for my purposes.

YMMV.

“Sanity”

:slight_smile:

Thinking about this stuff before it happens is important. The other people who don’t are the crazy ones :smiley:

Never add L2ARC unless you’ve maxed out your RAM, as L2ARC is indexed in RAM and could potentially harm your regular ARC performance.

SLOG is only useful for synchronous writes, like NFS, but definitely is not necessary. You can try it out without, and if it performs poorly, you can add it later.
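
Adding (or removing) a SLOG later really is a one-liner, so there's no pressure to decide up front - a sketch, with placeholder pool/device names:

```shell
# Attach a dedicated log device to an existing pool
zpool add tank log /dev/disk/by-id/nvme-SLOG_DEVICE

# Changed your mind? Log vdevs can be removed without rebuilding the pool
zpool remove tank /dev/disk/by-id/nvme-SLOG_DEVICE
```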

Mirrored vdevs tend to have better random IOPS, while RAIDZ(2,3) tends to be better for sequential operations, ESPECIALLY with larger files. You also need to think about upgrade paths. Starting out with a mirrored vdev config means you can add just 2 drives at a time, while a RAIDZ means adding 4 drives at a time - at least until you can increase the number of disks in a vdev, which I don’t think you can yet.
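
The two growth paths look like this - a sketch, with placeholder pool and device names:

```shell
# Mirror pool: grow 2 drives at a time by adding another mirror vdev
zpool add tank mirror /dev/disk/by-id/ata-NEW_1 /dev/disk/by-id/ata-NEW_2

# RAIDZ2 pool: grow by a whole second 4-wide RAIDZ2 vdev
zpool add tank raidz2 /dev/disk/by-id/ata-NEW_1 /dev/disk/by-id/ata-NEW_2 \
                      /dev/disk/by-id/ata-NEW_3 /dev/disk/by-id/ata-NEW_4
```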

I have RAIDZ2 on my media server, with SLOG and I love it. It’s totally overkill for what I do. If it was just a desktop or maybe a DB server, I would probably go mirrored vdevs though.


Also, an L2ARC has to start fresh on every reboot - for now, until persistent L2ARC lands (watch this space!)


Thank you @thro, @reavessm, @Trooper_Ish for your replies!

I’m willing to sacrifice storage space and IOPS for the thinnest bit of increased reliability, so I’ll go with RAIDZ2.
When it comes to SLOG: I don’t plan to saturate the 1 Gbit network, I just want to avoid the penalty of writing my data twice to spinning rust. I read an article stating the ZIL will be created on your zpool if a dedicated device isn’t provided.
I’d like to use my NVMe drive for SLOG. This drive will also be used to install Windows 10 and Ubuntu (dual boot).
Can a SLOG be placed on a partition? If so, does it have to be a primary one?
Any advice on partitioning, aligning (does gparted align partitions automatically?) and prepping my NVMe drive is most welcome. It’s my first PCIe-based storage and only the second flash drive I own - not much experience in the area.

@thro mentioned that I should rely on RAM.
I have a tiny user base and only 16 GB of memory. Will that be enough to also use this PC as my main workstation?
Also, does it make sense to turn on compression on a zpool that will hold photos, videos, music and a tiny bit of documents? How much additional memory does compression require?

I’m using Fedora for my daily work. I’m quite happy with it and I like the fact that packages are up to date. I use it for Java development and devops.
For my ZFS pool I planned to go with Ubuntu 19.10, as it’s the only distro with official support for ZFS. I was afraid Fedora’s frequent changes might introduce a bug that could cost me my data. Does that make sense?

The next important subject was brought up by @reavessm - upgrade paths.
I can afford to buy another 4-drive RAIDZ2 vdev if I run out of storage.
I can fit 4 more drives in my box, both financially and physically.
I have a big Chieftec tower with an additional backplane occupying 3x 5.25" bays.
My first vdev lands in the backplane.
Additional questions that come to mind:

  • Can I migrate a zpool to a new mobo/controller (in case I move my storage to a separate machine)?
  • Can a zpool span more than one controller (chipset plus some extra controller soldered onto the mobo)?
  • I read that SATA controllers integrated into mobos are not reliable and that I should go with an HBA. How are they not reliable? I just lack experience in that area.

Cheers!
bootch


I think a SLOG can be on a partition, and it only needs about 4-8 GB, as the ZIL/SLOG only holds synchronous writes not yet committed to the main zpool. In theory, the faster the main zpool, and the less you write at a time, the less ZIL/SLOG is needed. So basically, if you can partition it, then it should be fine, but if not, I wouldn’t waste a whole NVMe on it. You could buy a SATA DOM of about 16 GB that would fit your needs a little better.
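
Assuming the drive can be partitioned, the shape of it would be something like this - a sketch with illustrative partition numbers and sizes. ("Primary" only matters on old MBR tables; on GPT any partition works, and both sgdisk and gparted align to 1 MiB boundaries by default.)

```shell
# Carve an 8 GB partition for the SLOG out of the free space on the NVMe drive
sgdisk --new=5:0:+8G --typecode=5:bf01 /dev/nvme0n1

# Attach that partition as the pool's log device
zpool add tank log /dev/nvme0n1p5
```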

ZFS gets kind of a bad rap for RAM. It has limits, but ZFS will use just about as much RAM as you have for cache (80%?), so the more the better. However, you should be fine with less; things might just be a little slower. It should also be noted that ARC is a mix of MFU and MRU cache, so some data workloads might not hit ARC at all. And of course, the ARC is highly evictable, meaning that if something else needs more RAM, ZFS will just kick ARC entries out.
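
If you want to see what the ARC is actually doing on a running system, ZFS on Linux ships a couple of read-only tools:

```shell
arc_summary | head -n 40                     # current ARC size, target, MFU/MRU split
arcstat 5                                    # hit/miss rates sampled every 5 seconds
cat /sys/module/zfs/parameters/zfs_arc_max   # configured ceiling (0 = auto)
```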

Compression is basically free. LZ4 is the default and it has an early abort feature that basically doesn’t even try to compress things that aren’t compressible. However, if you know a dataset will only have movies and photos on it, you could turn off compression just for that dataset for a marginal gain.
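
For example (pool and dataset names are illustrative):

```shell
zfs set compression=lz4 tank         # pool-wide default, inherited by child datasets
zfs set compression=off tank/media   # already-compressed video/photos/music
zfs get compressratio tank           # check how much it's actually saving
```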

Finally, migrating ZFS disks between machines is by far the simplest thing ever. All the metadata is stored on the disks, so it’s just plug and play. That’s why RAID cards aren’t as good as HBAs: with an HBA, each drive is visible to the OS, while with a RAID card all 4 drives are presented as one, which can screw up the metadata. Definitely go with onboard SATA or an HBA. The only real issue with migrating zpools is making sure the ZFS versions match up, as enabling features can make a pool unimportable on a system that doesn’t have those features - especially going between FreeNAS and ZoL.
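
The move itself is two commands (pool name illustrative):

```shell
zpool export tank    # on the old machine: flush everything and mark the pool clean
zpool import tank    # on the new machine: scans attached disks, finds the pool by its own metadata

# If the old box died before you could export, force the import:
zpool import -f tank
```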

From what I’ve read in an iXsystems article…

the ZIL is located in the pool.
So if something requests sync writes, ZFS writes them there before then writing them to the pool.

So if you have a dedicated SLOG, that prevents the double write to the pool (once to the ZIL, then again into the pool).


Yes, it still double-writes. But whether it holds on to 4 GB vs 8 GB depends on how much data is in flight. After it flushes to the pool, the ZIL is cleared.

And all of that only matters for sync writes. If you really want speed at the expense of reliability, you can set sync=disabled and ZFS treats everything as async, using no ZIL and not incurring the double-write penalty.
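
For reference, the property is `sync` and it is set per dataset, so the safe/fast trade-off can differ per use (dataset names below are illustrative):

```shell
zfs set sync=standard tank/owncloud   # default: honour applications' sync requests (safe)
zfs set sync=disabled tank/scratch    # treat everything as async - fast, but a power
                                      # cut loses the last few seconds of writes
zfs get sync tank/scratch             # verify the setting
```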

This guy will answer all of your questions:


<3 Lawrence


Depends how much RAM you need for other things vs. how much RAM you leave for ZFS.

Given your user base of a handful of users, and a non/minimally-cacheable use case, I’d say a couple of gigs tops for ZFS would be plenty. You may want to tune its max RAM usage and experiment.
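
On ZFS on Linux that cap is the `zfs_arc_max` module parameter - e.g. to limit the ARC to 4 GiB (the value is just an example):

```shell
# Persistent across reboots - add this line to /etc/modprobe.d/zfs.conf:
#   options zfs zfs_arc_max=4294967296

# Or change it live (value in bytes; takes effect immediately):
echo 4294967296 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
```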

Compression is almost free (so even with non-compressible data it’s “good enough” to just leave it on and forget about which data lives on which dataset), because modern CPUs run the algorithm fast and spinning rust is comparatively slow.

You can experiment with on vs. off but any CPU of the past 10 years is fast enough to make leaving it on almost free.

One pretty big handful of salt you need to take when reading ZFS documentation on cache, etc. is that it was originally designed for enterprise storage arrays, and that’s the context a lot of the documentation was written for. These enterprise arrays are normally serving hundreds or thousands of users - not a handful in a home environment.

The fewer users you have, in general, the less effective cache is. And archival-type use (which is more write-heavy than repeated reads of the same data) doesn’t benefit much from cache either.

That’s why I’m suggesting you just use RAM and not bother with a SLOG or L2ARC on SSD. I just don’t think the benefit is worth the additional complexity and losing an SSD you could do something more productive with (like running your “current” data/games/OS off it at SSD speed).

If you have the time, experiment with a couple of configs before committing to it and see for yourself whether or not the benefits are there for what you’re doing with it.

However, for my use case (basically an archival/media-dump NAS - sounds similar to what you’d be doing) I got away with an ancient AMD APU with 2 GB of RAM running FreeNAS (yeah, I didn’t meet their recommended spec!) and 2x 2-drive mirrors for a while. I later added another 8 GB (so it has 10 GB in total) and have been happy enough with its performance ever since. The extra RAM made bugger-all difference to performance that I can remember, but it let me run plugins for transmission, Plex, etc.

I’m due to replace the thing, but not because of performance - purely because the hardware is ancient and I don’t want to rely on it :smiley: