I am planning to migrate all of my data to a file server running TrueNAS Core. I have dabbled a bit with ZFS, but I am a little hazy on some of the details. Hopefully, someone here can help me.
Basically, I want all my data to live on the NAS, with all the clients connecting to it and storing their files (minus the OS installation) there. That includes my game library, Plex library, photos and videos, office documents, etc.
I have two people accessing the files from two desktop computers (10Gig LAN), two laptops, and a bunch of smaller devices like an Nvidia Shield, plus another server running Plex and Docker containers.
My Server Hardware: Intel Xeon E5-2680 v4, 64GB RAM (might upgrade to 192GB), 10Gbit NIC
Drives for Storage:
8x HGST 10TB SAS as RAID-Z2
2x Optane P1600X 118GB, mirrored
2x Samsung PM983a 1.8TB NVMe, mirrored
1x Intel P4600 1.6TB
3x 2TB SATA SSDs as RAID-Z1
The Setup that I had in mind:
Pool 1 - Games Library
3x 2TB SATA SSDs in RAID-Z1
30GB SLOG (Optane Mirror)
The clients (mostly Windows gaming machines) would access it via iSCSI. Is the missing metadata device a potential bottleneck for all the small-file reads?
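For reference, roughly what I had in mind for Pool 1, written out as commands rather than the TrueNAS GUI steps I would actually use (pool name, device names and the zvol size are just placeholders):
zpool create Pool1 raidz1 ada0 ada1 ada2 log mirror nvd0p1 nvd1p1   # 3x SATA SSD plus a mirrored 30GB Optane partition as SLOG
zfs create -s -V 2T -o volblocksize=64k Pool1/games                 # sparse zvol to export to the Windows machines over iSCSI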
Pool 2 - Everything else
8x 10TB SAS as RAID-Z2
30GB SLOG (Optane Mirror)
2x 1.8TB Samsung Mirror
1x Intel P4600 as Special Metadata Device or L2ARC
What I would like to do is store the smaller files (<10MB) on the NVMe mirror and everything else on spinning rust. To speed things up when searching and accessing files, I thought a special metadata device might be helpful, but I don't have it mirrored, so I would rather go with a solution where I don't lose data. Another option would be to use the Samsung NVMes as the special metadata device, and I know that Wendell mentioned that I can still store small files on that device.
The overall goal is to be able to use the NAS as if it were local storage but with ZFS goodness like deduplication, snapshots etc.
Do my thoughts and explanations make sense? What should I change?
Here is a snapshot of my file size distribution on my current pool:
I would use the Samsung mirror as the special devices:
Pool 2:
RAIDZ2 8x 10TB SAS
Log Mirror 2x 30GB Optane
Special Mirror 2x Samsung NVMe
zfs set atime=off "Pool 2"
zfs set relatime=on "Pool 2"
zfs set compression=on "Pool 2"
zfs set special_small_blocks=64k "Pool 2"   # needs to be smaller than recordsize, 64k is good
zfs create "Pool 2"/media
zfs set recordsize=1m "Pool 2"/media
zfs set compression=off "Pool 2"/media
zfs create "Pool 2"/documents
zfs create "Pool 2"/games
…
The special devices will store all small files and serve them with the speed of the NVMe drives. 8x SAS drives should saturate the 10gbit network for files 128k and larger.
Make sure to create different datasets for the file types you store, and tune each dataset for its use case. E.g. media files (pictures/videos) don't compress well and are generally larger than 1MB in size.
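For example (the exact values are just a starting point, adjust them to your own files):
zfs set recordsize=1m "Pool 2"/games        # game installs are mostly large files read sequentially
zfs set recordsize=128k "Pool 2"/documents  # the default recordsize is fine for mixed office files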
Would I benefit from setting it like this?
sudo zfs set logbias=throughput "Pool 2"/anyDatasetWSmallFiles
sudo zfs set logbias=latency "Pool 2"/media
From man zfsprops: If logbias is set to throughput, ZFS will not use configured pool log devices.
My hunch is that this setting may be useful in case you don’t have two optane devices configured as log devices.
Monitor your pool under your specific load using zpool iostat -r <secs>, zpool iostat -v <secs>, and iostat -zyxm 5.
This will allow you to look for bottlenecks. Mind you - the bottlenecks may be located outside of your file server.
I would only apply logbias=throughput if the Optanes prove to be a bottleneck AND the server is actually faster with this setting. I somehow don't expect this.
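If you want to experiment anyway, it is easy to flip a dataset over and back (dataset name just an example):
zfs set logbias=throughput "Pool 2"/documents   # test run
zfs inherit logbias "Pool 2"/documents          # revert to the default (latency)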
BTW, I worked at iXsystems (the primary developers of TrueNAS) for a few weeks in their tech support department, then automated myself out of a job. I took the job partly because I wanted to help out and I liked ZFS, and partly because I saw them doing some dumb things business-wise and wanted them to stop being stupid so they could contribute to the community better in the decades to come. In the second week I wrote a log-analysis shell script that output an HTML file and gave better feedback than a senior tech could produce in 5 hours. This freed up about 30 hours a day of technician time, and since I was the new hire, I was let go. I also helped with some driver diagnostic tools that saved them about $3000 a day in shipping. Two weeks later I started a 6-month contract at double that hourly rate. Four months after that they released software they had expected to take 3 years, and I suspect that was partially due to the extra time their developers had to develop software instead of doing tech support tasks, thanks to some of the tools I left them.
From what I understand, the SLOG is there for sync writes in case of a power loss or motherboard failure before the data has been written out. The only time data gets read from it is after an unexpected shutdown; the data committed to the SLOG also stays in RAM and gets written to the primary array from there. ZFS can detect if the SLOG is performing properly, so there is no need to mirror it: if it is performing properly it gets used, and if it isn't, it does not get used. If you assign a mirrored pair of SSDs, both of them will probably fail at the same time, and you will receive no benefit from having two.
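Log devices can also be added to and removed from a live pool at any time, so nothing here is permanent (placeholder device name, using your Pool 2 as the example):
zpool add "Pool 2" log nvd0p1      # add a single, unmirrored log device
zpool remove "Pool 2" nvd0p1       # pull it back out if it doesn't help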
Allocating more RAM (which ZFS uses for the ARC) is more beneficial than the fastest L2ARC you can get. Also, don't mirror the L2ARC; if needed, have a hot spare, but if you mirror it, both will probably die at the same time.
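Before buying cache devices, check how the existing ARC behaves under your real workload; on TrueNAS Core the counters are exposed via sysctl (arc_summary, if present on your build, prints a friendlier report):
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses   # a high hit-to-miss ratio means L2ARC won't buy you much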
30G is probably too large for a SLOG for that kind of typical home use (2G should probably be enough).
Large writes/user data would bypass the SLOG anyway (it's not a dm-writecache).
Use the rest of the Optane space for your metadata special class and small files <=1M (maybe <=512K; my brain's too slow to turn your file sizes into a CDF without e.g. a spreadsheet, so aim for 50%-75% fullness of the special vdevs).
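A rough sketch of the FreeBSD way to carve the Optanes up (device names, labels and the 16G size are placeholders; repeat the gpart steps for the second drive):
gpart create -s gpt nvd0
gpart add -t freebsd-zfs -s 16G -l slog0 nvd0   # small partition for the slog
gpart add -t freebsd-zfs -l special0 nvd0       # rest of the drive for the special vdev
zpool add "Pool 2" log mirror gpt/slog0 gpt/slog1
zpool add "Pool 2" special mirror gpt/special0 gpt/special1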
Your Samsung NVMe drives: put them into a second computer, or try a big L2ARC.
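If you try the L2ARC route it's a one-liner (placeholder device names; cache devices are striped, never mirrored):
zpool add "Pool 2" cache nvd2 nvd3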