ZFS / TrueNAS Best Practices?

Howdy y'all,

So I have been learning WAY more than I thought I would about ZFS coming from Unraid to TrueNAS Scale. My favorite part is thinking I learned something from an opinionated ape on Reddit, then figuring out their confidence was ill-founded and that I had actually 'learned' incorrect information. Egos seem to run rampant in the ZFS community, so I decided to turn to this forum in hopes of getting correct answers to the several questions I have about setting up TrueNAS.

I have two instances of Scale running (one for VMs and one as a NAS). I would love to see (and am even willing to write) a guide on the nuances of ZFS for common NAS applications for people coming from a more mainstream file system; however, I would prefer to know I have the right information first.

Media Stack

As I’m sure many new homelabbers do, I want to set up a media library using the *arr stack of applications. Morality aside, I would like to have a definitive answer for what is the best setup for datasets/zvols/etc. for something like this.

  1. What is the 'best' dataset layout for the actual storage of movies, TV shows, music… And what advanced settings (record size, dedup, etc.) inside of ZFS will result in the best drive longevity and performance?

  2. What is the best dataset layout for the processing and acquisition of media files (specifically, how to optimally set up a dataset for downloading both NZB and torrent files)? This is a loaded question, and I would really appreciate someone explaining whether my half-baked theories hold any water (or kindly explaining why they don't):

  • 2a. You should use separate datasets for torrent and NZB downloads because of how differently they are handled, and your settings for each dataset should therefore differ.

  • 2b. Perhaps a zpool with SSDs would make the most sense for torrents (not because of cache functionality; I realize ZFS already does with RAM what something like Unraid does with an SSD cache), but because SSDs typically have more read/write endurance and may not suffer from the fragmentation issues presented by torrent downloads.

  • 2c. Can someone please settle the debate on whether pre-allocation within my torrent client is good or bad? I have seen conflicting information, with some saying it's useless, others saying it's actively harmful, and still others saying it solves the problem entirely.

  • 2d. Hardlinks/atomic moves: When using Unraid, I had a media setup that used hardlinks as detailed in trash-guides. I am pretty sure the ZFS equivalent is deduplication, which I've been warned not to enable on a system with relatively limited RAM.

While I would love to do everything within TrueNAS, I keep thinking that the redundancy inside Unraid is sufficient for non-crucial data like a media library, and it offers the benefit of being expandable one drive at a time. I would then virtualize TrueNAS inside Unraid and use that for more sensitive data such as backups, family photos, etc. However, I am hoping someone in the community can explain how ZFS is plenty capable of doing what I want in an efficient and non-hardware-degrading way 🙂

Finally, regarding drive selection: I have 12x 6TB HDDs and 6x 18TB HDDs. With the use cases being either a media library (non-crucial data) or backup data (crucial data), is there any reason why more, smaller drives would be better suited for one, or fewer, larger drives better suited for the other?

I apologize for the scattered organization of this post; I have lots of questions and am eager to gain a better understanding of ZFS / TrueNAS as a whole.


For more performance, use mirror pools with lots of vdevs. If using RAIDZ for something, RAIDZ2 is the minimum level to use, and it is good for plain mass storage. Each RAIDZ vdev has roughly the IOPS of a single drive but can stream writes a bit faster. Mirrored vdevs gain performance in all areas at the expense of capacity: each mirror vdev yields the usable space of a single drive.
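
For illustration, a minimal sketch of the two layouts (pool and device names are hypothetical; substitute your own disks):

```
# Pool of mirrors: best IOPS in all areas, usable space of one drive per vdev
zpool create -o ashift=12 fast \
  mirror sda sdb \
  mirror sdc sdd \
  mirror sde sdf

# Single RAIDZ2 vdev: fine for mass storage, IOPS of roughly one drive
zpool create -o ashift=12 bulk \
  raidz2 sdg sdh sdi sdj sdk sdl
```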

You could use a few types of drives and create different pools for different tasks.

I have always preallocated torrents, especially on ZFS. My belief is that it helps minimize initial fragmentation. This is all debated, though.

There's no need for Unraid; put it out to pasture.

What is the reason for this?

Do you recommend having a torrent dataset with recordsize = 16K?

Large drives in RAIDZ take a long time to resilver, increasing the odds of another drive failing in the meantime. Also, once a RAIDZ1 vdev is degraded there is no redundancy left to verify and repair data against, which can allow bitrot to slip through. RAIDZ2 and up solve this.

I don't even build out a separate pool just for the downloads folder; it just lives on the pool with all the media. The performance of drives and servers nowadays doesn't necessitate that much extra design anymore.


You hint at a very diverse set of storage requirements that benefit from tuning and proper storage selection.

You will find a lot of passionate ZFS fans because ZFS allows very detailed tuning for different workloads, often even within a single storage pool.

Let me start by translating your use cases into proper technical requirements for review and discussion. Then I'll propose solutions, again for discussion.

Use case 1: media storage on home server
Characteristics:

  • Low access concurrency, if any
  • WORM (write once, read many) access pattern
  • Mostly, even exclusively large files (GB+), data access pattern mostly sequential (copying, reading)
  • Media files are typically not effectively compressible (media can include video, audio, image formats)
  • No indication that media files will share identical blocks, so no candidate for deduplication
  • Relatively large storage pool (tens of TB)
  • Archival in nature - redundancy critical to prevent data loss

Use case 2: torrent storage
Let’s assume this is used for legal and ethical purposes, meaning downloading and seeding of torrents. I assume that this use case involves mostly compressed data files (media or compressed containers).
Characteristics:

  • High access concurrency
  • WORM (write once, read many) access pattern
  • Mostly, even exclusively large files (GB+); data access pattern highly random for both writing and reading
  • Media files are typically not effectively compressible
  • No indication that media files will share identical blocks, so no candidate for deduplication
  • Storage longevity: relatively temporary. Download, seed for a limited amount of time, then replace with new torrent. Requirements for data redundancy are probably lower than in first use case.
  • Storage pool size probably < 1TB, maybe single digit TBs

Implementation proposals

Use case #1

  • Use a relatively cheap storage medium. Your existing HDDs are great.
  • Performance requirements are relatively low due to lack of expected concurrency; watching, listening to media does not require lots of bandwidth.
  • To minimize cost I’d recommend a RAIDZ configuration (as opposed to pool of mirrors). The exact configuration depends on your risk level. I’d make pools of identical drives (18TB into a separate pool from 6TB drives).
  • The general recommendation is not to have too many drives per RAIDZ vdev (I think 6-12). With your existing set of drives, configuring three RAIDZ vdevs of 6 drives each makes sense to me.
  • Personally, I did not have any issues with RAIDZ1 configurations for media storage (again to minimize cost). There are significantly differing opinions out there, including in this thread. Consider higher RAIDZ levels depending on your risk profile.
  • I understand DRAID to be mostly beneficial for pools with significantly larger drive numbers in enterprise situations. I don’t have hands-on experience with DRAID due to that.
  • HDDs excel at sequential access patterns. To keep them accessing mostly large contiguous sections of data, consider adding a "special device" for metadata and the small number of small blocks. SSDs, especially Optane SSDs, are great for that and currently relatively cheap. Search this forum for discussions of ZFS special devices.
  • Configure storage pool without deduplication and compression
  • Follow general best practices when creating the pool (4K alignment via ashift=12, atime=off, etc.). A sketch of this layout follows below.
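
For illustration, a minimal sketch of this layout using the twelve 6TB drives (device names and the 64K small-block cutoff are hypothetical; adjust both to your hardware):

```
# Two 6-wide RAIDZ vdevs (raidz2 instead per your risk profile)
# plus a mirrored special vdev for metadata
zpool create -o ashift=12 tank \
  raidz1 sda sdb sdc sdd sde sdf \
  raidz1 sdg sdh sdi sdj sdk sdl \
  special mirror nvme0n1 nvme1n1

# Send metadata and blocks up to 64K to the special vdev
zfs set special_small_blocks=64K tank

# Media dataset: large records for sequential access, no atime updates
zfs create -o recordsize=1M -o atime=off tank/media
```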

Use case 2:

  • Relatively high performance requirements with random access patterns and small I/O sizes indicate SSDs as the preferred storage medium.
  • Relatively small amount of required storage makes SSDs affordable.
  • Number of SSDs depends on exact performance and storage pool size requirements. 1 SSD may be sufficient.
  • I'd consider this pool to be completely temporary and therefore expendable. I'd configure multiple SSDs in a striped (RAID0-style) configuration to maximize performance. Consider mirrored or RAIDZ configurations instead, based on your risk profile.
  • Assuming lots of writes watch for wear on SSDs. Consider enterprise class NAND or Optane drives. Gen 1 Optane drives are on sale at the moment.
  • Don't forget to turn on autotrim when setting up the storage pool; a sketch follows below.
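
A minimal sketch of such a scratch pool (pool and device names are hypothetical; recordsize is debated further down the thread, so it is left at the default here):

```
# Expendable striped pool of two SSDs, TRIM enabled at creation
zpool create -o ashift=12 -o autotrim=on scratch nvme2n1 nvme3n1

# Torrent download dataset: no atime updates
zfs create -o atime=off scratch/torrents
```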

That's perfect info for the storage side. Sadly though, storage performance is only a quarter of the picture: 10-year-old storage hardware can max out a 1Gb network connection, and a Fire Stick connected to an AP at two bars will never stream 4K content. Heck, even the 4K Fire Stick on WiFi 5 struggles with a lot of 4K content, at no fault of the NAS.

Do NOT over-engineer a NAS and then be sad about performance when the PC connecting to it is crap.


This.
The proposed storage setup should come close to saturating a 10G network (especially when using NVMe SSDs). The cost is high enough that investment in sufficient/matching network infrastructure should be considered.


No. The contents of a torrent are split into pieces that are read and written as units, and those pieces generally end up being a considerable size: Torrent Piece Size - VuzeWiki

| File size | Piece size | Number of pieces |
| --- | --- | --- |
| 350 MB | 64 kB | 5600 |
| 350 MB | 256 kB | 1400 |
| 350 MB | 512 kB | 700 |
| 700 MB | 64 kB | 11200 |
| 1400 MB | 1 MB | 1400 |

If you are worried about fragmentation, just move the completed torrent to a "completed" dataset. However, just like on other COW filesystems, there is absolutely no guarantee that a file will be written out sequentially; it's only "best effort". Once downloaded, it's basically all reads from then on (assuming atime is off), which will largely be served from ARC if the files are requested often enough to matter. Serious torrent seeding is actually a decent use case for having an L2ARC.
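
If you go that route, attaching an L2ARC is a one-liner (pool and device names are hypothetical):

```
# Add an SSD as an L2ARC (cache vdev) to an existing pool
zpool add tank cache nvme0n1
```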

For torrents on ZFS and other COW filesystems, preallocation should be turned off. At best, zeros are written, which ZFS then compresses away. Even if you turned off compression (don't do this) and actually wrote out the zeros, it still wouldn't help: every piece of torrent data you receive is written out as a new block and never touches the preallocated space at all. The associated block of zeros is then either deallocated, or kept if a snapshot was made.
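
A quick way to see why preallocation buys nothing here (hypothetical paths; assumes compression is on, as it is by default on TrueNAS):

```
# "Preallocate" 1 GiB of zeros, then check the space actually consumed
dd if=/dev/zero of=/mnt/tank/torrents/prealloc.bin bs=1M count=1024
du -h /mnt/tank/torrents/prealloc.bin   # reports ~0: all-zero blocks are stored as holes
```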


It actually matters significantly HOW the client requests preallocation, but you would have to sift through source code, as it is usually not a feature that is detailed well in documentation. The end result is "just don't".


Just keep specialized datasets around. Adapting compression, recordsize, and snapshot behavior per dataset will solve those problems. The limit is 2^48 datasets; use them.

Nothing is worse than frequently changing data fragmenting your pool on top of frequent snapshots.

I have a dataset with various zvols for VM disks that kept fragmenting quite considerably. I also found a couple hundred snapshots of those disks. Now this was a mess. I tweaked some settings, and with cloning and "sane" snapshotting I got it under control. Having a dedicated folder with corresponding dataset properties makes this much easier in your case.

And as @log said, don’t turn off compression.
LZ4 is free real estate (and ZSTD if your CPU can handle it, I use it for most stuff).
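
A minimal sketch of what per-workload datasets might look like (names and values are hypothetical; tune them to your own workloads):

```
# Media: large sequential files, cheap always-on compression
zfs create -o recordsize=1M -o compression=lz4 tank/media

# VM disks: smaller records, stronger compression if the CPU allows
zfs create -o recordsize=16K -o compression=zstd tank/vms

# Scratch/downloads: exclude this one from your snapshot schedule
zfs create -o compression=lz4 tank/scratch
```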

Wow, came back a few days later and y'all answered the hell out of my question lol.

This. This is what I WISH was easier to find. Thank you very much for sharing your wisdom, and for doing so in a digestible fashion. A collection of these sorts of "translations" from high-level use cases to the technical terminology and requirements that come with them would be invaluable.

Do you have a preferred method for triggering a move within the TrueNAS UI, or should it be configured in the torrenting application? Also, does it matter what kind of pool this would be on? I have two 500GB Optane drives coming in a few days, and my intuition is that they might make a good "ingestion" dataset for things like downloads because of their durability.

You'd create a dataset for completed torrents and then use the torrent program's built-in "move on completion" setting. Because the file is being moved across datasets, it gets completely rewritten, rather than just relinking as happens when things are moved within a dataset or any other single filesystem. I am not familiar with TrueNAS or its UI.
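
To illustrate the distinction (dataset names are hypothetical, using TrueNAS's /mnt mount point):

```
# Separate dataset for completed torrents
zfs create tank/complete

# Moving across datasets copies the data and deletes the original...
mv /mnt/tank/downloads/foo.mkv /mnt/tank/complete/

# ...while a move within one dataset is just a rename
mv /mnt/tank/complete/foo.mkv /mnt/tank/complete/bar.mkv
```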

For torrent ingestion pools, I'd just use any old drives I have lying around and put them in a mirror. HDDs or SATA SSDs are perfectly fine; torrents aren't very write-heavy, as the data is basically written once and done (there's write amplification involved, but it's much the same thing). Honestly, HDDs are a good fit for torrents.

Optane is expensive for its size and is wasted on torrents. For me, drives of that size would be for virtual machines.