Thoughts on new Jellyfin NAS

I am looking to drastically shrink the physical size of the NAS server while improving performance and transcoding.

My current build is as such:

1x Phanteks Enthoo Pro 2
1x LSI HBA 2x 8087 to SATA
8x WD RED 4TB drives
2x Samsung 870 Pro 500GB/512GB SATA SSDs (L2Arc/ZIL)
1x Crucial 500GB SSD (OS)
1x ADATA 512GB SSD (Docker containers)
1x WD Black NVMe SSD (transcode cache)
1x AMD Ryzen 5600X
2x 16GB DDR4-3200 ECC UDIMM
1x ASRock Rack X470 mATX motherboard
1x Seasonic Gold 650W Fully Modular PSU
A bunch of Arctic P12 fans
A Noctua air cooler

This thing is pretty big and I am moving to a much smaller apartment (cost of living in my city just went up). I have a second rig (my desktop) that is also using the same case (Phanteks Enthoo Pro 2), so I would prefer to keep the server case small. I would have preferred hot-swap bays in the front, and I did see a couple of iStarUSA cases with 7x 5.25" bays, but they were ITX/mITX only and had only one 120mm fan grill.

This is the build I am thinking of:


Along with a pair of Intel 118GB Optane drives to replace the 870 Pro SSDs for L2ARC and ZIL (much better IOPS):

• High, consistent IOPS. Intel® Optane™ SSDs deliver high, consistent IOPS for the ZFS intent log (ZIL), without losing speed over time as traditional NAND does.
• Low latency. Intel Optane SSDs maintain low, consistent latency for the ZIL, while traditional NAND devices increase latency over time.
• High quality of service (QoS). Low latency means more write operations can be handled in a lower and narrower latency time period, helping improve overall QoS.
• Cost-effective performance. Used as L2ARC, Intel Optane SSDs combined with RAM deliver up to 75 percent of the performance of all-RAM configurations for less than half the price per gigabyte.
• High endurance. The Intel® Optane™ SSD DC P4800X delivers up to 60 drive writes per day (DWPD), approximately 20x more than typical NAND SSDs used for caching.

What are y’all’s thoughts on this build? The new motherboard has 3x M.2 NVMe slots, two of which would be occupied by the Optane drives, leaving the third for the OS. I have some spare NVMe drives I can use for the OS.
That would leave two SATA SSDs for the containers and transcoding.
Sure, I would lose ECC (nice for ZFS), but as it is, ECC only works “sometimes”. It takes a bit of tinkering to make it work, after all.

The UHD 770 in the 13th gen i5 should be able to handle most transcoding jobs just fine. I do have a Quadro T400 but I kept getting codec errors when using it for transcoding so I stopped.


I see a side grade costing $1000+, not decreasing the size by much. I also can’t see a performance increase in practical terms; I do see a decrease in things like ECC or L2ARC, however. From what I can see, it’s a wash in size, power, and performance. But you get that plastic pen for free, so that’s a plus, I guess.

I’d buy a DDR4 board instead and use your existing memory. And use SATA for the OS and NVMe for actual data. Do you need a SLOG? Are sync writes a thing on Jellyfin? Do you need a mirror, or could you free up an NVMe slot and save money?
As you didn’t mention network link speeds, I assume 1G. This is the main bottleneck, not drive, CPU, memory, or ZFS performance.


I wouldn’t call this a side grade:

How exactly is it a wash going from a Full Tower case to a Mid Tower box?

Here are the two case sizes in comparison:

And why would it cost $1000+? Jellyfin isn’t the only thing running on the server. So yes, I need an L2ARC and ZIL.
I use a flashed Dell RAID card for the HBA.
I ran zpool iostat and it wasn’t great.
I also get buffering and chugging on LAN streaming, all over gigabit. A CPU with a higher Passmark score will help not only software transcoding via x264 but also iGPU transcoding with QSV. The 5600X does not have an iGPU.
The i5-13500 scores a good 10k points higher than the 5600X while staying at a 65W TDP. What about that says “side grade”?

You also want to use something like Optane for L2ARC and ZIL instead of a standard SATA SSD or even a regular NVMe SSD, which is what I listed: 2x 118GB M.2 Optane drives. Optane holds up better under continuous reads/writes and has much longer endurance.
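The rough idea would be to carve each Optane into a small SLOG partition and a larger L2ARC partition, something like this (the device/partition names below are placeholders, not the real ones):

zpool add tank log mirror /dev/disk/by-id/nvme-optane-a-part1 /dev/disk/by-id/nvme-optane-b-part1   # mirrored SLOG
zpool add tank cache /dev/disk/by-id/nvme-optane-a-part2 /dev/disk/by-id/nvme-optane-b-part2        # striped L2ARC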

Please tell me more about how the size isn’t being decreased by much or the lack of a practical performance increase.

What I see is higher performance at the same wattage use. I may even upgrade the HDD size in the future. Now THAT would be north of $1000 in cost.

I could put my ECC RAM in a non-ECC mobo, but the ECC would not function. So… yeah.

You describe your use case as NAS+Jellyfin+transcoding. NAS and Jellyfin (serving media to home users) use cases require lots of IO capacity and little CPU horsepower.

I’m not quite sure how the transcoding is going to be set up. I personally store original data in Jellyfin and let Jellyfin transcode it (down) in case the client doesn’t support the codec. Jellyfin can use hardware codecs, the efficient ones for such a build being the built-in one from Intel (6th gen+ QSV) and AMD iGPUs (AM4 “G” models and all AM5 models).
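If Jellyfin ends up running in a container (as it often does), hardware transcoding usually just means handing the iGPU render node to it; a minimal sketch, with the usual default device path and placeholder mounts:

# pass the iGPU render node through for QSV/VA-API; the container user may also need the host's render group
docker run -d --name jellyfin \
  --device /dev/dri/renderD128:/dev/dri/renderD128 \
  -v /tank/media:/media:ro \
  -p 8096:8096 \
  jellyfin/jellyfin

Then hardware acceleration gets enabled in Jellyfin’s playback settings.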

This seems to be where you are going.

There are other motives for and ways of setting up transcoding. However, all the CPU power you plan on adding would not be used, one could say wasted, assuming iGPU based transcoding would be sufficient. However, there is plenty of compute capacity to go around for other, not mentioned, use cases.

The amount of proposed hw (8HDDs + multiple SSDs) should saturate a 10gbit network (it does in my environment), so the lack of faster networking looks like a miss in my book. It may not matter in your environment.

Currently the price/performance sweet spot for HDDs is at 8+TB drives. I assume you’re using existing drives. Otherwise, a small amount of larger capacity drives would allow for capacity expansion in the future.

Your goal was to move your existing NAS into a smaller form factor. You certainly achieve that goal, but I would consider ~$800 too big a cost given that you reuse hardware (the HDDs) extensively.

I feel that you could achieve more (meaning an even smaller form factor) with the proposed capital expenditure. Take some inspiration from the following video:

I’d take a look at the case form factor + ITX mobo (not necessarily these products, but they are well-researched choices) as an option for reducing size even further.

Again, there is little information about the use of the NAS other than storage of Jellyfin media. I’d try to get more use out of the valuable M.2 slots. I’d consider using these as ZFS special devices, ideally with higher-capacity M.2 drives. Observing my Jellyfin install, I see very few sync writes (no need for a ZIL) and I don’t see a lot of use of ARC caching due to the large media file sizes. OTOH I keep my HDDs operating in their sweet spot by diverting all reads/writes of small recordsizes (< 128K, not just metadata) onto special devices (concurrent access on HDDs is not really a thing for me).
Again, I assume the Optane 1600Xs are existing devices. They are probably capacity-limited as storage for small-recordsize data. In this case I’d consider them for use as special devices for metadata only.
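For reference, wiring that up is just a couple of commands, roughly like this (device names and the 64K threshold are placeholders; special vdevs should be mirrored, since losing them loses the pool):

zpool add tank special mirror /dev/disk/by-id/nvme-special-a /dev/disk/by-id/nvme-special-b
zfs set special_small_blocks=64K tank/media   # records <= 64K go to the special vdev; leave at 0 to keep it metadata-only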

IMHO M.2 slots are so valuable for a NAS build that I only buy mobos with bifurcation support. This allows splitting one or more PCIe slots into multiple x4 M.2 slots (with the help of relatively cheap add-in cards). Intel mobos typically don’t support this (I have not checked your proposal), but that may be worthwhile to double-check.

It feels like you have other use cases in the back of your head that you didn’t mention in your posts. But based on what I know, the proposed upgrade/build will work, but I’d give it a “meh” rating for the price. It feels like there are too many opportunities missed.


Thanks for the insight. I do run about 30 Docker containers on it, mostly having to do with media aggregation and organization: a bunch of *arr apps, but also some specialized Docker networking tools, since I am stuck behind CGNAT and accessing my media from outside the network is a pain without them. I also have a network TV tuner (SiliconDust) so I can record TV shows, strip commercials, and re-encode them.

@wendell had helped me a couple years ago (before 12th and 13th gen) pick out the 5600X and the ASRock Rack X470D4U board I currently use. He even helped me figure out how to get ECC working (mostly…very finicky) on my board.
The problem I am running into at this point (other than the size of the computer) is that, with the software encoder, transcoding has trouble keeping up, and even with direct play the array seems to have trouble keeping up as well. I currently have 8x 4TB WD Reds. I am considering 8x 8TB Seagate IronWolf drives in the future since they are currently at ~$128/drive.

I also do not have the Optane drives yet. My current pool looks like the following:

with the zpool iostat results as follows:


I also have a separate SSD for the OS, one for containers, and one for transcoding cache (the containers and the transcode cache being on NVMe).

The current plan was to put the containers and transcode cache on SATA SSDs, the OS (Ubuntu Server) on one of the NVMe slots, and the ZIL/L2ARC on the NVMe Optanes. So all I would be replacing is the CPU/mobo/RAM/case.

I have looked for an Intel 600/700 series motherboard that supports ECC RAM and have found a couple, but they run it in non-ECC mode… which defeats the purpose.

My theory is that you’re currently using relatively inefficient CPU cycles for transcoding of media. Your goal to move to GPU-based transcoding is correct.

I’d look at the new Intel Arc GPUs as an add-on card specifically for their media encoding capabilities. I understand they can encode/decode multiple streams in parallel and even support the AV1 codec that is becoming more prevalent. Not sure what the current software support for this is in Linux (or specifically in your current software setup), but Intel seems to have done a great job here, generally. One of these cards would be way cheaper than a new mobo+CPU+RAM. You could use the savings to improve your current storage setup.
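A quick way to sanity-check hardware encoding outside of Jellyfin/Tdarr, assuming an ffmpeg build with QSV support (paths here are placeholders):

ffmpeg -hwaccel qsv -i /media/sample.mkv -t 60 -c:v hevc_qsv /tmp/qsv_test.mkv   # encode 60 seconds to HEVC on the QSV engine

If that runs well above real time without pegging the CPU, the hardware path is doing its job.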

The X470D4U only has 6 SATA ports, so I suppose you have an add-on card to provide additional SATA ports. You also mention a Quadro T400, which sounds like you have a PCIe slot in your mobo you can use for an Arc GPU (as a replacement for the Quadro).

Your use case now clearly describes a requirement for multiple concurrent read/write streams on the pool. It would be interesting to hear how many you observe currently and if you would plan to increase in the future. Also, it would be great to get stats about the ZIL and L2ARC usage (you saw my scepticism in my earlier post).

Your existing pool setup isn’t all that great for concurrent access. The performance of a pool is determined by the performance of its main vdevs. Log/cache/special devices help overcome some deficiencies, but ultimately the vdevs determine pool performance. Multiple vdevs allow better support of concurrent operations. Your pool currently has ~28TB capacity (8x 4TB HDDs in RAIDZ1, i.e. 7 drives’ worth of usable space).

I can imagine that a mirrored pool of 8x 8TB drives would provide slightly higher capacity (~32TB) with better support for concurrent operations.
If your case+mobo supports it, I can imagine adding a few 8TB drives, e.g. for 3 RAIDZ1 vdevs of 4 drives each. Same idea as above, but much higher capacity.
I can also imagine that adding special devices would take a lot of stress off the HDDs resulting in potentially sufficient performance at current vdev configuration.
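Rough sketches of the first two layouts (the device names d1…d12 are placeholders):

zpool create tank mirror d1 d2 mirror d3 d4 mirror d5 d6 mirror d7 d8            # 4 mirrored pairs of 8TB, ~32TB usable
zpool create tank raidz1 d1 d2 d3 d4 raidz1 d5 d6 d7 d8 raidz1 d9 d10 d11 d12    # 3x 4-wide RAIDZ1 of 8TB, ~72TB usable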

Maybe @Exard3k has input once we get some stats from the current zpool?

Can you provide the commands to pull the info you want? I tried arcstat but it just showed used and available ARC.

In a separate post I want to share that for many years I used to have a similar setup (record TV shows, strip commercials, and re-encode them) based around MythTV.

I got out of that a few years ago because it became increasingly harder (technically) and more expensive to access the content I care about. At some point I did not see why I spent the $$$ for commercials to cut. Now I invest in physical media that I own and rip for convenience into Jellyfin at optimal quality (much better than TV or streaming sources) and pay for a few streaming services that provide content not available on physical media (typically sports).

I have the feeling that I am not alone in making such a transition.

If you’d consider such a transition it may solve your current technical issues, too :slight_smile:

First a list of snapshots that provide insight into usage over time:

zpool iostat -v
zpool iostat -r
zpool iostat -w
zpool iostat -q
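# -v: per-vdev breakdown, -r: request-size histogram,
# -w: latency histogram, -q: queue depth / active I/O counts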

Then, an observation of what you would say represents peak usage of the pool. You can capture this with the same commands over specified time intervals, e.g. 10-second intervals:

zpool iostat -yv 10
zpool iostat -yr 10
zpool iostat -yw 10
zpool iostat -yq 10

Make sure to run these when you feel your system is constrained.

Another way to look at storage performance is the system view using the sysstat tools. The following records observations in 10s intervals for all active drives. It contains insights into wait times and %utilization of drives

iostat -zyxm 10
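# -z: skip idle devices, -y: drop the since-boot summary, -x: extended stats, -m: throughput in MB/s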

Same here: Make sure to run this when you feel your system is constrained. Bonus points if you manage to capture data for the same timeframe :slight_smile:

It’s inconvenient in that the data is observed in real-time. There are probably some commands that capture this info from historical data, but I am not aware of how.





At current idle usage (running a scrub)

Thanks.
Scrub runs are metadata-intensive (both reads and writes). You can see intense 4K req_size action. A special device vdev allows redirecting this traffic to more suitable devices than HDDs.
Not that I would advocate adding special devices just for improved scrubbing performance, but the presented data makes the benefit easy to point out.

Looking forward to more data.

Might be a minute. Currently upgrading from 22.10 to 23.04.


This is while running a Tdarr scan and 4 simultaneous CPU transcodes.


Here are logs of the above commands. I sent the output to text files since I assume they are changing over time.

iostat-yq.txt (30.0 KB)
iostat-yr.txt (421.4 KB)
iostat-yv.txt (406.1 KB)
iostat-yw.txt (74.3 KB)

Also uploaded to my pastebin for your viewing pleasure:

And finally, just for fun, a zpool iostat -yvw. I only pasted a small portion of the log file since it was too large for pastebin (max 10MB). zpool iostat -yvw breaks it out per disk.

iostat-yvw.txt (14.8 MB)

Perfect. Quite insightful.

Let’s first examine the function of the ZIL and L2ARC devices.

Representative data sample
                                                     capacity     operations     bandwidth 
pool                                               alloc   free   read  write   read  write
-------------------------------------------------  -----  -----  -----  -----  -----  -----
tank                                               14.6T  14.5T   1000     64   178M   376K
  raidz1-0                                         14.6T  14.5T   1000     64   178M   376K
    wwn-0x50014ee210dcc24e                             -      -    126      8  22.3M  47.6K
    wwn-0x50014ee210d12b30                             -      -    122      8  22.1M  47.6K
    wwn-0x50014ee210ac1176                             -      -    124      8  22.4M  47.2K
    wwn-0x50014ee26632122b                             -      -    124      8  22.1M  47.2K
    wwn-0x50014ee2bb7cc324                             -      -    124      7  22.3M  46.0K
    wwn-0x50014ee266268069                             -      -    125      7  22.1M  46.0K
    wwn-0x50014ee2bb4fbc6e                             -      -    125      7  22.4M  47.2K
    wwn-0x50014ee2bb7c305c                             -      -    126      7  22.1M  47.2K
logs                                                   -      -      -      -      -      -
  mirror-1                                          404K   476G      0      0      0      0
    ata-Samsung_SSD_860_PRO_512GB_S5GBNS0R300634E      -      -      0      0      0      0
    ata-Samsung_SSD_860_PRO_512GB_S5GBNS0R300674W      -      -      0      0      0      0
cache                                                  -      -      -      -      -      -
  scsi-SATA_WDC_WDBNCE5000P_21106K802845            466G  8.94M      0     44    409  38.6M

You basically see no activity on the ZIL devices; there are only very few sync operations at all. I assume these to be metadata operations, because cross-referencing the other logs I only see sync activity on the RAIDZ1 vdev.

There is no read activity on the L2ARC device, only the continuing refresh of data in the form of writes.

Conclusion: you can remove these devices from the pool without noticing any impact for this observed workload.
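If you do decide to drop them, log and cache vdevs can be removed live without touching the data vdevs; roughly (device names taken from the iostat output above):

zpool remove tank mirror-1                                      # detach the SLOG mirror
zpool remove tank scsi-SATA_WDC_WDBNCE5000P_21106K802845        # detach the L2ARC device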

Next, let’s examine the stress level of the pool. Can we observe extended wait times?

Representative data sample
tank         total_wait     disk_wait    syncq_wait    asyncq_wait
latency      read  write   read  write   read  write   read  write  scrub   trim
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
1ns             0      0      0      0      0      0      0      0      0      0
3ns             0      0      0      0      0      0      0      0      0      0
7ns             0      0      0      0      0      0      0      0      0      0
15ns            0      0      0      0      0      0      0      0      0      0
31ns            0      0      0      0      0      0      0      0      0      0
63ns            0      0      0      0      0      0      0      0      0      0
127ns           0      0      0      0      0      0      0      0      0      0
255ns           0      0      0      0      2      0    388      0      0      0
511ns           0      0      0      0      8      4    524      4      0      0
1us             0      0      0      0     12      1    250      4      0      0
2us             0      0      0      0      5      0     59      3      0      0
4us             0      0      0      0      0      0      9      0      0      0
8us             0      0      0      0      0      0      0      0      0      0
16us            0      0      0      0      0      0      0      0      0      0
32us            0      0      0      0      0      0      0      1      0      0
65us            0      0      0      0      0      0      1      4      0      0
131us           0      7      0     43      0      0      2      7      0      0
262us           0     20      0      9      0      0      4      8      0      0
524us         933     11    933      0      0      0      8      6      0      0
1ms            33      0     39      0      0      0     18      0      0      0
2ms            23      0     38      1      0      0     25      0      0      0
4ms            53      0     68      2      0      0     34      0      0      0
8ms           106      5    123      5      0      0     48      2      0      0
16ms          158      6    169      5      0      0     47      4      0      0
33ms          121      9     98      3      0      0     34      7      0      0
67ms           52      7     21      0      0      0     13      5      0      0
134ms          19      0     10      0      0      0      4      0      0      0
268ms           3      0      1      0      0      0      0      0      0      0
536ms           0      0      0      0      0      0      0      0      0      0
1s              0      0      0      0      0      0      0      0      0      0
2s              0      0      0      0      0      0      0      0      0      0
4s              0      0      0      0      0      0      0      0      0      0
8s              0      0      0      0      0      0      0      0      0      0
17s             0      0      0      0      0      0      0      0      0      0
34s             0      0      0      0      0      0      0      0      0      0
68s             0      0      0      0      0      0      0      0      0      0
137s            0      0      0      0      0      0      0      0      0      0
--------------------------------------------------------------------------------

You can observe that most disk reads complete in the 524us bucket, meaning they are served from the internal disk cache; then there is another bump of reads around 8ms-16ms, which matches the spec’d access latency for these disks. The disks are operating at their best.

Representative data sample

capacity operations bandwidth syncq_read syncq_write asyncq_read asyncq_write scrubq_read trimq_write
pool alloc free read write read write pend activ pend activ pend activ pend activ pend activ pend activ


tank 14.6T 14.5T 1.04K 61 194M 375K 0 0 0 0 0 8 0 0 0 0 0 0
tank 14.6T 14.5T 1.07K 62 198M 359K 0 0 0 0 11 8 0 0 0 0 0 0
tank 14.6T 14.5T 1.03K 65 191M 386K 0 0 0 0 0 0 0 0 0 0 0 0
tank 14.6T 14.5T 993 60 194M 369K 0 0 0 0 153 24 0 0 0 0 0 0
tank 14.6T 14.5T 1.03K 63 193M 391K 0 0 0 0 6 9 0 0 0 0 0 0
tank 14.6T 14.5T 724 62 128M 352K 0 0 0 0 0 4 0 0 0 0 0 0
tank 14.6T 14.5T 1.52K 64 320M 379K 0 0 0 0 0 0 0 0 0 0 0 0
tank 14.6T 14.5T 1009 66 187M 381K 0 0 0 0 0 0 0 0 0 0 0 0
tank 14.6T 14.5T 1.04K 61 197M 362K 0 0 0 0 0 0 0 0 0 0 0 0
tank 14.6T 14.5T 1.03K 64 198M 378K 0 0 0 0 30 11 0 0 0 0 0 0
tank 14.6T 14.5T 1.02K 61 190M 363K 0 0 0 0 33 24 0 0 0 0 0 0
tank 14.6T 14.5T 1.18K 66 296M 407K 0 1 0 0 51 21 0 0 0 0 0 0
tank 14.6T 14.5T 1.47K 65 583M 420K 0 1 0 0 373 24 0 0 0 0 0 0
tank 14.6T 14.5T 1.47K 63 489M 424K 0 0 0 0 19 21 0 0 0 0 0 0
tank 14.6T 14.5T 1.34K 65 427M 410K 0 0 0 0 154 23 0 0 0 0 0 0
tank 14.6T 14.5T 1.53K 64 586M 413K 0 1 0 0 251 24 0 0 0 0 0 0
tank 14.6T 14.5T 1.38K 65 528M 430K 0 1 0 0 8 17 2 6 0 0 0 0
tank 14.6T 14.5T 1.23K 67 349M 408K 0 0 0 0 45 22 13 4 0 0 0 0
tank 14.6T 14.5T 1.50K 38 604M 220K 0 0 0 0 7 11 0 0 0 0 0 0
tank 14.6T 14.5T 2.47K 65 659M 424K 0 2 0 0 11 12 0 0 0 0 0 0
tank 14.6T 14.5T 1.58K 61 595M 437K 0 6 0 0 257 24 0 0 0 0 0 0
tank 14.6T 14.5T 1.83K 64 596M 402K 0 0 0 0 73 20 0 0 0 0 0 0
tank 14.6T 14.5T 1.78K 66 624M 433K 0 1 0 0 34 18 0 0 0 0 0 0
tank 14.6T 14.5T 1.89K 64 586M 407K 0 0 0 0 0 9 0 0 0 0 0 0
tank 14.6T 14.5T 1.67K 68 381M 442K 0 0 0 0 0 1 0 0 0 0 0 0
tank 14.6T 14.5T 1.32K 64 204M 373K 0 0 0 0 0 0 0 0 0 0 0 0
tank 14.6T 14.5T 1.28K 60 203M 365K 0 0 0 0 0 0 0 0 0 0 0 0

For this workload, reads dominate (>1K per sample time vs. a few dozen writes). No sync writes, a few sync reads. There is activity in the async read queue, but it doesn’t stay at a high level and the pool seems to be able to work through the queue; this sample represents the worst that I can see in the data you sent.

Conclusion: I cannot see anything that looks like the pool not keeping up. It seems to operate at its peak capability. If you think the pool should be faster, you can

  • add spindles, or
  • add fast special devices.

Question: how would special devices help with this workload?

Summary

tank sync_read sync_write async_read async_write scrub trim
req_size ind agg ind agg ind agg ind agg ind agg ind agg


512 0 0 0 0 0 0 0 0 0 0 0 0
1K 0 0 0 0 0 0 0 0 0 0 0 0
2K 0 0 0 0 0 0 0 0 0 0 0 0
4K 0 0 6 0 0 0 36 0 0 0 0 0
8K 0 0 0 0 0 0 9 10 0 0 0 0
16K 0 0 0 0 0 0 0 1 0 0 0 0
32K 0 0 0 0 0 0 0 0 0 0 0 0
64K 0 0 0 0 9 0 0 0 0 0 0 0
128K 0 0 0 0 902 0 0 0 0 0 0 0
256K 0 0 0 0 0 47 0 0 0 0 0 0
512K 0 0 0 0 0 40 0 0 0 0 0 0
1M 0 0 0 0 0 0 0 0 0 0 0 0
2M 0 0 0 0 0 0 0 0 0 0 0 0
4M 0 0 0 0 0 0 0 0 0 0 0 0
8M 0 0 0 0 0 0 0 0 0 0 0 0
16M 0 0 0 0 0 0 0 0 0 0 0 0


I’m thinking that this workload is not transcoding, but actually commercial detection: high-bandwidth reads and few, small writes.
I assume that your dataset contains media that are nicely stored in large recordsizes (1M) and then some transcoding metadata that doesn’t take up a lot of space in smaller recordsizes. Accessing the small bits will slow down the HDDs but it’s unclear by how much. At the very least the special devices would offload all those metadata sync reads and sync writes to flash.

It would be interesting to assess how much space is consumed by small records (<128K). That would allow sizing a special device vdev that offloads all the small bits to flash.
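If you want a rough number, file sizes are a decent proxy for the record distribution on a media dataset; a quick sketch, assuming GNU find and a placeholder mountpoint:

# sum the space taken by files under 128K vs. the whole dataset
find /tank -type f -printf '%s\n' | awk '{ total += $1; if ($1 < 131072) { small += $1; n++ } } END { printf "%d files under 128K, %.2f GiB of %.2f GiB total\n", n, small/2^30, total/2^30 }'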

Interesting. That is all Tdarr transcoding. I have a transcoding cache folder on the OS SSD; that folder is currently taking up 78GB during transcodes. Dedup is off, compression is lz4. No comskip running. Strictly transcoding from x264 to x265.

Here is the list of pool properties:

tank  size                           29.1T                          -
tank  capacity                       50%                            -
tank  altroot                        -                              default
tank  health                         ONLINE                         -
tank  guid                           5479968034737282400            -
tank  version                        -                              default
tank  bootfs                         -                              default
tank  delegation                     on                             default
tank  autoreplace                    off                            default
tank  cachefile                      -                              default
tank  failmode                       wait                           default
tank  listsnapshots                  off                            default
tank  autoexpand                     on                             local
tank  dedupratio                     1.00x                          -
tank  free                           14.5T                          -
tank  allocated                      14.6T                          -
tank  readonly                       off                            -
tank  ashift                         12                             local
tank  comment                        -                              default
tank  expandsize                     -                              -
tank  freeing                        0                              -
tank  fragmentation                  0%                             -
tank  leaked                         0                              -
tank  multihost                      off                            default
tank  checkpoint                     -                              -
tank  load_guid                      541453703401732222             -
tank  autotrim                       off                            default
tank  compatibility                  off                            default
tank  feature@async_destroy          enabled                        local
tank  feature@empty_bpobj            active                         local
tank  feature@lz4_compress           active                         local
tank  feature@multi_vdev_crash_dump  enabled                        local
tank  feature@spacemap_histogram     active                         local
tank  feature@enabled_txg            active                         local
tank  feature@hole_birth             active                         local
tank  feature@extensible_dataset     active                         local
tank  feature@embedded_data          active                         local
tank  feature@bookmarks              enabled                        local
tank  feature@filesystem_limits      enabled                        local
tank  feature@large_blocks           active                         local
tank  feature@large_dnode            enabled                        local
tank  feature@sha512                 enabled                        local
tank  feature@skein                  enabled                        local
tank  feature@edonr                  enabled                        local
tank  feature@userobj_accounting     active                         local
tank  feature@encryption             enabled                        local
tank  feature@project_quota          active                         local
tank  feature@device_removal         enabled                        local
tank  feature@obsolete_counts        enabled                        local
tank  feature@zpool_checkpoint       enabled                        local
tank  feature@spacemap_v2            active                         local
tank  feature@allocation_classes     enabled                        local
tank  feature@resilver_defer         enabled                        local
tank  feature@bookmark_v2            enabled                        local
tank  feature@redaction_bookmarks    enabled                        local
tank  feature@redacted_datasets      enabled                        local
tank  feature@bookmark_written       enabled                        local
tank  feature@log_spacemap           active                         local
tank  feature@livelist               enabled                        local
tank  feature@device_rebuild         enabled                        local
tank  feature@zstd_compress          enabled                        local
tank  feature@draid                  enabled                        local

That explains why there are so few writes on the pool.

Here is the histogram of file sizes:

  2k:   8449
  4k:   7449
  8k:  10144
 16k:   7492
 32k:   2611
 64k:   1012
128k:    414
256k:    257
512k:     86
  1M:     34
  2M:     26
  4M:     32
  8M:      9
 16M:      9
 32M:      7
 64M:      5
256M:      1
512M:      1

And here is the output of zdb -Lbbbs

Traversing all blocks ...


	bp count:              13444840
	ganged count:                 0
	bp logical:      14046535272448      avg: 1044752
	bp physical:     14023231325696      avg: 1043019     compression:   1.00
	bp allocated:    16106227331072      avg: 1197948     compression:   0.87
	bp deduped:                   0    ref>1:      0   deduplication:   1.00
	Normal class:    16106226950144     used: 50.35%
	Embedded log class              0     used:  0.00%

	additional, non-pointer bps of type 0:       1775
	 number of (compressed) bytes:  number of bps
			 24:      2 *
			 25:      3 *
			 26:      0 
			 27:      0 
			 28:     55 ****
			 29:    113 ********
			 30:      0 
			 31:      0 
			 32:      4 *
			 33:      9 *
			 34:      0 
			 35:      0 
			 36:      1 *
			 37:      0 
			 38:      0 
			 39:      0 
			 40:     37 ***
			 41:     10 *
			 42:     19 **
			 43:      1 *
			 44:      0 
			 45:      4 *
			 46:     11 *
			 47:     25 **
			 48:     10 *
			 49:      4 *
			 50:     11 *
			 51:     11 *
			 52:      5 *
			 53:      5 *
			 54:      6 *
			 55:     22 **
			 56:     32 ***
			 57:     10 *
			 58:      7 *
			 59:      0 
			 60:     49 ****
			 61:     32 ***
			 62:     18 **
			 63:     21 **
			 64:     21 **
			 65:     25 **
			 66:     24 **
			 67:      6 *
			 68:      3 *
			 69:      3 *
			 70:      3 *
			 71:      6 *
			 72:     18 **
			 73:     15 **
			 74:     21 **
			 75:     14 *
			 76:      1 *
			 77:      2 *
			 78:      5 *
			 79:      3 *
			 80:      5 *
			 81:      2 *
			 82:      3 *
			 83:      4 *
			 84:      1 *
			 85:      1 *
			 86:      3 *
			 87:      0 
			 88:      2 *
			 89:      7 *
			 90:     24 **
			 91:    256 ******************
			 92:    573 ****************************************
			 93:     13 *
			 94:     45 ****
			 95:     77 ******
			 96:     23 **
			 97:      4 *
			 98:      0 
			 99:      2 *
			100:      1 *
			101:      5 *
			102:      1 *
			103:      1 *
			104:      1 *
			105:      1 *
			106:      0 
			107:      0 
			108:      0 
			109:      1 *
			110:      5 *
			111:      1 *
			112:      6 *
	Dittoed blocks on same vdev: 32352
	Dittoed blocks in same metaslab: 2

Blocks	LSIZE	PSIZE	ASIZE	  avg	 comp	%Total	Type
     -	    -	    -	    -	    -	    -	     -	unallocated
     2	  32K	   8K	  48K	  24K	 4.00	  0.00	object directory
     1	 128K	   4K	  24K	  24K	32.00	  0.00	    L1 object array
    31	15.5K	14.5K	 696K	22.5K	 1.07	  0.00	    L0 object array
    32	 144K	18.5K	 720K	22.5K	 7.76	  0.00	object array
     2	  32K	   8K	  48K	  24K	 4.00	  0.00	packed nvlist
     -	    -	    -	    -	    -	    -	     -	packed nvlist size
    18	2.25M	  72K	 432K	  24K	32.00	  0.00	bpobj
     -	    -	    -	    -	    -	    -	     -	bpobj header
     -	    -	    -	    -	    -	    -	     -	SPA space map header
   482	7.53M	1.88M	11.3M	  24K	 4.00	  0.00	    L1 SPA space map
 2.53K	 324M	 110M	 401M	 159K	 2.96	  0.00	    L0 SPA space map
 3.00K	 331M	 111M	 413M	 137K	 2.97	  0.00	SPA space map
     6	 404K	 404K	 404K	67.3K	 1.00	  0.00	ZIL intent log
    48	   6M	 192K	 768K	  16K	32.00	  0.00	    L5 DMU dnode
    48	   6M	 192K	 768K	  16K	32.00	  0.00	    L4 DMU dnode
    48	   6M	 192K	 768K	  16K	32.00	  0.00	    L3 DMU dnode
    49	6.12M	 196K	 792K	16.2K	32.00	  0.00	    L2 DMU dnode
    60	7.50M	 424K	1.46M	24.9K	18.11	  0.00	    L1 DMU dnode
 3.41K	54.6M	14.7M	64.5M	18.9K	 3.71	  0.00	    L0 DMU dnode
 3.66K	86.2M	15.9M	69.0M	18.9K	 5.43	  0.00	DMU dnode
    49	 196K	 196K	 792K	16.2K	 1.00	  0.00	DMU objset
     -	    -	    -	    -	    -	    -	     -	DSL directory
    33	17.5K	   3K	  96K	2.91K	 5.83	  0.00	DSL directory child map
     -	    -	    -	    -	    -	    -	     -	DSL dataset snap map
    60	 867K	 216K	1.27M	21.6K	 4.01	  0.00	DSL props
     -	    -	    -	    -	    -	    -	     -	DSL dataset
     -	    -	    -	    -	    -	    -	     -	ZFS znode
     -	    -	    -	    -	    -	    -	     -	ZFS V0 ACL
 3.57K	 456M	14.3M	57.0M	  16K	32.00	  0.00	    L2 ZFS plain file
 16.4K	2.05G	 572M	1.44G	89.5K	 3.68	  0.01	    L1 ZFS plain file
 12.8M	12.8T	12.8T	14.6T	1.15M	 1.00	 99.99	    L0 ZFS plain file
 12.8M	12.8T	12.8T	14.6T	1.14M	 1.00	100.00	ZFS plain file
     1	 128K	   4K	  16K	  16K	32.00	  0.00	    L2 ZFS directory
   779	97.4M	3.13M	12.4M	16.2K	31.12	  0.00	    L1 ZFS directory
 5.04K	59.1M	16.4M	68.0M	13.5K	 3.60	  0.00	    L0 ZFS directory
 5.80K	 157M	19.6M	80.4M	13.9K	 8.00	  0.00	ZFS directory
    30	  15K	  15K	 480K	  16K	 1.00	  0.00	ZFS master node
     -	    -	    -	    -	    -	    -	     -	ZFS delete queue
     -	    -	    -	    -	    -	    -	     -	zvol object
     -	    -	    -	    -	    -	    -	     -	zvol prop
     -	    -	    -	    -	    -	    -	     -	other uint8[]
     -	    -	    -	    -	    -	    -	     -	other uint64[]
     -	    -	    -	    -	    -	    -	     -	other ZAP
     -	    -	    -	    -	    -	    -	     -	persistent error log
     1	 128K	   4K	  24K	  24K	32.00	  0.00	    L1 SPA history
     4	 512K	  36K	 168K	  42K	14.22	  0.00	    L0 SPA history
     5	 640K	  40K	 192K	38.4K	16.00	  0.00	SPA history
     -	    -	    -	    -	    -	    -	     -	SPA history offsets
     -	    -	    -	    -	    -	    -	     -	Pool properties
     -	    -	    -	    -	    -	    -	     -	DSL permissions
     -	    -	    -	    -	    -	    -	     -	ZFS ACL
     -	    -	    -	    -	    -	    -	     -	ZFS SYSACL
     -	    -	    -	    -	    -	    -	     -	FUID table
     -	    -	    -	    -	    -	    -	     -	FUID table size
     1	   2K	   2K	  24K	  24K	 1.00	  0.00	DSL dataset next clones
     -	    -	    -	    -	    -	    -	     -	scan work queue
   141	70.5K	  512	  16K	  116	141.00	  0.00	ZFS user/group/project used
     -	    -	    -	    -	    -	    -	     -	ZFS user/group/project quota
     -	    -	    -	    -	    -	    -	     -	snapshot refcount tags
     -	    -	    -	    -	    -	    -	     -	DDT ZAP algorithm
     -	    -	    -	    -	    -	    -	     -	DDT statistics
     -	    -	    -	    -	    -	    -	     -	System attributes
     -	    -	    -	    -	    -	    -	     -	SA master node
    30	  45K	  45K	 480K	  16K	 1.00	  0.00	SA attr registration
    68	1.06M	 272K	1.06M	  16K	 4.00	  0.00	SA attr layouts
     -	    -	    -	    -	    -	    -	     -	scan translations
     -	    -	    -	    -	    -	    -	     -	deduplicated block
     -	    -	    -	    -	    -	    -	     -	DSL deadlist map
     -	    -	    -	    -	    -	    -	     -	DSL deadlist map hdr
     1	   2K	   2K	  24K	  24K	 1.00	  0.00	DSL dir clones
     -	    -	    -	    -	    -	    -	     -	bpobj subobj
     -	    -	    -	    -	    -	    -	     -	deferred free
     -	    -	    -	    -	    -	    -	     -	dedup ditto
    70	 472K	  48K	 768K	11.0K	 9.84	  0.00	other
    48	   6M	 192K	 768K	  16K	32.00	  0.00	    L5 Total
    48	   6M	 192K	 768K	  16K	32.00	  0.00	    L4 Total
    48	   6M	 192K	 768K	  16K	32.00	  0.00	    L3 Total
 3.61K	 463M	14.5M	57.8M	16.0K	32.00	  0.00	    L2 Total
 17.7K	2.16G	 577M	1.46G	84.4K	 3.84	  0.01	    L1 Total
 12.8M	12.8T	12.8T	14.6T	1.14M	 1.00	 99.99	    L0 Total
 12.8M	12.8T	12.8T	14.6T	1.14M	 1.00	100.00	Total

Block Size Histogram

  block   psize                lsize                asize
   size   Count   Size   Cum.  Count   Size   Cum.  Count   Size   Cum.
    512:  1.60K   821K   821K  1.60K   821K   821K      0      0      0
     1K:    854   984K  1.76M    854   984K  1.76M      0      0      0
     2K:  3.99K  10.5M  12.3M  3.99K  10.5M  12.3M      0      0      0
     4K:  14.9K  59.8M  72.1M  1.61K  8.61M  20.9M      2     8K     8K
     8K:  6.59K  62.8M   135M  1.16K  13.4M  34.3M  7.86K  62.9M  62.9M
    16K:  5.71K   125M   260M  9.59K   167M   202M  18.8K   324M   387M
    32K:  15.8K   691M   951M  2.62K   121M   323M  7.47K   313M   700M
    64K:  4.75K   418M  1.34G  2.72K   240M   562M  17.6K  1.74G  2.42G
   128K:  4.13K   775M  2.09G  25.2K  3.23G  3.78G  4.81K   899M  3.30G
   256K:  5.86K  2.13G  4.23G    811   284M  4.05G  6.19K  2.20G  5.50G
   512K:  33.7K  25.8G  30.0G    381   264M  4.31G  32.6K  27.8G  33.3G
     1M:  12.7M  12.7T  12.8T  12.8M  12.8T  12.8T  12.7M  14.6T  14.6T
     2M:      0      0  12.8T      0      0  12.8T      0      0  14.6T
     4M:      0      0  12.8T      0      0  12.8T      0      0  14.6T
     8M:      0      0  12.8T      0      0  12.8T      0      0  14.6T
    16M:      0      0  12.8T      0      0  12.8T      0      0  14.6T

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
tank                      14.6T 14.5T   723     0 4.31M     0     0     0     0
  raidz1                  14.6T 14.5T   723     0 4.31M     0     0     0     0
    /dev/disk/by-id/wwn-0x50014ee210dcc24e-part1               91     0  597K     0     0     0     0
    /dev/disk/by-id/wwn-0x50014ee210d12b30-part1               88     0  505K     0     0     0     0
    /dev/disk/by-id/wwn-0x50014ee210ac1176-part1               92     0  600K     0     0     0     0
    /dev/disk/by-id/wwn-0x50014ee26632122b-part1               88     0  507K     0     0     0     0
    /dev/disk/by-id/wwn-0x50014ee2bb7cc324-part1               92     0  597K     0     0     0     0
    /dev/disk/by-id/wwn-0x50014ee266268069-part1               88     0  502K     0     0     0     0
    /dev/disk/by-id/wwn-0x50014ee2bb4fbc6e-part1               93     0  597K     0     0     0     0
    /dev/disk/by-id/wwn-0x50014ee2bb7c305c-part1               87     0  505K     0     0     0     0
  mirror (log)             404K  476G     0     0 6.29K     0     0     0     0
    /dev/disk/by-id/ata-Samsung_SSD_860_PRO_512GB_S5GBNS0R300634E-part1                                       0     0 3.15K     0     0     0     0
    /dev/disk/by-id/ata-Samsung_SSD_860_PRO_512GB_S5GBNS0R300674W-part1                                       0     0 3.15K     0     0     0     0

This is the interesting piece:

block   psize                lsize                asize
   size   Count   Size   Cum.  Count   Size   Cum.  Count   Size   Cum.
    512:  1.60K   821K   821K  1.60K   821K   821K      0      0      0
     1K:    854   984K  1.76M    854   984K  1.76M      0      0      0
     2K:  3.99K  10.5M  12.3M  3.99K  10.5M  12.3M      0      0      0
     4K:  14.9K  59.8M  72.1M  1.61K  8.61M  20.9M      2     8K     8K
     8K:  6.59K  62.8M   135M  1.16K  13.4M  34.3M  7.86K  62.9M  62.9M
    16K:  5.71K   125M   260M  9.59K   167M   202M  18.8K   324M   387M
    32K:  15.8K   691M   951M  2.62K   121M   323M  7.47K   313M   700M
    64K:  4.75K   418M  1.34G  2.72K   240M   562M  17.6K  1.74G  2.42G
   128K:  4.13K   775M  2.09G  25.2K  3.23G  3.78G  4.81K   899M  3.30G
   256K:  5.86K  2.13G  4.23G    811   284M  4.05G  6.19K  2.20G  5.50G
   512K:  33.7K  25.8G  30.0G    381   264M  4.31G  32.6K  27.8G  33.3G
     1M:  12.7M  12.7T  12.8T  12.8M  12.8T  12.8T  12.7M  14.6T  14.6T
     2M:      0      0  12.8T      0      0  12.8T      0      0  14.6T
     4M:      0      0  12.8T      0      0  12.8T      0      0  14.6T
     8M:      0      0  12.8T      0      0  12.8T      0      0  14.6T
    16M:      0      0  12.8T      0      0  12.8T      0      0  14.6T

As expected: there is only a small amount of data stored in small recordsizes. Almost all data is stored in 1M blocks. Good!

Knowing now that the ZFS setup is working fine, you should identify what component(s) bottleneck your setup. Is it CPU-bound? Is your OS drive limiting the transcoding? Is there a lack of GPU media codec pipelines?
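A few quick ways to narrow that down while a transcode is actually running (intel_gpu_top comes from the intel-gpu-tools package and only applies once an iGPU or Arc card is in the box; treat this as a sketch):

htop                  # per-core CPU load; pegged cores point to a CPU limit
intel_gpu_top         # iGPU/Arc video engine utilization for QSV
iostat -zyxm 10       # %util and await on the OS and transcode-cache SSDs
nvidia-smi dmon       # NVENC/NVDEC load, if the T400 ends up back in the mix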

At this point, I still have no indication that your originally proposed upgrades will really improve your setup materially (they may, I just have not seen the data).

I am suspecting it is a CPU limitation. A higher Passmark score usually indicates better transcode performance.
I looked into the Arc GPUs and, unless I am mistaken, ReBAR is required to use them, which the X470D4U does not have. I would have to move to Ryzen 7000 to get ReBAR, but even then, there is no unbuffered/unregistered ECC DDR5 yet. And I have yet to find a 600 or 700 series server board with ECC support that is mATX (at least for a reasonable price).

At the very least I plan to move to that chassis. Unless a better solution can be found.

Everything runs in a Docker Compose stack. I do have XFCE4 installed along with xRDP so I can use a GUI when needed.

I suppose maybe if I were to replace the DDR4 2666 ECC 32GB kit I currently have with a DDR4 3200 ECC kit, that would help a bit. Maybe double the capacity?

And the WD Reds are 5400rpm, so maybe replacing them with 7200rpm drives would help.