Thoughts on new Jellyfin NAS

Did you capture the output of iostat -zyxm 10 (plain iostat, not zpool iostat) under load?
It shows CPU utilization, I/O wait times, active block devices, and per-device utilization.
The command is provided by the sysstat package, in case it's not installed on your system.
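For example, something like this should capture a few minutes of samples to a file while the load is running (the file name is just an example):

iostat -zyxm 10 30 > iostat-zyxm.txt   # 10-second intervals, 30 samples (~5 minutes)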

CPU usage was jumping around a lot. Watching the system monitor, it looked like the OOM killer was terminating processes to keep memory under control.
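(If it really was the OOM killer, the kernel log should confirm it; something like this would show the kills:)

sudo dmesg -T | grep -i "out of memory"   # or: journalctl -k | grep -i oom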

I ran 6x Tdarr encode processes along with 3 different Jellyfin streams each set to transcode to a lower resolution and bitrate.

iostat-zyxm.txt (165.1 KB)

Now that I have killed all the processes other than xRDP and the system is “idle”, RAM usage is around 65%.

I have gathered several views of my disk layout for you, so you can see exactly what hardware I have.

lsblk.txt (2.3 KB)
lshw-class-disk.txt (7.1 KB)
parted-l.txt (3.9 KB)
sfdisk-l.txt (8.4 KB)
df-h.txt (5.0 KB)
lshw.txt (60.8 KB)
lshw-businfo.txt (11.1 KB)

Looks like sdc is used as the ZFS cache device and sdd through sdk make up the raidz1 vdev. I assume nvme0n1 is the OS drive.
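You can confirm the layout directly (the pool name tank is assumed here):

zpool status -v tank   # lists the disks backing each vdev, plus cache and log devices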

Representative observation
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          38.25    7.76    7.90   17.94    0.00   28.16

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1         15.20      1.32     0.00   0.00   10.70    88.87 1124.50    257.36    22.30   1.94   10.30   234.35    0.00      0.00     0.00   0.00    0.00     0.00    4.10   12.98   11.79  99.84
nvme1n1         74.40      3.40     2.90   3.75    0.50    46.78   14.00      0.14    19.10  57.70    4.99    10.57    0.00      0.00     0.00   0.00    0.00     0.00    1.40    1.14    0.11   7.00
sdc              0.80      0.01     0.00   0.00    0.50    17.00   44.20     38.52     0.00   0.00    3.29   892.35    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.15   9.72
sdd            450.50     42.03     0.00   0.00   11.81    95.53    8.00      0.05     0.10   1.23    9.24     6.35    0.00      0.00     0.00   0.00    0.00     0.00    0.40  104.50    5.44  63.92
sde            453.00     42.44     0.00   0.00   12.33    95.94    8.00      0.05     0.00   0.00   11.93     6.00    0.00      0.00     0.00   0.00    0.00     0.00    0.40  117.25    5.73  62.64
sdf            458.60     42.46     0.00   0.00   11.37    94.81    8.50      0.05     0.10   1.16    9.00     5.93    0.00      0.00     0.00   0.00    0.00     0.00    0.40  124.75    5.34  62.92
sdg            449.30     42.16     0.00   0.00   12.68    96.08    8.60      0.05     0.00   0.00    9.71     6.00    0.00      0.00     0.00   0.00    0.00     0.00    0.40  109.00    5.83  63.24
sdh            448.60     41.69     0.00   0.00   13.06    95.15    7.70      0.05     0.00   0.00   12.84     6.44    0.00      0.00     0.00   0.00    0.00     0.00    0.40  116.50    6.00  63.80
sdi            456.60     42.70     0.00   0.00   11.71    95.76    7.40      0.05     0.00   0.00    8.57     6.43    0.00      0.00     0.00   0.00    0.00     0.00    0.40   98.50    5.45  61.44
sdj            458.40     42.64     0.00   0.00   11.53    95.24    8.20      0.05     0.00   0.00    9.59     5.95    0.00      0.00     0.00   0.00    0.00     0.00    0.40  107.25    5.41  61.16
sdk            455.10     41.91     0.00   0.00   11.42    94.30    7.80      0.05     0.00   0.00    9.92     5.95    0.00      0.00     0.00   0.00    0.00     0.00    0.40  113.50    5.32  61.52
  • %iowait is quite high, ranging from ~18% to over 38%. This has a significant impact on performance.
  • CPU utilization can drop quite low (as low as ~9%), inversely correlating with the %iowait values.
  • %idle also correlates with %iowait (both rise and fall at the same time).

The conclusion is that under load the system is bottlenecked by I/O capacity.
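A per-vdev view from ZFS itself can cross-check this (pool name assumed to be tank):

zpool iostat -v tank 10   # per-vdev read/write ops and bandwidth every 10 seconds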

Let’s look for the most likely culprit:

  • HDD %util can reach as high as 90% (reading at ~80 MB/s) but mostly sits lower at ~60%, suggesting that the HDDs are not the bottleneck.
  • nvme0n1 is consistently observed close to 100% utilization.

Assuming there are no network drives or other devices causing the high overall %iowait values, I conclude the OS drive is the bottleneck.
The drive performs below expectation. Try trimming it:

sudo fstrim -a # trim all mounted filesystems that support it
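On systemd-based distros you can also enable the periodic trim timer so this keeps happening automatically:

sudo systemctl enable --now fstrim.timer   # weekly TRIM by default
systemctl list-timers fstrim.timer         # verify it is scheduled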

I assume drives sda/sdb are the ZIL/SLOG drives, which appear to be completely unused.

I recommend a test (completely reversible) of removing the SLOG drives from tank and building a new pool with two vdevs as a temporary transcoding target; a rough sketch follows.
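The steps would look roughly like this (the new pool name and the log vdev name are assumptions; check zpool status for the actual names before removing anything):

zpool status tank                          # note the exact name of the log vdev (e.g. mirror-1)
sudo zpool remove tank mirror-1            # detach the SLOG; reversible later with zpool add tank log ...
sudo zpool create scratch sda sdb          # temporary pool with two single-disk vdevs (striped); -f may be needed if old labels remain
sudo zfs set mountpoint=/transcode scratch # use it as the transcode target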

The OS drive is nvme1n1.
Docker containers and the transcode files are on nvme0n1.

I have removed the SLOG.
Now one of the 860 EVOs is formatted as btrfs and set up as the /transcode directory.

Though maybe it would help to stripe the two SSDs as the transcode disks.

Agreed, at least while testing.
Now that we have an idea of what to change, we need to figure out what actually helps.

[Graphs: queue length, queue size, requests, service time, throughput, and utilisation for md127, nvme0n1, nvme1n1, sda, sdb, and sde]
iostat-server1.txt (1.2 MB)

The two 860 EVO drives are in an mdadm RAID0.
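For anyone following along, the array was created along these lines (device names are placeholders; the exact commands may have differed):

sudo mdadm --create /dev/md127 --level=0 --raid-devices=2 /dev/sda /dev/sdb
sudo mkfs.btrfs -L transcode /dev/md127
sudo mount /dev/md127 /transcode
cat /proc/mdstat   # confirm the array is up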

Looks better, right?
CPU utilization is up (~80%), %iowait is down to ~1-2%, and there is only a little idle time.
The RAIDZ1 drives pull more data per interval and reach a higher utilization (>80%).
The MD RAID SSDs seem to have capacity to spare, running at around 60-80% utilization.

I conclude that you managed to reconfigure the system from an I/O-bound (write-bound) system into a (very slightly) CPU-bound one, which is probably where you want to be. Looks like a balanced system to me.

If you were to add a lot of CPU capacity, you'd get back to an I/O-bound system, probably read-bottlenecked. So if/when upgrading, I would look into upgrading both storage and CPU capacity.

The other upgrade path would be to add GPU transcoding capacity, which would reduce the need for CPU. It would most likely also have a positive effect on power consumption.

What do you think?

I agree. I think I will just get the Node 804 and an Arc GPU. Thank you for helping me save money while “improving” my system. Turns out I did not need a SLOG.


You’re very welcome.

Let us know how your upgrade goes. Folks here always enjoy upgrade stories.


Now I am stuck dealing with network traffic overloads. I guess it is time to get an LACP (802.3ad) capable switch and an I350 NIC.
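From what I have read, once the switch side has a matching LAG configured, the bond itself would be set up something like this with NetworkManager (interface and connection names are placeholders):

nmcli con add type bond ifname bond0 bond.options "mode=802.3ad,miimon=100"
nmcli con add type ethernet ifname enp1s0f0 master bond0
nmcli con add type ethernet ifname enp1s0f1 master bond0
nmcli con up bond-bond0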



I have confirmed that one of the two cables is defective: the link only negotiates 10/100 on the server side (amber LED), while the switch side shows green. Though the built-in NIC could be defective too.
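ethtool shows the negotiated speed per port (interface name is a placeholder):

ethtool enp1s0f0 | grep -E "Speed|Duplex|Link detected"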

Not sure how to help here. I’d make sure that you have flow control enabled on your connections.
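Something like this should show the current setting and enable it if the NIC supports it (interface name is a placeholder):

ethtool -a enp1s0f0                    # show pause/flow-control settings
sudo ethtool -A enp1s0f0 rx on tx on   # enable RX/TX flow control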

The problem appears to be resolved, but we'll see.

I replaced all the cables (LAN1, LAN2, and BMC/IPMI) with CAT6A cables; the link lights cleaned up and ethtool reports a gigabit connection.

I also set up balance-alb bonding in Cockpit. I am currently testing by transferring several multi-gigabyte files from the server to my NVMe drive. So far, I am seeing 500+ Mbps over the LAN from the server on one NIC and about 3 Mbps up from the other.
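To keep an eye on the bond while testing, something like this works (bond name is a placeholder):

cat /proc/net/bonding/bond0   # bond mode, slave status, negotiated link speeds
sar -n DEV 5                  # per-NIC throughput every 5 seconds (from the sysstat package)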

I am currently seeing transfer speeds in FileZilla of 58.3 MB/s, whereas before I was only seeing a max of 10 MB/s.

CAT6A really seems to be worth every penny. I bought a 10-pack of the ultra-thin 3-foot patch cables from Monoprice on Amazon, and so far it is night and day. I was using flat cables before, and I guess they couldn't hold up.

I'm working on my own Jellyfin NAS as well. I swapped a comparable Ryzen I got from Micro Center for an i5-12400. I got to about 20 simultaneous 1080p transcoded streams on the local network with the i5 before it bottlenecked. It uses less power than the Ryzen, too. I was going to use an A2000 with the Ryzen but didn't need it after all; the A2000 is now going into another server.

Is ECC necessary? Isn’t transcoding messy and best effort by the hardware anyway?

ECC is very important when it comes to ZFS, which is how my server is set up.
I used to run an LVM2 RAID6 setup with SSD caching several years ago, and I wrote up a guide on how to do just that here. Wendell has referred to it a few times; I feel honored he thought so highly of the guide.


@jode I have been doing some digging lately into other ways to shrink my server.
I found some DAS-like enclosures that seem promising, and from what I can find online they work with ZFS, but I am still interested in the community's opinion.

Paired with a StarTech PCIe USB 3.2 Gen 2x2 card to get the full 20 Gbps of bandwidth.

Combined with maybe this case? https://www.amazon.com/SilverStone-Technology-Precision-Micro-ATX-Compatible/dp/B07VVMP874/ref=cm_cr_arp_d_bdcrb_top?ie=UTF8

I figure a 5.25" bay would be nice for a DVD/BD drive, but I suppose an external USB-powered one would be fine too.

Going back to the starting point of this discussion thread, the motivation is to reduce the volume of computer boxes.
I am not sure that splitting an existing box into two boxes (the Sabrent enclosure plus an mATX-sized compute unit) will accomplish this.

It will, however, potentially bring a bunch of issues (introduced by USB) that you didn't have to live with before: added latency, limited bandwidth, potential connector problems (port/cable/chipset), and USB driver issues.
These are common with external enclosures attached via USB. I don't know of (and don't expect) such issues with the Sabrent unit you selected, but you should be prepared for them.

What about external mini-SAS enclosures? HighPoint has a couple that look good. A little pricey, but maybe?

I would consider a mini-SAS enclosure technically superior to a USB-connected enclosure. This comes with caveats:

  • USB is ubiquitous and available on any computer. There is a reasonable expectation that a USB enclosure could connect to any type of computer: servers, desktops, laptops.
  • A mini-SAS enclosure depends on a SAS connector in the server/desktop(/laptop) you want to connect it to. If that computer has an empty x8 PCIe slot, you can get an HBA card on eBay starting at ~$30.
  • SAS has the external-enclosure use case built into its specs, and it is frequently used that way in data centers. USB specs it as well, but it is rarely used that way, and reviews for such products often mention driver and other issues.
  • USB is built with hot-swap in mind and is used that way extensively. I would fully expect to turn off/disconnect a USB enclosure when not in use.
  • SAS devices/enclosures support hot-swap, but their general use case is more permanent; hot-swap is mostly used for maintenance (e.g. swapping out broken disks).
  • Last-gen mini-SAS offers better latency and bandwidth (i.e. is faster) than even the latest USB enclosures. As long as you expect to fill the enclosure with hard drives this hardly matters, but it's good to know and consider.

I found a relatively cheap mini-SAS enclosure from a reputable brand that seems like a good alternative to the Sabrent USB-enclosure you’re considering.

Still, you need to ask yourself the question: is this really the direction I want to take my computer setup?
Many on this forum (myself included) end up more or less copying data center setups and getting small or large rack-mount hardware that we find space for in basements and garages.
Others prefer a separate enclosure that is reasonably small and quiet and can be turned off when not in use. USB may be better suited for such a use case than mini-SAS.