Home NAS latency (SOLVED)

For fun I have set up a small home NAS/SAN and am looking for suggestions on how to reduce latency. General specs:

  • z170 Asrock Motherboard
  • 16GB Ram 2133
  • 3x 10TB WD Red in RAIDZ
  • 3x 3TB (CMR) WD Red in RAIDZ (cold storage archive, removed when not backing up)
  • Gigabit onboard Ethernet (Realtek, I believe)
  • Gentoo Linux (I’m aware there are better distros, this is what I use everywhere)

I am using ZFS with NFS exports to share general media to my machines, plus a directory to back up phones etc. to. I’m fully saturating the gigabit link on large files/binary streams, but for large directories of small files I’m hitting latency issues that are pretty bad. From nfsiostat I get this result on my largest pool when transferring a ton of small photos.

EDITED: Latency while not scrubbing

read:              ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)  avg queue (ms)
                  23.655        2841.696         120.129        0 (0.0%)          37.476          37.509           0.014
write:             ops/s            kB/s           kB/op         retrans    avg RTT (ms)    avg exe (ms)  avg queue (ms)
                   0.000           0.000           0.000        0 (0.0%)           0.000           0.000           0.000

Other things I’ve considered include going with 10G SFP+ and DAC cables, as I could use the bandwidth anyway, and I know that’s lower latency. Please let me know if I missed anything in this brain dump, or if you want further information on this network/these machines.

EDIT: I somehow missed that the system was scrubbing. I’ll edit with more accurate results on that pool.

EDIT2: Quick notes. I have 3 copies of my data: online, offline, and offsite. I couldn’t care less about my live data as I have proper backups. I’m not going to ignore all risks for speed, since getting my data back is inconvenient, but it is securely backed up, so some risks are acceptable in a home setting with proper backups.

EDIT3: This has been solved, thanks to everyone offering ZFS tuning advice and suggesting an Optane SLOG.

1 Like

You can get more performance by using mirrors instead of RAIDZ. In fact this is the preferred method.

https://jrs-s.net/2015/02/06/zfs-you-should-use-mirror-vdevs-not-raidz/
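For illustration, a pool of striped mirrors looks something like this (device names are placeholders); each additional mirror vdev adds IOPS, at the cost of half the raw capacity:

zpool create tank \
  mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB \
  mirror /dev/disk/by-id/diskC /dev/disk/by-id/diskD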

1 Like

Can you get tuned-adm on Gentoo, or is that just a RHEL thing?

ZFS isn’t too latency-friendly in general. You could add a zippy SLOG to help with write IOPS and latency. I don’t remember the names off the top of my head, but there are tunables to disable forced cache flushing or extend the time before it flushes; I believe the default is 5 seconds.
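I believe they’re OpenZFS module parameters; a minimal sketch of /etc/modprobe.d/zfs.conf (values are illustrative, and disabling cache flushes is unsafe unless the devices have power-loss protection):

# commit transaction groups every 10 seconds instead of the default 5
options zfs zfs_txg_timeout=10
# skip the cache-flush commands ZFS issues to the disks (risky, see above)
options zfs zfs_nocacheflush=1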

That will definitely take a toll.

+1

More vdevs

1 Like

While I would like to agree with this route, when building my array I could only afford 3 disks, hence the RAIDZ. I couldn’t afford to give up the space by going with 2 disks in a mirror, nor could I afford a 4th disk, unfortunately. 95% of my storage needs are media/backups, so I’m playing with the performance aspect to see what I CAN do vs. actually needing it. My main “server” that I run docker containers on is a lowly Chromebox with 6GB RAM and a Celeron, so my true needs are quite low, which probably puts the overall cost into perspective.

It appears I may be able to, though it’s designed for systemd, so I’d have to adjust it as I run OpenRC. Yeah, I’m gonna be THAT guy :slight_smile:

I actually purchased a small Optane drive (the M80 I believe, the M.2 kind) and it’s physically in the machine, but not put to use yet, as I’m not sure whether it writes direct to NAND or buffers in a volatile cache that gets flushed. If it’s a flushing drive, it isn’t battery backed, so that’s a bad idea.

A waste (IMO). A mirror with 1 hot spare would have 2-fold better IOPS and faster resilvering when a drive dies, which would be more worthwhile in both the short and long term.

Okay, but how full is it? The general rule of thumb is to have no more than 80% of the space utilized. If you are at 95%, then ZFS is freaking the hell out, because it was not designed to handle being that full gracefully.

Something I’ve done on my NAS, which runs Linux with btrfs for the filesystem, is to adjust the VFS cache pressure and try to keep the file/inode information in RAM.

I’ve got these two lines in /etc/sysctl.d/91-swappiness.conf

vm.swappiness=10
vm.vfs_cache_pressure=25

And I periodically run a du /home >/dev/null to load the cache. I think it improves small file access by a lot, but your results may vary.

I’m also not sure how it will work with ZFS, because doesn’t that have its own ARC thing for caching file data instead of the Linux VFS cache?

1 Like

Yes. ARC is a cache of recently and frequently used data, and it resides in RAM. L2ARC on an SSD is also a waste if resources are tight; it doesn’t do much for a home NAS, because the RAM needed to maintain the L2ARC index could be used for the regular ARC instead.

1 Like

3x 10TB as a mirror + hot spare = 10TB usable
Same disks in RAIDZ = 20TB usable

Sorry, storage NEEDS. It’s a 20TB pool and I have 13TB free

I have set swappiness to 0 to prevent spillage into swap unless under extreme pressure, which almost never happens. I could try to force files into cache, though access is sporadic enough that I can’t really predict what I’ll need in ARC, so I’m at the mercy of the physical disks.

This is also why I bought my Optane drive, and ended up not using it. L2ARC is likely a waste of my RAM.

Low swappiness is a good idea, but yeah I’m not sure if any vfs values will affect zfs. There are zfs tunables that do roughly equivalent things though.

For general Linux tuning, I’d try to dig into what tuned-adm does under the hood with the latency-performance profile and replicate that on Gentoo (or install tuned-adm if possible).
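As a rough sketch of the kind of thing that profile does (these are from memory and should be checked against the profile definition shipped with tuned):

# pin the CPU frequency governor to performance
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# writeback/swap sysctls in the same spirit
sysctl vm.swappiness=10
sysctl vm.dirty_ratio=10
sysctl vm.dirty_background_ratio=3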

Without a spare drive you are sunk anyway. The capacity means nothing if you lose it all when a drive dies and the array cannot resilver.

A 10 TiB drive will take about 2-3 days to resilver, and 4 years from now, when your drives are worn out from being cycled regularly, you are asking for trouble.

If you are using 7 TiB then yeah, you would need more drives in your pool; the 50% storage hit does suck, but that’s the only real downside to mirrors.

That’s good! Optane is awesome. But only one? If that dies, your array will also be trashed, which is why you’re supposed to mirror those twice or thrice over.

Yeah. He could still look into running a periodic job to pull in the file info though. Use cron or systemd or whatever. Once all of that is cached it doesn’t need to be read from disk again, and file metadata is the slowest part of small file access.
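A single crontab entry is enough for that; the path here is just the dataset from the output later in the thread, adjust to taste:

# warm the dentry/inode cache every hour
0 * * * * du -s /mnt/data/scratch > /dev/null 2>&1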

If you don’t mind taking on some risk, you could use the optane as slog and disable the cache flushing. Just be sure to have smartmontools configured with email alerts so you’ll get notified of any pre-fail symptoms.

Worst case there is you lose recent writes if the optane dies.
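For the record, attaching it is a one-liner; the device path below is a placeholder (by-id names are safer than /dev/nvme0n1), and the flush tunable is the OpenZFS module parameter mentioned earlier:

zpool add tank log /dev/disk/by-id/nvme-YOUR_OPTANE_HERE
# only if you accept the risk of skipping device cache flushes
echo 1 > /sys/module/zfs/parameters/zfs_nocacheflush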

2 Likes

No single pool is critical to me at all. I added the RAIDZ for uptime, or I would have just gone with a stripe. I keep offsite backups and online backups on separate pools, so I have 3 copies. If I lose the data in the middle of backups, life happens. That is also where a lot of my budget goes, and why the pools have low storage per pool, as they are not in the same physical box or even location. Also note that I live alone and am the sole user of this data, so anyone else’s access/my monetary loss is no concern.

With the known risks to the live data explained as not a real issue, I was intending to try it out as a ZIL on a single drive. If the performance improves and it turns out to be a direct-to-NAND write (since they aren’t battery backed), I would get a second one as a mirror, as you are correct that that is best practice.
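For reference, a mirrored SLOG can be added in one step if both devices are on hand (device paths are placeholders):

zpool add tank log mirror /dev/disk/by-id/optane-1 /dev/disk/by-id/optane-2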

1 Like

I agree. Gotta be prepared for failure when it happens.

For example, we had a hypervisor die the other day, and had we not set up proper replication, there would have been a service impact to our customers.

You might not see much improvement out of the box with slog. In my experience, it usually needs some tuning.
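For instance, the SLOG only comes into play for sync writes (which NFS generates plenty of via COMMIT); whether a dataset forces everything through it is a property. A minimal sketch, assuming the dataset shown later in the thread:

zfs get sync,logbias tank/scratch
# sync=standard honours client sync semantics; sync=always pushes every write through the SLOG
zfs set sync=always tank/scratch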

You also might see some latency improvement with compression off. LZ4 is cheap, but it’s not free. There are cases where it speeds up I/O for highly compressible data, but I think those are really specific use cases (logs, maybe).

Actually, you get better I/O with LZ4 compression turned on, even for uncompressible data, IIRC. That JRS guy is a genius.

@kdb424 could you give us some of your pool’s details?

zfs get all pool_name

tank  type                  filesystem             -
tank  creation              Fri Dec 27 18:15 2019  -
tank  used                  4.27T                  -
tank  available             13.3T                  -
tank  referenced            234K                   -
tank  compressratio         1.05x                  -
tank  mounted               yes                    -
tank  quota                 none                   default
tank  reservation           none                   default
tank  recordsize            128K                   default
tank  mountpoint            /mnt/data              local
tank  sharenfs              off                    default
tank  checksum              on                     default
tank  compression           off                    default
tank  atime                 on                     default
tank  devices               on                     default
tank  exec                  on                     default
tank  setuid                on                     default
tank  readonly              off                    default
tank  zoned                 off                    default
tank  snapdir               hidden                 default
tank  aclinherit            restricted             default
tank  createtxg             1                      -
tank  canmount              on                     default
tank  xattr                 on                     default
tank  copies                1                      default
tank  version               5                      -
tank  utf8only              off                    -
tank  normalization         none                   -
tank  casesensitivity       sensitive              -
tank  vscan                 off                    default
tank  nbmand                off                    default
tank  sharesmb              off                    default
tank  refquota              none                   default
tank  refreservation        none                   default
tank  guid                  16067106483522758241   -
tank  primarycache          all                    default
tank  secondarycache        all                    default
tank  usedbysnapshots       0B                     -
tank  usedbydataset         234K                   -
tank  usedbychildren        4.27T                  -
tank  usedbyrefreservation  0B                     -
tank  logbias               latency                default
tank  objsetid              54                     -
tank  dedup                 off                    default
tank  mlslabel              none                   default
tank  sync                  standard               default
tank  dnodesize             legacy                 default
tank  refcompressratio      1.00x                  -
tank  written               234K                   -
tank  logicalused           4.49T                  -
tank  logicalreferenced     78.5K                  -
tank  volmode               default                default
tank  filesystem_limit      none                   default
tank  snapshot_limit        none                   default
tank  filesystem_count      none                   default
tank  snapshot_count        none                   default
tank  snapdev               hidden                 default
tank  acltype               off                    default
tank  context               none                   default
tank  fscontext             none                   default
tank  defcontext            none                   default
tank  rootcontext           none                   default
tank  relatime              off                    default
tank  redundant_metadata    all                    default
tank  overlay               off                    default
tank  encryption            off                    default
tank  keylocation           none                   default
tank  keyformat             none                   default
tank  pbkdf2iters           0                      default
tank  special_small_blocks  0                      default

The dataset that actually holds the data I’m trying to improve:

tank/scratch  type                  filesystem             -
tank/scratch  creation              Wed Apr 29  6:27 2020  -
tank/scratch  used                  113G                   -
tank/scratch  available             13.3T                  -
tank/scratch  referenced            113G                   -
tank/scratch  compressratio         1.00x                  -
tank/scratch  mounted               yes                    -
tank/scratch  quota                 none                   default
tank/scratch  reservation           none                   default
tank/scratch  recordsize            128K                   default
tank/scratch  mountpoint            /mnt/data/scratch      inherited from tank
tank/scratch  sharenfs              on                     local
tank/scratch  checksum              on                     default
tank/scratch  compression           off                    default
tank/scratch  atime                 on                     default
tank/scratch  devices               on                     default
tank/scratch  exec                  on                     default
tank/scratch  setuid                on                     default
tank/scratch  readonly              off                    default
tank/scratch  zoned                 off                    default
tank/scratch  snapdir               hidden                 default
tank/scratch  aclinherit            restricted             default
tank/scratch  createtxg             1607865                -
tank/scratch  canmount              on                     default
tank/scratch  xattr                 on                     default
tank/scratch  copies                1                      default
tank/scratch  version               5                      -
tank/scratch  utf8only              off                    -
tank/scratch  normalization         none                   -
tank/scratch  casesensitivity       sensitive              -
tank/scratch  vscan                 off                    default
tank/scratch  nbmand                off                    default
tank/scratch  sharesmb              off                    default
tank/scratch  refquota              none                   default
tank/scratch  refreservation        none                   default
tank/scratch  guid                  513176425129963498     -
tank/scratch  primarycache          all                    default
tank/scratch  secondarycache        all                    default
tank/scratch  usedbysnapshots       0B                     -
tank/scratch  usedbydataset         113G                   -
tank/scratch  usedbychildren        0B                     -
tank/scratch  usedbyrefreservation  0B                     -
tank/scratch  logbias               latency                default
tank/scratch  objsetid              840                    -
tank/scratch  dedup                 off                    default
tank/scratch  mlslabel              none                   default
tank/scratch  sync                  standard               default
tank/scratch  dnodesize             legacy                 default
tank/scratch  refcompressratio      1.00x                  -
tank/scratch  written               113G                   -
tank/scratch  logicalused           112G                   -
tank/scratch  logicalreferenced     112G                   -
tank/scratch  volmode               default                default
tank/scratch  filesystem_limit      none                   default
tank/scratch  snapshot_limit        none                   default
tank/scratch  filesystem_count      none                   default
tank/scratch  snapshot_count        none                   default
tank/scratch  snapdev               hidden                 default
tank/scratch  acltype               off                    default
tank/scratch  context               none                   default
tank/scratch  fscontext             none                   default
tank/scratch  defcontext            none                   default
tank/scratch  rootcontext           none                   default
tank/scratch  relatime              off                    default
tank/scratch  redundant_metadata    all                    default
tank/scratch  overlay               off                    default
tank/scratch  encryption            off                    default
tank/scratch  keylocation           none                   default
tank/scratch  keyformat             none                   default
tank/scratch  pbkdf2iters           0                      default
tank/scratch  special_small_blocks  0                      default

I’ve also edited the first post with correct numbers while I am not scrubbing.

You should enable lz4 compression.

But you will need to rewrite all of your current data, as ZFS does not compress it retroactively. The easiest way is to just pipe your dataset through zfs send/receive.

There are also various other bits, like disabling atime and setting the ashift value.

There is a tuning article by the JRS guy as well.
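A sketch of the relevant commands (dataset and snapshot names are illustrative; ashift can only be set when a vdev is created):

zfs set compression=lz4 tank
zfs set atime=off tank
# existing data stays uncompressed until it is rewritten, e.g. via send/receive:
zfs snapshot tank/scratch@migrate
zfs send tank/scratch@migrate | zfs receive tank/scratch_lz4
# ashift is a creation-time property, e.g.:
#   zpool create -o ashift=12 tank raidz disk1 disk2 disk3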

Turn atime off

1 Like