I am considering primocache vs ZFS L2ARC for caching games on my game drive. I have a gaming desktop with a dedicated 10Gb link to my fileserver. The server provides an iSCSI drive for game storage.
I have up to two SSDs available for caching duty:
Samsung 850 Pro 256 GB
Samsung 860 EVO 500 GB (soon to be freed from laptop)
On the ZFS side:
The main storage pool is 4x12TB 7200 RPM Enterprise HDDs in mirrored vdevs
I have ~24 GB of RAM available for ZFS and can’t get more
There are other applications on the server using that same pool, so there is some constant background usage that reduces performance available to iSCSI.
Sustained sequential speed over the 10Gb LAN is ~160 MB/s when the background tasks are running. SSD-only pools for testing move at ~600MB/s under the same conditions, so the 4 HDDs are a bottleneck.
On the primocache side:
I’ve used the trial and it seems to work well and have a tolerable price
I haven’t narrowed this down to primocache, but shutdown time seems to be substantially longer since around the time I installed primocache. If this is primocache’s fault and can’t be fixed, that stinks, because my VR headset is fussy and can require reboots for my GPU to recognize it.
Once the primocache trial expires I think I’ll uninstall it and:
Put the 256GB SSD in the server as L2ARC and see if game load speed becomes unsatisfactory
See if shutdown times improve
If the L2ARC option works fine I’ll probably keep it as-is. However, do any of you have thoughts/suggestions on whether to pick primocache or L2ARC?
24GB of ARC just doesn’t cut it when it comes to also caching game files. I consider more memory or an L2 as mandatory for this. You gotta make sure the game files are worthy of ARC space. And considering you’ve got lots of other stuff competing for cache, a 500GB-2TB NVMe should do the trick. A SATA SSD is too slow (can’t saturate network bandwidth) and 256GB is probably not enough. Your game files will likely get evicted from L2 pretty quickly and then you’re back to HDD speeds.
You can always use the secondarycache property on datasets to narrow down which datasets are allowed to use the L2ARC.
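For example, a minimal sketch with hypothetical dataset names (assuming the games sit on their own zvol and the background apps on their own dataset):
# let both data and metadata from the game zvol into L2ARC
zfs set secondarycache=all pool/games
# keep the background application’s dataset to metadata only so it doesn’t flood the cache device
zfs set secondarycache=metadata pool/appdata
# attach the SSD as a cache vdev if you haven’t already
zpool add pool cache /dev/disk/by-id/<ssd-id>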
Thank you. I should take a step back and clarify a couple things:
I’m not necessarily trying to saturate 10Gb LAN. If performance is like low end SATA SSD most of the time, I would be thrilled.
The 160MB/s figure is without any caching aside from ~24GB RAM for ZFS to use. If I turn off the background stuff I get ~450 MB/s instead. The background stuff is only ~60MB/s so I think the HDD seek time is what’s crippling throughput.
I have maxed out the RAM in my server (32GB, minus ~8GB for OS), so I can’t get more RAM
At the moment there is only ~280GB of games in my iSCSI drive. 256GB of cache isn’t comically undersized but of course, more is better and I may install more games down the line.
So given all that, do 1-2 SATA L2ARC devices have a nontrivial chance of being a tangible improvement over no L2ARC at all? If it’s not likely to help, I’ll try to solve the shutdown time issue with primocache instead, because performance using the 256GB SSD as a cache is satisfactory.
As a partially related note, how much wear should I expect on my L2ARC devices? I know the default feed rate is ~8MB/s but I don’t know how often it will feed at the max rate. I assume I will have to restrict the background stuff’s access to L2ARC because the ADS is a few TB, so it could make the L2ARC feed much more than if it’s just holding metadata and game data.
Having no L2ARC with only 24GB of ARC and 4x 12TB drives was a mistake in the first place. Cache is vital for ZFS. If you don’t have the RAM, you complement it with flash. Otherwise you’ll see that horrible performance, because most stuff gets fetched from HDD instead of being served from the MFU/MRU lists at memory speed. If you spend $1,200-1,500 on HDDs, you can spend a hundred on proper caching. It will solve all your problems. Get a TB or two.
Oh and enable persistent L2ARC so you don’t start with a cold cache on reboot. This being done, there is basically no wear on the SSDs. ZFS knows what to keep and what to evict given enough space to breathe.
edit: L2 feed rate can be tweaked by tunables…but I didn’t have any issues with that on my TrueNAS. This mechanic was more of a problem before L2-persistence.
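If you do want to poke at the feed rate, on Linux/TrueNAS SCALE the knobs are module parameters (FreeBSD/TrueNAS CORE exposes the same ones as vfs.zfs.l2arc.* sysctls); the value below is just an illustration, not a recommendation:
# per-interval write cap (default 8 MiB) and the extra boost used while the cache is still cold
cat /sys/module/zfs/parameters/l2arc_write_max
cat /sys/module/zfs/parameters/l2arc_write_boost
# example: raise the cap to 64 MiB (takes effect immediately, not persistent across reboots)
echo 67108864 > /sys/module/zfs/parameters/l2arc_write_max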
What volblocksize would you use for the zvol? I use 64k right now. If this ZFS talk (slide 55) is still current, I lose 256 bytes of RAM per block of data in L2ARC. 1 TB of L2ARC would eat 4GB of RAM, and I don’t have much to spare.
I could make a new zvol with bigger blocks. What size would you suggest? If I assume the following, then I don’t see why huge blocks like 1M wouldn’t be fine:
ZFS compression should eliminate the issue of files smaller than the allocation size being rounded up, wasting space. The only waste would be some LAN bandwidth if a game being played has many teeny files.
Game loading should be mostly sequential from contiguous sections much larger than the allocation unit.
Have I overlooked something? Hypothetically, with 1M blocks and 1TB L2ARC, I would only be giving up 250MB of RAM instead of 4GB.
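For reference, my back-of-the-envelope math (assuming the ~256 bytes per L2ARC block from that talk; the exact header size varies between OpenZFS versions):
# headers = (L2ARC size / volblocksize) * ~256 B
echo $(( 1000000000000 / (64*1024) * 256 / (1024*1024) ))    # 64K blocks: ~3725 MiB per 1 TB of L2ARC
echo $(( 1000000000000 / (1024*1024) * 256 / (1024*1024) ))  # 1M blocks: ~232 MiB per 1 TB of L2ARC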
Smaller record-/blocksizes vastly increase ARC bookkeeping/headers, that’s true. But 1000GB of L2 plus the ~20GB of ARC left over is still far more cache than 24GB of ARC alone. I doubt you get 4GB of headers with 64k unless you’ve got several other 4k-16k datasets using the rest of your space. See this graph:
64k for a zvol seems high. I’m not an expert on this, so I’ll pass. But I use 16k zvols myself, which is the recommended default in TrueNAS. And I don’t even know what that zvol is for. I prefer NFS shares whenever possible because datasets are more flexible.
That’s not true in my experience. Fetching game files involves both sequential reads and a whole bunch of random reads. SSDs aren’t preferred because of their throughput, but because of their latency advantage over HDDs when it comes to loading games. If you check iostat or even htop during loading, you can see exactly that.
Aren’t games basically tuned to load from CD/HDD at the moment? Designed to load game assets into RAM (loading screen) and then play without much disk activity?
Not sure a cache will do much, unless you specifically pre-load the game assets into it.
This might change now that the PS5 has ballin’ storage, but we’ll see.
The L2ARC is not user populated, so you would not be able to tell it which assets to load, unless you loaded a level several times to warm the cache.
I’ve not used primocache, but if you can move one game at a time to it, then that might be an idea.
Personally I just set up a small flash array of SATA SSDs and move games there when active. It doesn’t do much apart from reducing loading screens, and that only really helps when jumping around cells in an open-world game with fast travel. Linear games or arena games don’t benefit as much. Multiplayer games rely on opponents loading in, so no advantage even for flash over rust…
Optimal zvol block size is definitely a tricky question. I personally consider 16K to be a sane minimum, but I’ve seen references to using up to 256K in enterprise. I’d consider 128K to be a reasonable max to try out. I’ve personally started to consider a block size of 2x the expected write size to be the sweet spot, for the reasons outlined below, with caveats/additional tuning.
The following is from what I’ve seen in discussions among “paid to optimize ZFS” admins, so I’m not the most reliable reteller and probably won’t be able to satisfactorily answer “but what about X…” questions.
A higher block size will result in more read-modify-write (RMW) amplification, and as such reduce performance off the bat in the benchmarks most laypeople try.
However, the higher the activity on the drive, the higher the eventual fragmentation of free space, making it harder for ZFS to find places to write to. ZFS might be looking for 1M of contiguous space when most of the available gaps are smaller than that. If it can’t find space, it has to create “gang blocks”, which are awful to have and result in file fragmentation. Obviously this mostly becomes a problem when the drive is getting close to full. In such cases, a larger block size will perform faster and more consistently than small blocks. Essentially this trades some up-front performance on a fresh array for less eventual fragmentation and thus better mature-array performance. The benchmarks you see won’t ever show this. It’s going to be much more significant for HDDs compared to SSDs, but the problem doesn’t go away with SSDs.
Highly active databases are an ideal example of this, but VMs can sometimes (not always) benefit as well.
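If you want to watch this happen on a pool of your own, the free-space fragmentation metric is exposed by zpool (note it measures fragmentation of free space, not of files):
zpool list -o name,size,allocated,free,fragmentation,capacity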
Larger zvol blocks can also come out ahead when you increase the amount of memory and time ZFS can use to coalesce writes. Chances are that writes are happening in a way that is very “local”, and this can be optimized by combining writes if they can sit in RAM long enough, which can eliminate much of the extra write amplification.
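The “memory and time to coalesce writes” part maps roughly to the dirty-data/transaction-group tunables; a sketch of where they live on Linux (the value written is purely illustrative, and longer-lived txgs mean more unsynced data at risk on a crash):
# how much dirty (not yet written) data ZFS will buffer before forcing a sync
cat /sys/module/zfs/parameters/zfs_dirty_data_max
# how many seconds a transaction group may stay open (default 5)
cat /sys/module/zfs/parameters/zfs_txg_timeout
# illustrative only: let writes sit a bit longer so nearby blocks can be merged
echo 10 > /sys/module/zfs/parameters/zfs_txg_timeout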
Datasets with a limited recordsize and database-type files also benefit, similarly to zvols.
I haven’t worked through how well compression will behave in saving space across the various combinations of larger zvol blocks, file sizes, and file-system cluster sizes.
Yes, really what I’m aiming for is shorter loading screens. For games that rely on fast NVME, I’ll just install them to my boot NVME drive.
The features of primocache I’m using are very similar to L2ARC at a very high level. A separate SSD is used to cache a 2nd copy of some data in a filesystem that the software decides should be cached. As the cache fills, it evicts data it expects to be used less. After you launch a game a few times, the key files are copied to the cache and you get SSD-like speed.
The desired behaviour is to automate what you’re doing manually: the games I’ve been playing lately get cached once I’ve launched them a couple of times, since I’m probably going to keep playing them for a while. Primocache seems to do this well. If I only enable L2ARC on the zvol that my games are on (presented to Windows as iSCSI, formatted NTFS), then the performance should be similar as long as L2ARC isn’t tuned dramatically differently.
Thanks. What I’m hearing is that while the need for huge blocks isn’t obvious if you can handle the metadata/ARC footprint, the penalties for using them aren’t obvious either.
If it’s not a given, then I will test my theory about zfs compression helping with large allocation size wasted space.
Do you have a link to the writeup for that table? I would like to read about the test conditions.
If I’m buying a 1TB or larger cache drive and primarily using it for game caching (my server workload is happy as-is), is there any point in not just putting the huge SSD in my gaming rig directly? My installed game library would probably fit on a drive that size. Another consideration is that my server has no free PCIe slots (so I may be limited to SATA) while my gaming rig has one spare M.2 slot.
I’ve tried both; primocache works better for games. Primocache will cache all data the first time it is read or written, while L2ARC limits cache writes, so only some of the data is written to the cache each time it is accessed.
I’m ok if writes aren’t immediately in the cache, since installing many large games would evict a lot of cache unnecessarily.
Do you know if L2ARC would cache most things that are read a few times in a short period if I modified the tunables? For example, a high max feed rate might let L2ARC cache new games after I launch them a couple times, like primocache does.
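From what I’ve read so far, these look like the relevant knobs (Linux module-parameter form shown; values untested, just to illustrate what I mean):
# by default L2ARC skips prefetched/streaming buffers; 0 makes sequential game reads L2-eligible
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
# scan deeper into the ARC lists per feed pass (a multiplier of l2arc_write_max)
echo 8 > /sys/module/zfs/parameters/l2arc_headroom
# optionally feed only MFU buffers so one-off reads (big game installs) don’t churn the cache
echo 1 > /sys/module/zfs/parameters/l2arc_mfuonly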
I looked into the man pages because of your post (I also have a pool without cache that has games on it) and it seems persistent L2ARC is already enabled by default. Or is there a difference between a cache device that was present when the pool was created and one that was added later?
I don’t know who told you guys that having pools without cache is a good idea. NAND flash and memory are the kings of performance. Unless you have hundreds of GB of memory (or your usual working set fits into the memory you have), always extend your memory with L2; it’s dirt cheap nowadays.
Check your tunables to see whether the rebuild is set to 1 or 0 (vfs.zfs.l2arc.rebuild_enabled). On my TrueNAS it certainly wasn’t, and I haven’t seen any default config where this was the case.
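To check it yourself (FreeBSD/TrueNAS CORE uses the sysctl, Linux/TrueNAS SCALE exposes it as a module parameter):
sysctl vfs.zfs.l2arc.rebuild_enabled                   # FreeBSD / TrueNAS CORE
cat /sys/module/zfs/parameters/l2arc_rebuild_enabled   # Linux; 1 = persistent L2ARC rebuilt on import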
There is no test involved. The ARC headers use a fixed number of bytes per record/block, which is why you see a perfectly linear relationship with recordsize/blocksize: it’s just that many bytes times the number of records you’ve got.
The sentiment I got when reading up on it for my NAS and PC pool was that L2ARC was a last resort for some esoteric workload that would chew through my RAM more than the normal ARC. I have a 500GB SSD here, so as far as I understand it now, I can easily put that in as a cache. Not as large as it could be for 8TB, but it should help, no?
I think this is a difference in defaults between TrueNAS and OpenZFS; the latter’s man pages imply that the option is set to 1 and that, contrary to what the TrueNAS docs say, the cache gets rebuilt asynchronously, as in not blocking the boot process.
Well, the ARC is the core of ZFS. It stores many things, but most famously metadata + MFU/MRU data blocks. If you’ve got, e.g., 32GB worth of data you don’t want fetched from slow HDDs, you’re fine. But in practice you usually have more than this. You either get a cache hit on those 32G at memory speed or read from HDD (slow except for sequential stuff).
L2 offers another storage tier that keeps important data that no longer fits into the ARC itself but is still recently/frequently used. An L2 cache hit isn’t memory speed, but any cache hit from an SSD is much more favorable than reading from HDD, especially for small and medium files.
Two things to keep in mind: SATA SSDs can’t keep up with sequential reads from multiple HDDs, so make sure your L2 isn’t slower in that regard. And ARC bookkeeping overhead (as mentioned above) can be crippling with a low record-/blocksize and limited memory; the metadata percentage is also inflated in that scenario. But for 98% of all pools, L2ARC is just great and way cheaper than increasing memory.
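If you want to see how well that tiering works on your box, the kstats show it directly (Linux path shown; arc_summary ships with OpenZFS and presents the same numbers in a friendlier form):
# raw counters: ARC vs L2ARC hits/misses and current L2 size
grep -E '^(hits|misses|l2_hits|l2_misses|l2_size) ' /proc/spl/kstat/zfs/arcstats
arc_summary | less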
I did check on my Kubuntu laptop and it isn’t enabled. I’ve got no L2 on a laptop (all NVMe), but I certainly didn’t change it. It’s always good to check that setting, though. Warming up the cache can be a no-go, especially if you shut down/reboot the pool often. That’s not so important for the datacenter, but we’re talking mostly homelab here.
I made another zvol with the same data as my 64k zvol, formatted with NTFS (allocation size = volblocksize) for game storage. I used a 1M volblocksize as an extreme example. One of the games installed was ARK: Survival Evolved, which has ~100,000 files, many of which are between 1kB and 10MB. This should aggravate any issues with NTFS wasting part of a block for each file. Below is the zfs list output:
NAME                   USED  AVAIL  REFER  MOUNTPOINT
p/dat/iscsi-hafxb-1M   257G  11.1T   257G  -
p/dat/iscsi-hafxb-64k  274G  11.1T   274G  -
Both zvols have zstd compression on, but lz4 shouldn’t be appreciably different here. It looks like as long as you have ZFS compression, you can use huge allocation and block sizes with NTFS on a zvol without increasing the storage needed on disk. When the blocks are decompressed and sent over iSCSI, the waste from large allocation sizes will eat into your bandwidth, but for game storage it shouldn’t be too bad, especially if you don’t play ARK, which has a particularly large number of small files.
I’m still not sure how I would arrive at an optimal block size without a lot of fiddling and testing, but it looks like NTFS wasted space isn’t a relevant concern when picking a block size for a game drive.
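In case anyone wants to reproduce the comparison, the commands look roughly like this (the -V size and the -s/sparse flag are illustrative rather than exactly what I used, and a volblocksize above 128K may require the large_blocks pool feature):
zfs create -s -V 2T -o volblocksize=1M -o compression=zstd p/dat/iscsi-hafxb-1M
# after copying the games over, compare logical vs. on-disk usage
zfs get used,logicalused,compressratio,volblocksize p/dat/iscsi-hafxb-1M p/dat/iscsi-hafxb-64k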