ZFS layout recommendation: many JPGs

Thibaultmol · December 18, 2023, 8:50pm

Hi, I’m currently looking into setting up a Geovisio/Panoramax server (an open-source alternative for hosting self-made, street-view-like footage).
The data to be stored are JPGs, usually around 2-5MB each.
Say, for example, we had a setup of 8 HDDs.
What do you think would be the best layout for the ZFS pool?

One large vdev RaidZ2
4 pairs of mirrored vdevs

Just like with Google Maps, you can imagine how street view footage will be used: somewhat randomly. I’ll obviously try to give it as much cache as possible, but I’m still not sure what to do with the actual HDD ZFS layout itself.

(trying to keep the post short, but if you need any more info, let me know)

Exard3k · December 18, 2023, 9:10pm

Or two RAIDZ1 vdevs of 3+1 (three data disks plus one parity disk each), as a compromise between the two configs you mentioned. I don’t think 2-5MB files have a perfect match when it comes to pool config/RAID topology. They sit somewhere in between the extremes.
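For a rough side-by-side of the three 8-disk layouts mentioned so far, here is a small Python sketch. It only counts disks: usable capacity in whole-disk units and how many disk failures each layout is guaranteed to survive. The `Layout` class and its field names are made up for illustration, and the numbers ignore ZFS padding, metadata and reserved space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical helper for illustration; not part of any ZFS tooling."""
    name: str
    vdevs: int            # number of top-level vdevs in the pool
    disks_per_vdev: int   # width of each vdev
    redundant_disks: int  # parity disks (RAIDZ) or extra copies (mirror) per vdev

    @property
    def total_disks(self) -> int:
        return self.vdevs * self.disks_per_vdev

    @property
    def data_disks(self) -> int:
        # Usable capacity in whole-disk units, before any ZFS overhead.
        return self.vdevs * (self.disks_per_vdev - self.redundant_disks)

    @property
    def guaranteed_failures(self) -> int:
        # Worst case: every failed disk lands in the same vdev.
        return self.redundant_disks


LAYOUTS = [
    Layout("1 x RAIDZ2 (8 wide)", vdevs=1, disks_per_vdev=8, redundant_disks=2),
    Layout("4 x 2-way mirror", vdevs=4, disks_per_vdev=2, redundant_disks=1),
    Layout("2 x RAIDZ1 (3+1)", vdevs=2, disks_per_vdev=4, redundant_disks=1),
]

if __name__ == "__main__":
    for layout in LAYOUTS:
        print(
            f"{layout.name:22} vdevs={layout.vdevs} "
            f"data_disks={layout.data_disks} "
            f"guaranteed_failures={layout.guaranteed_failures}"
        )
```

The vdev count is worth looking at because ZFS spreads data across top-level vdevs, so more vdevs generally lets more random I/O happen in parallel. Treat that as a rule of thumb, not a benchmark.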

That’s not awfully small random I/O, and it’s usually fine for HDDs to work with. Set recordsize to 1M if the pictures aren’t modified after writing, and to 128k or lower if the files get edited a lot.
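To put numbers on the recordsize suggestion, here is a minimal sketch that counts how many records a single picture is split into. It treats "2-5MB" as 2-5 MiB for simplicity and ignores compression and metadata; `records_per_file` is a hypothetical helper, not a ZFS API.

```python
KIB = 1024
MIB = 1024 * KIB


def records_per_file(file_size: int, recordsize: int) -> int:
    """Number of recordsize-sized blocks needed to hold file_size bytes."""
    if file_size <= 0 or recordsize <= 0:
        raise ValueError("file_size and recordsize must be positive")
    # Integer ceiling division: a partially filled last record still counts.
    return -(-file_size // recordsize)


if __name__ == "__main__":
    for label, recordsize in (("128k", 128 * KIB), ("1M", 1 * MIB)):
        small = records_per_file(2 * MIB, recordsize)
        large = records_per_file(5 * MIB, recordsize)
        print(f"recordsize={label}: {small} to {large} records per picture")
```

Fewer records per picture means fewer block pointers to follow and fewer separate reads to issue when a whole image is fetched.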

Somewhat random access is bad for caching, because it is unpredictable and falls outside the commonly seen access patterns. RAM plus NVMe for L2ARC solves a lot of problems when reading data, but it can’t predict the unpredictable.

Even though you can cache a lot, that doesn’t mean you bypass the fact that you are using HDDs. The most frequently and recently used data will certainly be cached, but be prepared to see the HDDs at work.
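As a back-of-the-envelope illustration of why caching only goes so far: if every picture were equally likely to be requested (a pessimistic assumption, since real street-view traffic probably has hot spots), the expected hit ratio is simply cache size divided by dataset size. The sizes below are invented for the example, and `uniform_hit_ratio` is a hypothetical helper.

```python
GB = 10**9
TB = 10**12


def uniform_hit_ratio(cache_bytes: int, dataset_bytes: int) -> float:
    """Expected cache hit ratio when all data is requested with equal probability."""
    if dataset_bytes <= 0:
        raise ValueError("dataset_bytes must be positive")
    return min(1.0, max(0, cache_bytes) / dataset_bytes)


if __name__ == "__main__":
    dataset = 20 * TB   # hypothetical total size of all pictures
    arc = 64 * GB       # hypothetical RAM available to ARC
    l2arc = 1 * TB      # hypothetical NVMe L2ARC device

    print(f"ARC only:    {uniform_hit_ratio(arc, dataset):.2%}")
    print(f"ARC + L2ARC: {uniform_hit_ratio(arc + l2arc, dataset):.2%}")
```

With skewed access (popular streets, recently uploaded sequences) the real hit ratio would be higher, which is where the "most frequently/recently used data" part comes in.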

And I think a metadata special vdev can be beneficial as well.

Thibaultmol · December 18, 2023, 9:28pm

IIRC, I read that the performance difference between RAIDZ1 and RAIDZ2 was minimal (and one large RAIDZ having more drives might mean it’s faster than two smaller ones?).

Shouldn’t be modified often, good tip, thx!

I have yet to check how exactly the images are stored. I would assume the software stack generates a set of thumbnails that need to be accessed first every time, which I might put on a mirror vdev of two SSDs.
That way it wouldn’t be the end of the world if the full-res image itself is on the HDDs. And yeah, good call on the metadata vdev. That’s still pretty much the ideal use for Optane, right? Or are there other options now?

My plan was to buy one of these second hand (not this store, just using it cause they show good pics and spec list):

with an HBA330 card that TrueNAS can use (passed through over PCIe from Proxmox to the TrueNAS Core VM)

risk · December 18, 2023, 9:29pm

Assuming those 2-5M JPEGs get accessed in their entirety (big assumption), then obviously you want a large record size to minimize I/O.

You get about 100 IOPS per drive on 7200rpm drives, so you’d do better if those were 1M reads than if they were 128k reads.
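A quick sketch of that arithmetic, under a deliberately pessimistic assumption: every read costs one of the drive’s roughly 100 IOPS (i.e. nothing is contiguous or prefetched), and we look at a single drive without RAIDZ striping. The 4 MiB picture size and the `images_per_second` helper are illustrative assumptions.

```python
KIB = 1024
MIB = 1024 * KIB


def images_per_second(drive_iops: float, image_bytes: int, read_bytes: int) -> float:
    """Whole images one drive can serve per second if each read costs one I/O."""
    if drive_iops <= 0 or image_bytes <= 0 or read_bytes <= 0:
        raise ValueError("all arguments must be positive")
    reads_per_image = -(-image_bytes // read_bytes)  # ceiling division
    return drive_iops / reads_per_image


if __name__ == "__main__":
    iops = 100            # rough figure for a 7200rpm drive, from the post above
    image = 4 * MIB       # hypothetical picture in the 2-5MB range
    for label, size in (("128k", 128 * KIB), ("1M", 1 * MIB)):
        rate = images_per_second(iops, image, size)
        print(f"{label} reads: {rate:.1f} images/s per drive")
```

In practice sequential layout and prefetch narrow the gap, but the direction is the same: larger reads mean fewer seeks per picture.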

Another thing you could consider for such storage is mergerfs+snapraid. One jpg, one HDD seek.

Thibaultmol · December 18, 2023, 9:33pm

Also a fair point. I hadn’t even considered it, to be honest, because I was focused on TrueNAS from an ‘easy to manage, report and follow up/monitor’ angle.
Plus it makes snapshotting and remote backup easy.
I don’t personally have experience with mergerfs and SnapRAID, but I definitely get how it makes sense in this use case. Definitely something to think about.