Best practice for raidz on different-sized disks in 2023?

Awesome. Thanks for the advice. Trying to squeeze everything I can out of these SAS drives.

So maybe I can get by with:

RAID1 (2 disks) for Proxmox
RAID1 (6 disks) for the storage pool
1 SLOG (I can give it either a 256 GB, 512 GB, or 1 TB M.2). Are there diminishing returns on drive size for a SLOG?

You only need about 5 seconds' worth of writes as capacity, so 8 GB is fine. The SLOG will basically convert all sync writes into async writes and turn random I/O into sequential I/O, so on HDDs you may see an increase from ~5 MB/s to 100-200+ MB/s, depending on the random write speed of the M.2.
You're still limited in write speed by RAIDZ, though.
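Rough math on the capacity, assuming a 10GbE link as the worst-case ingest rate (these SAS HDDs will never see that) and the default 5-second transaction group interval:

```
# Back-of-the-envelope SLOG sizing: 10 Gbit/s ≈ 1250 MB/s of incoming sync data,
# and the SLOG only has to hold what lands between txg commits (~5 s by default).
echo $(( 10 * 1000 / 8 * 5 )) "MB worst-case SLOG usage"   # prints 6250 MB
```

Anything beyond that just sits idle, which is where the diminishing returns kick in.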

Appreciate the help. I may wait until I get through this capstone before blasting away the build. I don’t want to have to set up the domain again.

OK, so I am going to assume that your SAS drives are not SSDs, so the best bet is striped mirrors (RAID 10). Then I would use 2 of the Optanes for a ZIL, unless they are crap Optanes; you have to remember that when you add a ZIL it becomes your bottleneck, and if your pool is faster than your ZIL you will be sad.
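Something like this, as a minimal sketch with placeholder device and pool names (use your /dev/disk/by-id paths on the real box):

```
# Hypothetical names -- substitute real /dev/disk/by-id/... devices and your own pool name.
zpool create tank \
  mirror sda sdb \
  mirror sdc sdd \
  mirror sde sdf \
  mirror sdg sdh \
  log mirror nvme0n1 nvme1n1   # mirrored SLOG so a dead Optane can't eat in-flight sync writes
```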

I am not sure how much ProxMox hits its root disk, but if it is just some generic logs and configs and not a lot of constant traffic, I am a huge fan of good SATA DOMs. Looking at the DL380 I see it has an SD card slot; it would be a bit crap, but you could use a high-endurance SD card and a good quality USB drive for a mirror. But if ProxMox is cool with it and will let you use the root pool for VMs and whatnot, I would just put all 8 SAS disks in a striped-mirror root pool and go on about my day. (Not a huge ProxMox person.)

Also keep in mind what @Exard3k said: you only need to keep about one transaction group in the ZIL at any given time, so size is kinda meh unless you do something crazy like set your TXGs to be 30 seconds apart. The ZIL also does a bit of its own tricks, as shown below.
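On Linux/OpenZFS the TXG interval is the zfs_txg_timeout module parameter, if you want to see what your pool is actually running with:

```
cat /sys/module/zfs/parameters/zfs_txg_timeout   # defaults to 5 (seconds)
```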

As far as how much space you need: if this is all local and no network traffic, then it only needs to be about 2x the total writes you can shove through it in whatever the TXG timer is set to. The real kicker here is endurance, throughput, and how fast the ZIL disks can ack a write, which is why I use Optane for a ZIL (though I use the crazy nice stuff).

As far as adding another CPU: more NUMA nodes can be sad, but it is usually fine unless you are making super huge VMs. More RAM is always a plus, but you will need to balance your ARC against the RAM the system/VMs require; the last thing you want is paging.

As far as AC Hunter CE goes, it doesn’t look like the requirements are going to be crazy. I log a crap ton of stuff with Splunk at home and even more at work (~100 GB/day), but that only works out to about 4.167 GB/hour or 69.44 MB/min, and once you add compression (it’s mostly text) you are writing a lot less total data to disk. If you are feeling frisky, you can use ZSTD-3 for compression; it does like to eat up CPU cycles, so maybe only enable it on the dataset for that one data disk. Writes should get soaked up in a TXG if they’re async, or by the ZIL for sync writes, unless you really, really care about the data and set the dataset to sync=always, in which case every write will go through the ZIL. I would probably just suck it up for a better overall pool experience rather than having to wait an extra few seconds to pull up a large dataset.
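Both compression and sync behavior are per-dataset properties, so you can scope them to just that data. A rough sketch, with a hypothetical pool/dataset name:

```
# 'tank/achunter' is a placeholder dataset name.
zfs set compression=zstd-3 tank/achunter   # needs OpenZFS 2.0+; lz4 is the cheaper default choice
zfs set sync=always tank/achunter          # only if you truly want every write forced through the ZIL
zfs get compression,sync,compressratio tank/achunter
```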

I am also curious what Optane you plan on using: is it the good Optane, or the throwaway stuff they were putting in desktops to augment a spinning HDD that only has ~100-200 MB/s of throughput?

As a side note @cloudkicker, you could do a ZFS send to a remote/temp box to just sit on the data until you are ready to move it back, or maybe ProxMox has some export option. I have gotten spoiled with ESXi and multiple storage targets, where I can just do a storage vMotion and go on about my day. You might also be able to just back it up to an NFS target? Too lazy to google it.
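A minimal sketch of that send/receive idea, assuming a temp box reachable over SSH that has its own pool (all names here are placeholders):

```
# Snapshot the dataset(s), then stream them to the temporary box.
zfs snapshot -r tank/vms@pre-rebuild
zfs send -R tank/vms@pre-rebuild | ssh tempbox zfs receive -u backup/vms
# Reverse the direction later to pull everything back onto the rebuilt pool.
```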

Regularly, for a couple of kB. Pretty much your default Debian with a couple more logs. The only real load is if you have VMs with I/O running on that root disk.
Really, any cheap SSD or proper USB stick will do for Proxmox.

It is.

And you can create that pool at install. Default is actually “all the drives”. So very easy, no CLI no nothing.

Don’t call me crazy! txg_timeout=120 here 🙂 But I have 13 tunables running, so I’m not a reference in that regard.
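For reference, that tunable can be changed live through sysfs or persisted via modprobe (120 is just my value, don’t copy it blindly):

```
echo 120 > /sys/module/zfs/parameters/zfs_txg_timeout        # takes effect immediately
echo "options zfs zfs_txg_timeout=120" >> /etc/modprobe.d/zfs.conf   # persists across reboots
```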

Should be fine on a 10-core Xeon. Switching back to LZ4 is easily done if CPU load proves to be too high. We’re talking about 1.2 TB HDDs after all; there aren’t GB/s of throughput for ZFS to compress.

2 x Optane Memory H20 SSD, 1 TB + 32 GB cache
1 x Optane H10 SSD, 512 GB + 32 GB cache

I was planning on just using the 512 as it was the smallest.

I also forgot that I had originally opted for the 12-core Xeon over the 10-core. I saw how cheap Xeons/RAM have gotten since my last post, so I decided to upgrade to 2 Xeons, a 1200 W PSU, and 64 GB more memory.

I decided to go with Proxmox because it was something I had never used, but I do really like ESXi since I used to manage it quite a bit 5 years ago. I may go back to ESXi since it has been a while and it would apply more to work.

Cool, that would be my suggestion then. I mostly only run ZFS on FreeBSD, but I have been playing with it on Ubuntu 20.04 as part of the OSNEXUS QuantaStor distro for my HA SAN. No real complaints.

That must have changed in the last few years; I tried to run a cluster of 16 nodes of ProxMox in an HP C-7000 and it was just crap.

I mean, if you are doing a huge NVMe pool or something, I have seen that done; I don’t know what your pool is like.

True, but even my spinning rust system is hit pretty hard and it has 40c/80t. It is a storage only box though, so I don’t mind.

Hmm, well, if I am decoding Intel Optane Memory H20 with Solid State Storage 32 GB 1 TB M.2 80mm PCIe 3.0 3D XPoint QLC Product Specifications correctly, it seems a bit sus that they say “Random Read/Write (8GB Span) (up to)”, so I am not sure if that is a DRAM cache, or a chunk of SLC, or what. However, 8 GB is more than enough for a ZIL, so that’s fine-ish.

Maybe hold onto that H10 as a throwaway data-only disk for AC Hunter CE if the pool can’t keep up. I would not bother with an L2ARC until you can’t fit any more RAM in your system.
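A quick way to sanity-check whether more caching would even help, assuming Linux/OpenZFS with the stock reporting tools:

```
arc_summary | less    # ARC size, hit ratio, and MRU/MFU breakdown
# or pull a few raw counters straight from the kernel stats:
awk '$1 ~ /^(hits|misses|size|c_max)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats
```

If the ARC hit ratio is already high, an L2ARC is mostly just burning RAM on its own headers.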

Sweet, yeah, the E5-2673 v4s I run have gotten cheap, and the 32 GB LRDIMMs have gotten cheap as well. Go nuts with what fits in your budget, just don’t treat it like a race car that you daily; that gets expensive.

Nothing wrong with playing with different platforms, and honestly, if you are not going to do multi-node clusters and some of the fancy stuff that is in ESXi + VCSA, ProxMox might be a better fit; it also avoids the dumpster fire that is vSAN. The local disk management in ESXi is crap; it is meant to be hooked up to a SAN/NAS, and if you don’t have one, I would not go down that road, it gets expensive.
