[SOLVED] - Help with Proxmox ZFS Best Hard Drive Configuration

I am working on setting up a Proxmox server for my home/family. The hardware is as follows:

Used Intel Server with:
Dual Intel Xeon E5560 Processors
16GB ECC Memory (plan on doubling this around Christmas)
Two 250GB Samsung 850 SSDs
Two 8TB Seagate Archive HDDs
Intel 4-port PCI-e Network Card (PCI Passthrough to pfSense)

I plan on having 1 pfSense VM that will be a backup of my current QOTOM Mini PC pfSense box. The server will also have 8-10 Debian LXCs: Nextcloud, Emby, Transmission, SMB/NFS File shares, Searx, Graylog, Zabbix, Syncthing, Apt-Cacher-ng.

I have several questions on the best way to set up the hard disks. I plan on running ZFS.

  1. I want to encrypt all the drives. Should I use LUKS or ZFS native encryption?

  2. How should I configure the hard disks? I want to put the two 8TB drives in a zpool for all the files the LXCs share, and I was planning on running the LXCs from either one or both (mirrored) SSDs. Should I mirror the SSDs, or should I use part or all of one for an L2ARC cache? Since all the files are going to be accessed over a 1Gb network connection, I wasn’t sure if an L2ARC would be worth it. The server is on a 1500VA APC UPS, so I didn’t think a separate SLOG (ZIL) partition was necessary.

My current plan is to have a USB flash drive with the boot partition and the keyfiles and LUKS headers for all the drives. The USB key will be locked in our safe most of the time. Yes, I realize that if it reboots I will have to intervene manually, but this server is only for the four people in my family, so it shouldn’t be a problem. I currently have a similar setup with a USB drive on my Arch Linux workstation.
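
In case it helps to be concrete, this is roughly the kind of thing I mean (just a sketch; the device names, mount point, and file names are placeholders):

```
# Keyfile and detached LUKS header both live on the USB stick
# (placeholder paths and device names):
dd if=/dev/urandom of=/mnt/usbkey/sda.key bs=512 count=4
dd if=/dev/zero of=/mnt/usbkey/sda-header.img bs=1M count=16

cryptsetup luksFormat /dev/sda \
    --header /mnt/usbkey/sda-header.img \
    --key-file /mnt/usbkey/sda.key

# After a reboot I'd plug the USB key in and open the drive by hand:
cryptsetup open /dev/sda crypt-sda \
    --header /mnt/usbkey/sda-header.img \
    --key-file /mnt/usbkey/sda.key
```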

Thanks

JJ

ZFS native encryption (or GELI, if you were on FreeBSD) will probably give you less trouble down the line.

More RAM and/or a dedicated SLOG for the ZIL is better than an L2ARC.

I’d recommend going with Hitachi or Toshiba over Seagate, given the year-to-year failure rates of the latter.

given that you won’t be writing to the containers a lot and SSDs fail readable, I’d drop one SSD and run a raidz with 3+ HDDs instead for better performance while retaining parity

I used to run ZFS on LUKS and the performance was garbage. It could have just been a tuning thing, but I’d avoid it unless encryption is a higher priority than performance.

The native encryption should be more flexible as well.

Thanks for your response @tkoham.

Do you know if I can get similar performance with ZFS native encryption vs LUKS?

I know Seagate’s drives aren’t as reliable as Toshiba’s or Hitachi’s, but I’m kind of stuck with the hardware I have atm.

What do you mean SSDs fail readable?

I’m confused about your raidz suggestion. Are you recommending that I combine a 250GB SSD with the two 8TB HDDs? I’m new to ZFS (been using btrfs for a while now), but from what I’ve learned so far that would give me a vdev running at the slowest drive’s speed and at the smallest drive’s capacity. I’ve had a lot of experience with RAID5 over the last 30+ years and always prefer mirrors or RAID10.

Can you please make a recommendation based upon the hardware I have?

Thanks

JJ

Thanks @Dexter_Kane. I’ll give the native encryption a try and test it out right after I install. Another bonus is that I can test it and won’t have to start from scratch if it’s too slow.

JJ

As @Dexter_Kane mentioned, native will perform better. It also allows other tunables, like lz4 compression, to work in conjunction with it.
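
Something along these lines, for example (a rough sketch assuming ZFS 0.8+ and placeholder pool/dataset names):

```
# lz4 on the pool, native encryption on a child dataset
# (pool and dataset names are placeholders):
zfs set compression=lz4 tank
zfs create -o encryption=aes-256-gcm \
           -o keyformat=passphrase \
           -o keylocation=prompt \
           tank/secure

# After a reboot, load the key and mount it:
zfs load-key tank/secure
zfs mount tank/secure
```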

When an SSD reaches EOL, you can still read/copy it, you just can’t write to it (assuming there isn’t a firmware problem and TRIM/wear leveling are working).

Ah, I misread the OP and thought you were buying the drives. If you already have the SSDs and the HDDs, then ignore that. I was suggesting you not buy a second SSD and instead use that money to get a third HDD and/or more RAM, then make a raidz1 (similar to RAID 5) out of the 3+ HDDs.

Yeah, sorry about that. I’d say just mirror the Seagates, overprovision the second SSD, and use it either as a SLOG or an L2ARC depending on how much RAM you can dedicate to your ARC.

If you do plan on expanding RAM to 32GB and your containers are going to average about 1GB of memory each, I’d say go for a SLOG; if you want to stay at 16GB for a long time, go for an L2ARC.

Also explore other performance options like lz4
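
Adding either one later is trivial, something like this (device names are placeholders; the partitions are whatever you carve out of the overprovisioned SSD):

```
# Mirror the two 8TB drives (placeholder device names):
zpool create tank mirror /dev/sdb /dev/sdc

# Later, give the pool a partition of the spare SSD as a SLOG...
zpool add tank log /dev/sda2

# ...or as an L2ARC instead:
zpool add tank cache /dev/sda3
```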

Raidz1 isn’t Raid5 – it just has similar parity.

Because any zfs pool can be read by any OS with zfs support, and the pool is totally controller/hba agnostic, Raidz1 is a lot more resilient than hardware raid5, and due to the architecture of zfs, write hole problems and other controller problems are mitigated or eliminated entirely. Even if a controller fails at the same time as a drive, you can just pop a new drive in and resilver on different sata ports.
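
The recovery path really is that simple, e.g. (hypothetical pool and device names):

```
# Import the pool on whatever controller/ports the disks end up on,
# then swap the dead disk for the new one (placeholder names):
zpool import tank
zpool replace tank /dev/sdc /dev/sdd

# Watch the resilver progress:
zpool status tank
```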

The hardware controllers that caused limitations and reliability problems in complex parity configs aren’t present here, and zfs handles its volumes much more intelligently

If performance is the concern, an appropriately sized ARC and a fast slog will completely invalidate any performance differences between striped mirrored vdevs and raidz1, and if you’re really paranoid, you could always set up raidz3

Then if you really want to get insane, you can stripe multiple raidz vdevs in one pool to make configurations similar to RAID 50 or RAID 100.
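
For example (purely illustrative device names, and it needs six disks, which you obviously don’t have here):

```
# Two raidz1 vdevs striped in one pool, roughly a RAID 50 analogue:
zpool create bigtank \
    raidz1 /dev/sdb /dev/sdc /dev/sdd \
    raidz1 /dev/sde /dev/sdf /dev/sdg
```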

Wow, thanks @tkoham! I still have a lot to learn about ZFS :slight_smile: I plan on using lz4 compression; I’ve been using it on btrfs and it’s fast and compresses well.

Thanks for the advice. I’ve been doing some research over the last few hours. What I was thinking is that I could mirror the first partition on each of the 250GB SSDs, then create an 8-16GB mirrored SLOG across both, and finally create a 32-48GB L2ARC on each that would be concatenated. My only concern is the amount of memory needed for the ARC headers for an L2ARC of that size. According to my research, the ARC header overhead is:

(L2ARC in bytes) / (record size in bytes) * 70 bytes = ARC Header in bytes

or roughly 1GB of memory for every 10GB of L2ARC. With 16GB of memory I was planning on limiting my ARC to 8GB. I could always use the L2ARC on one of the SSDs now and activate the second one after I increase the memory to 32GB.
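
Running the formula for a hypothetical 96GB of total L2ARC (2 x 48GB), the overhead depends heavily on recordsize, so the 1GB-per-10GB figure above looks very conservative for the default 128K records:

```
# Back-of-the-envelope ARC header overhead for 96GB of L2ARC:
#   128K recordsize: 96GiB / 128KiB * 70B ≈ 52 MiB
#   8K recordsize:   96GiB / 8KiB   * 70B ≈ 840 MiB
echo $(( 96 * 1024**3 / (128 * 1024) * 70 / 1024**2 ))   # ≈ 52
echo $(( 96 * 1024**3 / (8 * 1024)   * 70 / 1024**2 ))   # ≈ 840

# Capping the ARC at 8GB on Linux (value in bytes), then rebuild the
# initramfs and reboot:
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
```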

My other concern is the old Intel server only has SATA II. So my SSDs are going to be limited to a max of around 300MB/s. Of course I’ll still have the low latency benefit over the spinning HDDs.

Thanks for the help!

JJ

I’d just skip the l2arc and run the two SSDs independently

There’s no reason to make the SLOG redundant, it’s a temporary buffer. The same goes for the L2ARC. The metadata will let you know if something goes wrong.

you’re making it more complex than it needs to be.

thank god you saw the light

Thanks @c7a55cd06258db6 and everyone else who responded. My plan right now is to just mirror both 250GB SSDs for the root and VM/LXC zpool. I’ve been learning Ansible over the last few days and realized that I will probably need the space for more than just the production LXCs. I will also have duplicate containers that I can quickly power up if one of my containers crashes. Plus, I’ll want to do any major upgrades or changes in a new container and destroy the original once the new one is stable.

I really appreciate all the advice; I learned a lot. I’ll watch my zpools closely and add a SLOG and/or L2ARC if needed in the future.
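
The plan is to keep an eye on things with the standard tools (as far as I know these all ship with the ZFS-on-Linux userland):

```
# Per-vdev throughput/latency, refreshed every 5 seconds:
zpool iostat -v 5

# ARC size and hit rates:
arc_summary

# Overall pool health:
zpool status -v
```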

Thanks again JJ

Maybe look into checkpoints/snapshots/boot environments (not sure if Linux has that last one yet) if you want to do low-risk in-place upgrades.
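
Something like this before a risky upgrade, for instance (the dataset name is a typical Proxmox LXC one but just a placeholder, and pool checkpoints need a reasonably recent ZFS):

```
# Snapshot the container's dataset before upgrading (placeholder name):
zfs snapshot rpool/data/subvol-101-disk-0@pre-upgrade

# Roll back if it goes sideways:
zfs rollback rpool/data/subvol-101-disk-0@pre-upgrade

# Or checkpoint a whole (non-root) pool; rewinding discards everything
# written after the checkpoint:
zpool checkpoint tank
zpool export tank && zpool import --rewind-to-checkpoint tank
```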