Help needed, Proxmox File Server Config options

Hello, I have found this thread by Peanut253, and this one, which I found very helpful.
I was looking for something like this tutorial for a long time.
But as usual I need a little help and an opinion about how to apply this knowledge to my specific setup.

Given the specs below, what would be the best setup option for me to have a nice, easy-to-manage file server and Proxmox hypervisor on it?
My plan is to use ZFS RAID-1 on the SSDs for the OS (yes, I know it may be overkill, but that was the original config and I think I will keep it that way).
I plan to use the 2x1TB as an additional ZFS RAID-1 pool for VM and local storage.
The rest I planned to use with Btrfs and somehow expose to the network, but maybe ZFS is an option too. I do not understand ZFS 100%, but your tutorial helps.
My only issue now is that I have a bunch of drives of different sizes, and ZFS does not like mixed/matched pools. How would you propose I set this up?

I have an oldish SuperMicro AMD server
Chassis: Supermicro SC846 24 Bay chassis
Motherboard: Supermicro H8DME-2 BIOS v3.5 (latest)
CPU: 2x AMD Opteron 2431 hex-core @ 2.4GHz for a total of 12 cores
RAM: 49GB DDR2 PC-5300F @ 667MHz ECC
4x1Gb NICs.

I have 2x120GB SSDs for the OS
2x1TB HDD
3 or 4x3TB HDD
3 or 4x2TB HDD

thanks Vl.

Mind using my forum name, Peanut253, instead?

Edit:

One weird quirk about how ZFS works is that each device in a vdev must be the same size, but each vdev can also be made up of other vdevs. So you can get “nested” ZFS pools like this (in GB):

(1024+1024) - 2048

So that means 2x1TB (1024 GB) drives in RAID0 that are mirrored with a 2TB drive (2048 GB).
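ZFS itself has no “concat” vdev type, so if you actually wanted to build that layout on Linux, one workaround (my assumption, not the only way) would be to glue the two 1TB drives together with an mdadm linear device first and then mirror that against the 2TB drive. Device names below are placeholders:

# Concatenate the two 1TB drives into one ~2TB linear md device (placeholder names)
mdadm --create /dev/md0 --level=linear --raid-devices=2 /dev/sdb /dev/sdc
# Mirror the concatenated device against the real 2TB drive
zpool create -o ashift=12 tank mirror /dev/md0 /dev/sdd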

Currently your drives are 1, 2 and 3 TB drives, so I am sure you can mix them in whatever configuration makes sense.

The big difference between RAID 10 in ZFS (mirrored vdevs) and RaidZ is the speed vs. capacity trade-off. What is your intended use case for that ZFS pool? Are you content with 50% usable of total capacity, or are you willing to trade speed for capacity? Do you want to be able to survive any 1 disk failing, or 2? Note that more redundancy = less capacity.

  • zpools made from fewer total disks are easier to upgrade/replace later on. It may make sense to have 2 independent zpools.
  • striped/mirrored vdevs making up a zpool are easier to upgrade/replace later on
  • RaidZ gains capacity dramatically if every available disk is added to the same pool.
  • RaidZ is basically only for capacity+redundancy (at the expense of speed). This is great for long-term archival that needs minimal random/daily I/O. I did a RaidZ-2 for mine using 8 disks + 1 for cache that does both VM storage and static data storage.

There are a lot of different ways to set up the ZFS configs and it would be helpful to know a lot more about how you are going to use the pool before being able to give sound advice.

At the moment, I would say that unless you are using the pool for business/mission-critical stuff, the SSDs are probably better off as cache. Just buy 1 or 2 USB flash drives and stick the Proxmox OS on those. Remember that the Proxmox OS will reserve the full capacity of whatever disk(s) it is installed onto and then only use like 2GB of it. I think mine is at ~800MB used of the 16GB flash drive I put it on.
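For what it’s worth, turning one of those SSDs into a read cache (L2ARC) for an existing pool is a one-liner; the pool name and disk path below are just placeholders:

# Add one of the 120GB SSDs as an L2ARC read cache to the pool "tank"
zpool add tank cache /dev/disk/by-id/ata-Some120GB_SSD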

Edit2: vdev vs zpool clarification

why not :-).
Fixed

You probably don’t have a whole lot to worry about with your 49GB of memory, but do be aware that Proxmox does not set any kind of memory limitation for ZFS. ZFS will happily eat up your memory for the ARC if it can. You may want to limit it to 8 or 12GB of memory by editing /etc/modprobe.d/zfs.conf.

https://pve.proxmox.com/wiki/ZFS_on_Linux#_limit_zfs_memory_usage
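As a sketch of what that boils down to (an 8GiB cap as an example value; adjust to taste):

# /etc/modprobe.d/zfs.conf -- cap the ARC at 8 GiB (8 * 1024^3 = 8589934592 bytes)
options zfs zfs_arc_max=8589934592

If your root filesystem is on ZFS, refresh the initramfs afterwards (update-initramfs -u) and reboot so the limit is applied at boot.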

Also pay attention to that swappiness setting mentioned on the wiki. You can throw your system into a tailspin where ZFS and your VMs eat up a bunch of memory, and then the system tries to swap some things, but because swap is on ZFS, ZFS’s memory usage becomes more aggressive, so the system has to swap more aggressively, so on and so forth. I’ve seen it. It’s not pretty.
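Something along these lines covers the swappiness part (10 is just one of the values the wiki suggests; tune to taste):

# Make the kernel much less eager to swap
echo "vm.swappiness = 10" > /etc/sysctl.d/99-swappiness.conf
sysctl -p /etc/sysctl.d/99-swappiness.conf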

OK, my data is mostly static.
A lot of multimedia files: movies / videos / pictures
that I want to serve to my HTPC and also have access to from other PCs in the household, as well as to a couple of streaming sticks, etc. Maybe I will have a Plex server VM to stream the files.

I also have a lot of backup files from different PCs and plan to implement auto-backup as well, so image files too. All of this is either large, mostly static files or lots of smaller static files.
The only dynamic content would be an occasional download via torrent or FTP.

Planned VMs are:
a Plex or media server VM for streaming,
a downloader VM with Transmission or Deluge, or maybe an rtorrent client,
an FTP server like FileZilla, and a SABnzbd / CouchPotato / blue bird / Sonic setup, or whatever the latest versions of those are. I do not torrent much or use SABnzbd, but I want to have the option if needed.
I also want to use this setup to help with organizing my media files and get some metadata for some things I already have.

Other than that, maybe an ownCloud VM or similar for remote access to the files.

Again, my concern is that a few of my drives are not exactly new, so they might fail sometime in the future, but I like to run them into the ground. So I want a setup that is robust and allows a graceful recovery if a drive fails, without data loss or downtime.
Data loss is not a big issue as I do have a backup.

Can I move the swap onto its own drive/pool?

I mean, I do have a bunch of disks lying around. In fact I might even have a few extra SSDs if I go with the USB install.
I have a couple of 16GB SanDisk flash drives I can use for the OS.
That frees up the 2x120GB SSDs :slight_smile:

You can do whatever you want. It’s just a matter of whether Proxmox makes it easy for you. :slight_smile: I think in their installer they include a utility that will allow you to adjust the size of various partitions, including swap. It’s mainly something to be mindful of. As long as you give ZFS a hard limit on memory to use, set the swappiness aggressively low, and don’t try to over provision the memory in your server, you should be fine.

So capacity seems to be of primary importance, along with minimizing downtime. I/O doesn’t matter much, since Plex can reliably stream at 8Mbps (1080p) even with a measly 1MB/s of read speed.

That strongly implies RaidZ instead of pure mirroring/striping. To combine multiple smaller drives, a RAID 0-style concatenation (JBOD) can be used within a vdev.

The only issue now is the exact arrangement of drives!

(~20TB)
2x1TB HDD
3 or 4x3TB HDD
3 or 4x2TB HDD

So… This means the following drives are available:
1TB HDD
1TB HDD

3TB HDD
3TB HDD
3TB HDD
3TB HDD

2TB HDD
2TB HDD
2TB HDD

Assuming the following arrangement in a RaidZ-2 config:

3TB HDD
3TB HDD
3TB HDD
3TB HDD

2TB HDD + 1TB HDD
2TB HDD + 1TB HDD
2TB HDD + 1TB HDD

So, does ZFS support JBOD configurations for the 2+1 configs? IDK. Maybe @Levitance knows, or the ZFS wiki.

Assuming software JBOD works, that is 7 vdevs at the top layer, each with 3TB, assuming you are willing to buy 1 extra 1TB HDD and dedicate 4x3TB HDDs to the pool. They are like $30 on eBay.

So that is 21TB of raw capacity, minus 2 vdevs (3TB each) for redundancy, which is roughly 21 - 6 = 15TB usable for a pool made of ~20TB worth of disks (raw).

I have no idea what the syntax to actually create such a pool would look like. Some of those are probably Advanced Format drives too. x_x
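If anyone wants to experiment with that, my guess (and it is only a guess; all device names below are placeholders) is that you would concatenate each 2TB+1TB pair into a ~3TB device with mdadm in linear mode and then hand those to raidz2 alongside the bare 3TB drives:

# Glue each 2TB+1TB pair into a ~3TB linear md device
mdadm --create /dev/md0 --level=linear --raid-devices=2 /dev/sde /dev/sdh
mdadm --create /dev/md1 --level=linear --raid-devices=2 /dev/sdf /dev/sdi
mdadm --create /dev/md2 --level=linear --raid-devices=2 /dev/sdg /dev/sdj
# RaidZ-2 across the four bare 3TB drives plus the three concatenated ~3TB devices
# (ashift=12 since some of those are probably Advanced Format / 4K-sector drives)
zpool create -o ashift=12 tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/md0 /dev/md1 /dev/md2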

Questions:

  1. Is there any documentation on JBOD configs for ZFS anywhere?
  2. Is buying a single 1TB drive an option?
  3. Can you confirm the exact drive availability for the main pool? (not SSDs)

The SSDs are better off as cache in my opinion, but… I prefer “sketchy” pools suited for home use only. For long-term reliable usage (businesses), dedicating at least 1 of those SSDs may be worthwhile, especially if you will not be able to do maintenance if something goes wrong.

I don’t know that ZFS does JBOD. I believe it’s closer to RAID 0, where the disks do need to resemble one another, or you live with the fact that effectively all of your disks are the size of your smallest disk. I can test this tonight.

Well holeecrap.

# create three qcow2 disk images: 2G, 1G and 3G
qemu-img create -f qcow2 ./0.qcow2 2G ; qemu-img create -f qcow2 ./1.qcow2 1G ; qemu-img create -f qcow2 ./2.qcow2 3G
# expose them as block devices (needs the nbd kernel module: modprobe nbd)
qemu-nbd -c /dev/nbd0 ./0.qcow2 ; qemu-nbd -c /dev/nbd1 ./1.qcow2 ; qemu-nbd -c /dev/nbd2 ./2.qcow2
# pool of three bare, differently sized disks
zpool create -o ashift=12 dummy /dev/nbd0 /dev/nbd1 /dev/nbd2
zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
dummy                240K  5.77G    96K  /dummy

Soo 2 plus 1 plus 3 iiiiissss…
(Tim Curry GIF)

It’s 6. It really is 6. So it looks like JBOD in ZFS is a thing. I was really expecting ZFS to trim that vdev down to be the size of the smallest disk. Edit: Er, rather than the smallest disk, I was expecting ZFS to make the pool the size of all 3 disks striped together as if they were 1G disks. Subtly, but importantly different.


OK,
#1. I am not sure but can and will do the research.
#2. It is an option, but I really do not want to do that. My 1TB drives are oldish; I believe all my drives are 7200rpm, but they are from different manufacturers:
some Seagate, some WD, one or two Hitachi. So the idea is that if I buy a new drive, it will be a bigger drive, not a smaller one. But nothing is written in stone :slight_smile:

#3. It’s been a while since I looked over the drives I have, but:
at least 3x3TB for sure. I might have 4 of them, but I have to really look into the server to be sure.
at least 3x2TB, but if all goes well I will have one more that is currently in my main PC for storage.
definitely 2x1TB.

Again, if I go with USB sticks for the OS I will have 2x120GB SSDs,
and I have an extra 240GB SSD I got on sale a few months back for other needs but never used.

Also, capacity is not an issue either; at the present time I barely have 4TB of data. It should be easy to add capacity as I need it, so right now, even if I do not use all the drives in order to make it easier to build a robust configuration, that’s fine. I’d say if I have an odd number of drives, I will use one as a hot spare rather than for capacity.

So maybe, if I do things right, my Proxmox setup would look like:

main ZFS pool (OS install) = rpool (either the current setup, 2x120GB SSD in ZFS RAID-1, OR 2x16GB USB flash in ZFS RAID-1)

data pools:
Tank1 - ZFS RAID-1 using 3TB disks, 1 vdev (2x3TB = ~3TB usable) + hot spare
Tank2 - ZFS RAID-1 using 2TB disks, 1 vdev (2x2TB = ~2TB usable) + hot spare if I have an extra
Tank3 - ZFS RAID-1 using 1TB disks, 1 vdev (2x1TB = ~1TB usable), no hot spare

This setup gives me nice mirrored pools that I can expand later as needed.
I can upgrade Tank3 by switching to bigger disks and expanding.
If one disk fails in any pool, I can recover.

Anything wrong with this setup?
I do not mind bind-mounting multiple pools to VMs, and if I can make the file server container work, I would only need to bind-mount them to one or maybe two guests: the file server container and my download VM.
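For reference, creating Tank1 roughly as described might look like the sketch below; the disk paths are placeholders, and “spare” is how ZFS attaches a hot spare at creation time:

# Tank1: two-way mirror of 3TB disks plus a hot spare (placeholder disk IDs)
zpool create Tank1 mirror /dev/disk/by-id/ata-3TB-disk1 /dev/disk/by-id/ata-3TB-disk2 spare /dev/disk/by-id/ata-3TB-disk3
# Tank2 and Tank3 would follow the same pattern with the 2TB and 1TB disks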

If you are prioritizing expandability later, then that pool config makes sense. Several discrete smaller pools are more upgradable than one large pool.

Keep in mind what you are trading: out of 17-20TB of raw capacity, you go from about 10-13TB usable with one large pool and 2 disks of redundancy, down to ~9-10TB with only 1 disk of redundancy per pool.

Personally, I do not like the idea of expecting to do maintenance later, so I would prioritize redundancy. RAID 5/RaidZ-1 scares me. I would rather do a 3-way mirror than any 3-disk RAID 5/RaidZ-1, but this is your call, especially if you are fine doing maintenance.

At this point, I would recommend two things:

1- “Hot spares” is a reserved term when talking about RAID configs. It means the array exists independently of the “hot spare” and there is an additional disk or series of disks that can become part of the array if a disk in the array fails. Hot spares do not provide capacity to the pool. They also do not provide additional redundancy while they are not actively part of the array.

I think you meant to use the term “redundancy” or “redundant disk.” These disks are actively part of the array and have data on them. They are not hot spares. These distinctions matter when reading up on the ZFS documentation.

Do not use hot spares. Hot spares accumulate wear while merely powered on. On + unused = worst of both worlds. Either do cold spares (off and unused, to minimize wear) or just make them part of the array, since redundancy is their intended purpose (on + used).

2- I am assuming you are still going to use RaidZ instead of mirroring+striping, since speed is not much of a concern. Then, because RaidZ capacity benefits enormously from additional disks, roll Tank3 (mirror) into Tank2. This will increase the capacity and redundancy of Tank2. The 1TB disks in that vdev should be replaceable with a single 2TB disk later on should either of them fail. IDK what the syntax for that would be, however.
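For the simple case of swapping a single disk in a pool for a bigger one, the command is zpool replace; a rough sketch with placeholder names, plus autoexpand so the extra space actually becomes usable once every disk in the vdev has grown:

# Let the pool grow automatically when all disks in a vdev get bigger
zpool set autoexpand=on Tank2
# Swap an old 1TB member for a new 2TB disk (placeholder device IDs)
zpool replace Tank2 /dev/disk/by-id/ata-old-1TB /dev/disk/by-id/ata-new-2TB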

Or, if you are going to replace the 1TB disks right away, why bother including them in the server at all? Find some other use for them. This is, again, a maintenance issue. If you do not mind doing constant maintenance and upgrading the pool later, 3 distinct pools with decreased redundancy is fine, but I would rather never do maintenance.

Sorry, you kind of lost me here.
I understand your take on hot spares, and you might be right. Since Proxmox has good support for SMART and built-in disk monitoring, I can probably skip the hot spares and use the disks for capacity.

But what do you mean by “maintenance”? How am I to set things up to reduce maintenance?
Do you mean that I should plan for 3-disk mirrored vdevs,

like
Tank1: RAID-1 (mirror) vdev, 3x3TB
Tank2: RAID-1 (mirror) vdev, 3x2TB
and, if I want to use my 1TB drives, Tank3 as it is.

Or what would be your suggestion for my setup?
I want separate pools because I do not want to mix drive sizes within a pool.
Eventually all the smaller drives will be replaced.
Also, if I have an even number of drives, does a 3-way mirror make sense, or should I just use 2-disk vdevs?

PS>> I will adapt the setups based on the drives available

I just meant: when is the next time you plan to reconfigure the hardware? A month from now? A year from now? 5 years? How many drives should fail before you are compelled to do maintenance?

I like setting things up once and then doing maintenance years later, or never, until the system needs to be rebuilt completely, as opposed to planning to upgrade it every 3-9 months; I consider that high maintenance. Except for cleaning out dust, of course.

If you are going to change that hardware configuration “soon,” 3 tanks makes sense, since that essentially means you are expecting to replace the smaller disks within a year.

The exact drive count (even vs. odd) used to matter, but it does not with newer versions of ZFS and RaidZ.

Whether it makes sense or not depends a lot on your comfort level with drive failures and your expected maintenance cycle.

If the expected maintenance level is never, then go with at least RaidZ-2. If you are doing maintenance within a year, RAID 5/6 or RaidZ-1/Z-2 should probably be used over pure mirroring. 2-disk mirroring per vdev is only worth it if you really need long-term reliability and also performance. It provides at most 50% of raw capacity, which is lousy, imo.
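To make that trade-off concrete, here is roughly what each option looks like at creation time (pool and disk names are made up):

# ~50% usable, fast; any 1 disk per mirror pair can fail
zpool create fastpool mirror sda sdb mirror sdc sdd
# ~(N-2)/N usable, slower; any 2 disks can fail
zpool create bigpool raidz2 sda sdb sdc sdd sde sdf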

Oh, OK.

Well, when I am done, the next time I need to reconfigure the hardware will hopefully only be when/if something fails, or when I need more space.
I do not have much time to play with it, hence my goal of getting the most stable and easily expandable setup I can.

I ran OMV on this machine before, but I want better virtualization options than OMV can provide, so I want to move to Proxmox.
Last time, I had 2 (!) 3TB drives fail unrecoverably on me at the same time. I did not have a good backup and almost lost all the data on the server, as it was all on those drives.
Luckily for me, I am a pack-rat, so I had copies of most of the data on my main PC and on disks, so I only lost a few downloaded movies that are not much of a loss for me.

The drives I had bought from someone on one of the forums about a year earlier.
They passed all tests and SMART looked good, until one day they just died. Two doorstops overnight. I bought 2 new 3TB drives hoping to recover some data, because I thought I had lost it all, but once I figured out that I still had almost all the data from the server, I just dropped the bad disks.

So I am trying to build something that will help prevent this kind of thing from happening again. I do have a backup now on an external 5TB drive, but still.
In your view, what would be my best config?

I think the configs have already been mentioned. The 3-pool config you mentioned above would work, but it only provides 1 drive of redundancy per pool. There is also the “dump all the drives into a single pool” idea, which would provide better overall capacity and survive 2 drive failures. RaidZ-3 would also be possible if you want an extra safety net in the larger pool.

It is just upgradability (3-pool config) vs. reliability (one big pool) at this point. That is a choice you need to make.

OK, how would I set this up as a single pool, given that I have a mixture of drives?
Wouldn’t I be losing capacity, since ZFS will drop down to the smallest drive in the pool?

Or do I build out vdevs per disk capacity and add those to the pool?
As in:
Tank
vdev1 => raidz2 using 3-4x3TB
vdev2 => raidz2 using 3-4x2TB

Ignore the 1TB drives, or create a separate pool Tank1 with a 2x1TB mirror just to use them.

My concern with a single pool is that, based on everything I have read so far, if a drive fails I might lose the whole pool.

For JBOD, the capacity is simply added, but without additional redundancy. Only the non-JBOD configurations reduce the capacity. I am not sure about the syntax; that is more Levitance’s domain. You can scroll up to check, and experiment with virtual disks to find the right syntax.

The point of RaidZ is to add additional redundancy and capacity at the same time for each top-layer vdev. So you can have as many failures as you build for within a single pool. Having a lot of disks in a pool potentially allows for as many failures as you have disks minus two, although I think the max parity per vdev is 3. A 5 vdev/disk pool can have up to 3 vdev (disk) failures in RaidZ-3 with availability unaffected. (Surviving 4 disk failures in a 5-disk array is basically pure mirroring.)

If you are using the 1TB HDDs, then use the RaidZ-2 config I recommended above: 3TB + 3TB + 3TB + (2+1) + (2+1) + (2+1) in RaidZ-2 or -3 at the top level. That gives 5-7 vdevs at the top level, so with RaidZ-2 more than 2 of those 5-7 vdevs (drives) would have to fail to lose the pool. This is probably what I would do. Just agree to “upgrade” later by adding additional zpools instead of swapping drives.

If you don’t want to buy another 1TB drive but happen to have a 4th 3TB drive, that is: 3TB + 3TB + 3TB + 3TB + (2+1) + (2+1), with a spare 2TB drive left over for backups or something.

Without using the 1TB drives (no 2TB+1TB JBOD configs as above)… well, 2x3 = 6. Each top-level vdev needs to be 6TB in order to do RaidZ as a single pool. That means 2x3TB + 2x3TB + 3x2TB = 3 top-level vdevs, so the highest-level RaidZ config is RaidZ-1. This is probably the worst of all options: low performance, low reliability, a fragile configuration, and not upgradable. If more than one drive fails, the pool is lost, so I don’t think that’s a good idea, and I would recommend looking into the 2+1 configs and figuring out the syntax.
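For the “one big pool” route with one raidz2 vdev per drive size, creation would look roughly like this (disk identifiers are placeholders, and ZFS may want -f if it complains about the vdevs being different sizes):

# First raidz2 vdev out of the 3TB drives
zpool create -o ashift=12 tank raidz2 /dev/disk/by-id/ata-3TB-1 /dev/disk/by-id/ata-3TB-2 /dev/disk/by-id/ata-3TB-3 /dev/disk/by-id/ata-3TB-4
# Second raidz2 vdev out of the 2TB drives, added to the same pool
zpool add tank raidz2 /dev/disk/by-id/ata-2TB-1 /dev/disk/by-id/ata-2TB-2 /dev/disk/by-id/ata-2TB-3 /dev/disk/by-id/ata-2TB-4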

Got it. So if I do everything correctly, I should end up with something like this?
(This is from my test VM setup: tank’s raidz2-0 is a stand-in for the 3TB array and raidz2-1 for the 2TB array. The actual disks in the VM are 200GB and 100GB, and the OS disks in rpool are standard Hyper-V 127GB.)
Or vice versa if I only have 3x3TB disks.
The use of the 1TB disks is a moot point, as my plan is to maybe use them for cache or something unimportant or temporary. Maybe I’ll create an extra pool to use as temporary holding space for torrent downloads.

root@pve:~# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda2    ONLINE       0     0     0
            sdb2    ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                                        STATE     READ WRITE CKSUM
        tank                                        ONLINE       0     0     0
          raidz2-0                                  ONLINE       0     0     0
            scsi-360022480e2b5d312f71798988153c486  ONLINE       0     0     0
            scsi-360022480039f2676908c6d4ab3163810  ONLINE       0     0     0
            scsi-36002248057f910c3be98c4348927a20d  ONLINE       0     0     0
            scsi-3600224800550e54e5fbf9d9ecad52124  ONLINE       0     0     0
          raidz2-1                                  ONLINE       0     0     0
            scsi-36002248080f894a15620e162f3da6e46  ONLINE       0     0     0
            scsi-360022480fdfc78932ddbb7c12e7ddbd0  ONLINE       0     0     0
            scsi-360022480a669c3071258504796a5973b  ONLINE       0     0     0

errors: No known data errors
root@pve:~#

I have no idea if that looks right or not. I get the theory, but I have not stared at ZFS configs long enough to know. As long as you can nail down your pool config, I am sure someone else can help you with the specific syntax.