Most appropriate storage topology

Hi All.

I have started putting together a TR-based machine that will be used as:

  1. A storage server
  2. A workstation
  3. A gaming rig

I will set up separate Windows 10 VMs with GPU passthrough to accomplish [2] and [3], but am unsure as to the best way to achieve [1]. At this stage I have ordered the following:

  1. A TR 2950X CPU
  2. Gigabyte x399 Aorus Xtreme motherboard
  3. 64GB of Crucial unbuffered DDR4 2666MT/s ECC memory

so I still have the opportunity to select components that enable me to best accomplish [1]. My current PC uses the same SSD for the Debian host OS and a single Windows 10 VM guest, again with GPU passthrough. This time around I am not sure whether I should:

  • Configure two NVMe SSDs as RAID1 for use by both the host and VM guests, with a separate RAID5 HDD array for bulk storage
  • Configure two NVMe SSDs as RAID1 for exclusive use of the host, and pass through separate SSDs for use by the guests (again with a RAID5 HDD array for bulk storage)
  • Do something else entirely

I have prior experience with mdadm but none with ZFS, so am unsure if ZFS would be appropriate here. Can anyone make a recommendation as to which storage option I should investigate further?

I would personally use a normal SATA SSD for the boot drive, and a pair of NVMe drives as VM storage.

I also keep data separate from O/S so I can just blow the host away and re-install, and just transfer /home or whatever elsewhere.
But that’s just me

I found that a 4-deep mirror of HDDs was not quite low-latency enough for AAA gaming on a Windows VM, so I switched the game storage to a file on a small SSD array.
Made a big difference in COD:MW.
But my lord that game has some large updates…

How much bulk storage do you need?

RAID1 or RAID10 is usually simpler than RAID5/RAID50/RAID6/RAID60

ZFS gets you data checksumming and recovery, plus RAID journaling, out of the box. The latter buys you quick rebuilds; the former is there to give you peace of mind when you have a large volume of data and statistically will get bit flips. With mdraid you typically only get journaling.
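
To make the checksumming point concrete, here is a toy Python sketch. It is not how ZFS is implemented; the `MirroredStore` class and its methods are made up purely for illustration. It keeps two copies of each block plus a SHA-256 checksum, and on read it uses the checksum to work out which copy is good and repairs the other. A plain mirror without checksums can notice that two copies differ, but it has no way of knowing which one is right.

```python
import hashlib


class MirroredStore:
    """Toy two-way mirror with per-block checksums (illustrative only)."""

    def __init__(self):
        self.copies = [{}, {}]   # two "disks": block id -> bytes
        self.checksums = {}      # block id -> SHA-256 hex digest

    def write(self, block_id, data):
        # Record the checksum, then store the same data on both "disks".
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for disk in self.copies:
            disk[block_id] = data

    def read(self, block_id):
        expected = self.checksums[block_id]
        for disk in self.copies:
            data = disk[block_id]
            if hashlib.sha256(data).hexdigest() == expected:
                # Found a good copy: overwrite any copy that does not match.
                for other in self.copies:
                    if other[block_id] != data:
                        other[block_id] = data
                return data
        raise IOError(f"block {block_id}: all copies fail checksum")


store = MirroredStore()
store.write(0, b"precious data")
store.copies[0][0] = b"precious dat\x00"   # simulate corruption on disk 0
recovered = store.read(0)                  # served from disk 1, disk 0 healed
```

The read succeeds from the second copy and the damaged copy is rewritten on the way, which is roughly the "recovery" half of the feature described above.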

With ZFS you can’t remove/reconfigure storage as easily as with LVM (devicemapper) or with BTRFS.

  • With ZFS the only option is usually doubling the number of disks (or adding disks in batches if you have multiple vdevs already). Or swapping disks with bigger ones.

  • BTRFS has a roughly similar feature set to ZFS, but is more user friendly. With BTRFS you can move between raid5/raid10 literally in place, while mounted and while using the filesystem. It’s also a lot more nicely integrated into Linux: just a kernel driver in the usual place, with documentation as usual, and a progs package. To mount, pick any drive of the set (no zpool export/import/-f), or just make sure one of the drives that’s part of the set appears and mount by filesystem UUID plus, optionally, subvolume ID (if you’re keeping the OS separate from home, but on the same set). It’s a lot friendlier/easier than mdadm/dmraid/zfs, which I appreciate when having to deal with dead disks.

The recommendation in all cases is to keep your bootloader and /boot separate from your storage pool, as all of these raid setups require some kind of manual intervention to boot degraded. With systemd these days, by default, you might not even be able to ssh into the host if you have a missing drive that’s part of your raid set.

Personally, between my btrfs / zfs / lvm setups spread across 4-6 drives each that I have had at home for the last 5-10 years, btrfs has served me best (running it in raid10 at the moment)

I like this approach too. Using normal SATA SSD for boot drive.

I have not used NVMe drives for VM storage or VM images, but I have used software-defined SSD partitions as swap space for whatever processes I have running on such VMs, which has been very useful. OS write-backs to the drives after boot are pretty low in modern images for the most part, so you could use JBOD for VM storage too.

This is just one more option to consider…

Another option you may want to consider.

  1. Use Unraid or Proxmox as the host, booted from a cheap SSD or even a USB pen drive; set up the NAS as a standalone container (in Unraid) or as a dedicated FreeNAS VM (in Proxmox) with ZFS for your bulk storage (pass through a virtual GPU for the initial install only, then run headless).

  2. Run both Debian and Windows on dedicated VMs, each passed through with a single fast disk.

  3. Set up your game library as a share from the NAS and share as needed, same with media etc.

Basically a hyperconverged system in a box. Could give you more flex to upgrade either os without worrying about the virtualisation, and you can snapshot or have multiple instances sharing the same hardware for different needs.

Just an idea.

Thank you for your replies. FYI, my bulk storage needs are quite modest - 16TB of redundant storage would more than cover my needs for the next few years. However, this is still more capacity than I can afford to host on SSDs (be they SATA or NVMe), so mechanical HDDs will continue to have a role to play.

With regards to Proxmox, I would be lying if I said I was overly familiar with it, but as far as I have been able to ascertain, it would not do anything that I could otherwise do myself - i.e. I do not need Proxmox to be able to set up VMs or containers on a base Debian install. Happy to be corrected of course, but to me it just seems like an unnecessary layer of abstraction.

I think the first thing I need to determine is whether or not I need to have the base OS (in this case Debian 10) running on ZFS (or BTRFS) as well as the bulk storage. As previously mentioned, I am currently running Debian on a RAID1 array using mdadm, but am unsure if this is sufficient. After all, it’s all well and good ensuring that your precious data files (in the bulk storage pool) are safe, but if your OS crashes because of an uncorrected error or similar, then you are in just as much trouble.

Peanuts. Get a pair of Seagate Exos X 16T drives and be done; btrfs in raid1 will do. Then, in a couple of years, upgrade to a pair of 30T PCIe SSDs. Alternatively, get a pair of smaller drives today, and then in a year, when you’re at 80% full, get a third and run btrfs fi balance.
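
For a feel of what the "add a third drive later" route buys you, here is a rough Python sketch of usable space under a two-copy btrfs raid1 profile. The 8T drive sizes are my own example numbers, not anything from this thread, `btrfs_raid1_usable` is just a helper name I made up, and the estimate ignores metadata overhead.

```python
def btrfs_raid1_usable(drive_sizes_tb):
    """Approximate usable space for a two-copy btrfs raid1 profile.

    Every block is stored on two different drives, so usable space is
    capped both by half the total and by how much the remaining drives
    can pair with the largest one.
    """
    total = sum(drive_sizes_tb)
    largest = max(drive_sizes_tb)
    return min(total / 2, total - largest)


two_drives = btrfs_raid1_usable([8, 8])       # pair of drives today
three_drives = btrfs_raid1_usable([8, 8, 8])  # after adding a third and rebalancing
mixed = btrfs_raid1_usable([16, 4, 4])        # one big drive, limited by the small ones
```

With equal drives, the third one adds half its size again in usable space. The mixed case shows why a much larger drive does not help unless the others can hold its second copies.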

I think I first need to gain a better understanding of the level of redundancy that each system component requires. Take the system description in the OP as case in point:

  1. The host OS will be a standard Debian Buster install. Because everything rests on this foundation, I think that this must have some level of protection. The current plan is to use two ~500GB SATA SSDs with power loss protection in a simple RAID1 (ext4) array. This can be done during the initial install. This is not foolproof by any means, and it would be better to have the host OS on ZFS, but having had a quick look at what is involved I am not sure my simple needs require such an involved solution - at least not yet.
  2. The two Windows VMs could then have their own dedicated ~500GB NVMe SSDs passed through, with all important files stored on a dedicated ZFS-based storage array (see below). I am still unsure if the VMs themselves should also be stored on the ZFS array, or if that is going overboard.
  3. The storage array would then likely be a simple ZFS RAID10 array of four ~4TB HDDs, possibly with a 32GB Optane NVMe to serve as a write cache.

4x4T in raid1/10 yields 8T of usable space (about 7T as reported by the OS, because HDD manufacturers count in decimal units).
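
The arithmetic behind those numbers, as a small Python sketch. It assumes plain striped two-way mirrors and ignores filesystem overhead; `raid10_usable_bytes` is just an illustrative helper name.

```python
TB = 10**12   # decimal terabyte, as printed on the drive label
TIB = 2**40   # binary tebibyte, as many tools report it


def raid10_usable_bytes(drive_count, drive_size_tb):
    """Usable bytes for striped two-way mirrors: half the raw capacity."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("RAID10 needs an even number of drives, at least 4")
    return drive_count * drive_size_tb * TB // 2


usable = raid10_usable_bytes(4, 4)
usable_tb = usable / TB     # 8.0 decimal terabytes
usable_tib = usable / TIB   # roughly 7.28 tebibytes
```

So the "missing" terabyte is only a units difference between the label and what the OS shows, not lost space.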

For the rest:

How about this: get a pair of 1TB Samsung Evo Plus drives (or a similar SSD whose latency doesn’t go to s**t as soon as you start writing) and partition them: a first partition of 256MB for UEFI, a second of 32G for a mirrored ZIL, and the remainder as a raid1 btrfs partition for the host OS as well as VM image files. (100G is plenty for a Linux VM host, and you can always stretch into your bulk ZFS if you need to.)