Hardware and OS for a Virtualized Workstation?

I edited this comment since the original was excessively long.

Hello everyone. I have been looking at different hardware and OS configurations for days, and I do not know what I should get.

I want a Linux workstation, but I want to occasionally run a Windows workstation from the same PC. I really want Linux and Windows to run on top of ZFS for easy data and backup management. Having Linux and Windows able to run at the same time would be great, but it would not be required. Both systems would need GPU acceleration.

Stability is a significant goal. Thank you for any advice or comments.

These are just my opinions. Do as you wish.

So, personally, I use Fedora 26 with BtrFS and KVM/VMware. I do not have stability issues. Just my 2 cents. In fact, Fedora has GUI tools for BtrFS file systems and can install itself directly onto a BtrFS-formatted drive.

Pro Tip: Windows has a BtrFS Driver - https://github.com/maharmstone/btrfs As far as I know, an equivalent for ZFS does not exist.

If I were you, I would use GPU passthrough for Windows, have the OS be stored on the ZFS file system (assuming you do use FreeNAS), and just create snapshots. One feature of iSCSI, AFAIK, is that the system treats the drive as if it were local anyway.

You’ve kind of confused me tbh. You aren’t sure if you’d need NFS or iSCSI, but it sounds like you’d have ZFS (with FreeNAS) on the Linux Workstation with you. Why would you need a network protocol if the file system is local?

I mean, the odds of two drives of the same mirror dying at the same time are so minuscule that I wouldn’t personally worry about that. I would go for RAID 10 (a stripe of mirrors). In this configuration, you have less redundancy, but double the speed for writes and quadruple the speed for reads.

If you’re using WD Red NAS drives, they’ll be pretty resistant to dying. Having four copies just seems… overkill. Same for three copies. After two copies, if you want redundant backups, it’s better to just automate daily snapshots onto some other drive in another system (e.g. a NAS over a network or an external enclosure).
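If it helps, this is roughly what that four-drive layout looks like as a ZFS striped mirror, since ZFS is what you’re after (the pool name and device paths are placeholders, not anything from your build):

    zpool create tank \
        mirror /dev/disk/by-id/ata-WD_RED_A /dev/disk/by-id/ata-WD_RED_B \
        mirror /dev/disk/by-id/ata-WD_RED_C /dev/disk/by-id/ata-WD_RED_D   # a stripe of two mirrors ("RAID 10")

    zpool status tank   # confirm the vdev layout and pool health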

1 Like

I value the opinions of people with experience. Windows having a btrFS driver is very interesting. I will need to look at the different features of that FS. And, hearing that you have Fedora 26 stable is interesting. I always liked Fedora, which is why I tried so hard to get it working in the past. Maybe I had issues because Fedora had just implemented GNOME 3 at that time.

And, I expect that this is on your X99 system with a Fiji GPU. In that case, an X299 system with a Vega GPU would probably be stable as well.

Yeah, that does sound like a good way to go if KVM is working well. Listening to you is making Fedora with a Windows VM on the X299 platform seem really good.

Regarding the network protocols, if I build my ZFS zpool with FreeNAS, it would not be seen as local. The VM would need to provision the space through network protocols over a virtual network. That is why my options for using FreeNAS involve a hypervisor.

And, if I set up ZFS through Linux directly (no FreeNAS), I could use network protocols to provide VMs direct access to files on zvolumes. Otherwise, everything in the VMs would just be hidden in massive image files with no easy way for me to access the files on the host or another VM. I would keep as few files as possible inside the VM image to keep the file small. I am not sure how iSCSI would be used vs a virtual disk running on a zvolume since I am not familiar with iSCSI.

That configuration was suggested for production by several people in enterprise for various reasons. But, I agree that it seems excessive, especially since I am not applying an enterprise load. Your idea sounds good. But, I like the idea of being able to disconnect my backup from the PC in case something catastrophic happens. So, instead of a RAID 10, I would probably use two RAID 1 arrays. That combined with regular snapshots seems really good.

Thank you for your “opinion”. You have been helpful. Unless someone says otherwise, I am leaning toward an X299 system, now. I still need to decide on a virtualization structure, though.

1 Like

BtrFS doesn’t require the intense hardware that ZFS does, partly because it doesn’t use an ARC (RAM cache), so performance isn’t as much of a focus. It would benefit from ECC RAM though, as it supports scrubbing for bit rot. It also supports snapshots, subvolumes, etc. The only feature I would say not to use is RAID 5 or 6, because it’s not done yet (it’s a young file system, but it is supported by Fedora out of the box, so it has to be reasonably stable).
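To give you an idea of the day-to-day usage (device and path names are made up):

    mkfs.btrfs -L data /dev/sdb                  # format a drive with Btrfs
    mount /dev/sdb /mnt/data

    btrfs subvolume create /mnt/data/projects    # subvolumes act like lightweight, snapshottable roots
    btrfs subvolume snapshot -r /mnt/data/projects /mnt/data/projects.snap

    btrfs scrub start /mnt/data                  # verify checksums to catch bit rot
    btrfs scrub status /mnt/data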

Ah, no. That’s old. This is on my Ryzen 1700 system with a Radeon Pro Duo at home and on an AMD Athlon X4 system with a GT 710 at work. Pretty wide range from new to old and strong to weak hardware.

Not necessarily. That’s adding an extra layer of “stuff to manage”. From my perspective, you would just mount the ZFS volume for the VM OS via iSCSI to your Linux host. Then pass that through to your VM using KVM. KVM doesn’t have to care or know the drive isn’t local. This should work; though I don’t have experience using iSCSI, I do have experience tricking programs into thinking a directory is local using symbolic links.

My understanding of iSCSI, even though I haven’t used it before, is that its usefulness comes from essentially making a network drive appear as a local drive, with good performance for that type of thing.

So if done right, no program should be aware or care that it’s accessing a network drive to put files on. Even a VM.
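I haven’t actually done it, but from the docs it would go something like this (the portal address, target name, and VM name are all made up):

    # Log in to the FreeNAS iSCSI target with open-iscsi.
    iscsiadm -m discovery -t sendtargets -p 192.168.1.10
    iscsiadm -m node -T iqn.2017-09.local.freenas:win10 -p 192.168.1.10 --login

    # The LUN then shows up as a normal block device (check lsblk), so hand it to the VM.
    # To KVM it looks like any other local disk.
    virsh attach-disk win10 /dev/sdX vdb --persistent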

Depending on how you plan to do that, I don’t see the difference between what you can do with two RAID 1s and what you can do with a RAID 10. If your plan is to just turn the disks off, you can do that with a RAID 10 (just put the RAID in a degraded state by powering down two of the drives, making it a RAID 0 with a temporarily offline copy). If your plan is to remove the disks from the system temporarily, that’d have the same effect.

If your concern is that you won’t have a running backup while two of the drives are out of your system, then I could see your issue with RAID 10. Because then you wouldn’t have one until you replace the two drives.

But if that’s the case, I would have two HDDs at the ready to swap in so you never lack the RAID 1 portion of the RAID 10.

And honestly, if you intend to remove disks for cold storage for some reason, I would have two spares anyway. You’d want them to keep the RAID at the ready in case something catastrophic did indeed happen (like two drives dying at once).

You’re getting hot swap bays anyway, so I’d expect that to be the case.

I guess my point is that in either situation (RAID 10 or two RAID 1s), the outcome seems like it’d be the same if you were fully prepared (having spare HDDs to swap in).
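In ZFS terms, the “deliberately degraded” idea is just offlining one disk from each mirror (names are placeholders again):

    zpool offline tank ata-WD_RED_B
    zpool offline tank ata-WD_RED_D    # pull these two for cold storage; the pool keeps running degraded

    zpool online tank ata-WD_RED_B     # later: reinsert them and let ZFS rebuild the changed data
    zpool online tank ata-WD_RED_D     # (with brand-new spares you'd use 'zpool replace' instead)
    zpool status tank                  # watch the rebuild progress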

http://www.enterprisenetworkingplanet.com/linux_unix/article.php/3794816/Share-iSCSI-Volumes-With-Linux-Clients-via-ZFS.htm

1 Like

Oh, I remember reading about that a while ago. I will need to look closely at it again. When I decided to look at ZFS instead, RAM was half the price that it is now.

A Ryzen 1700 with a Radeon Pro Duo, nice. So, the Threadripper configuration with ECC RAM should work fine; I just have to choose the RAM carefully. I have found some threads of people getting certain ECC RAM sticks working with the board that I am looking at, the ASRock X399 Taichi.

iSCSI is a network protocol. You could even run a computer without a storage device by installing the OS onto an iSCSI drive over the network. It is just a lot faster than NFS solutions like SMB. As I mentioned before, the downside is that ZFS just sees it as a single data block, not as a collection of files. And, the total size needs to be provisioned initially, which could waste space if you have a few of them. NFS would be more flexible and require less storage space, while iSCSI would be faster. I kept mentioning both of them because I would use a combination of them based upon what I need.
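From what I have read (I have not tried it), a sparse zvol would at least soften the provisioning problem, since space is only consumed as data is actually written. Something like this, with made-up names and sizes:

    zfs create -s -V 200G tank/scratch                  # -s makes the zvol sparse (thin-provisioned)
    zfs get volsize,used,refreservation tank/scratch    # the full 200G is not reserved up front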

True. This would make the backup automatic as the inserted drives are resilvered. But, I was not sure if using it like this could have potential issues. Would the resilvering be as quick and optimal as an incremental backup? I will need to look into this before I actually setup everything.

You are correct. I would, ideally, have an active redundancy, like the 1 in RAID 10. In addition, I would have an inactive backup that is offsite. My thought of unplugging the drives was for an easy “offsite” solution (actually called an onsite backup). Usually, people suggest an active redundancy with an onsite backup and an offsite backup for important data. But, that seems extreme. I wonder if that is what everyone suggests because it sounds good but is also what everyone disobeys because it is impractical.

I could probably relax the 4x redundancy. And, I realized that I forgot about having a spare drive around. I am mostly concerned about the first few months of using a new HDD. It seems like they either fail in a few months or last for many years. A RAID 10 without any backups would still be very resilient. Resilvering that with a spare or two probably would not be an issue. And, eventually, I could add another hotswap bay for onsite backups.

That is very interesting. iSCSI on ZFS is even closer to the metal than I thought. That would be much better than virtual disks, especially if I could install a VM onto one.

So, where I am at now is using Fedora 26 with a Windows VM through KVM on a Threadripper system with a Vega GPU and ECC RAM. I need to decide between btrFS and ZFS, so I will be comparing their features and implementation. It seems that installing Linux onto ZFS is quite difficult, since ZFS cannot ship with the Linux kernel for license reasons and has to be built manually. BtrFS seems good, but I am hesitant since it is still so immature. I have already found the setting for restricting the RAM usage of ZFS.
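For anyone curious, on ZFS on Linux that is the zfs_arc_max module parameter (a value in bytes); capping the ARC at 8 GiB would look roughly like this:

    # persist across reboots
    echo "options zfs zfs_arc_max=8589934592" | sudo tee /etc/modprobe.d/zfs.conf

    # or change it on the fly
    echo 8589934592 | sudo tee /sys/module/zfs/parameters/zfs_arc_max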

This leaves me with needing a FreeNAS VM. But, the host of that VM would not have ZFS support. So my Linux workstation, to be installed on ZFS, would need to be a VM running on the same host. This leads me to the Hypervisor architecture that I described in my first post. If I want to use the Fedora + KVM approach, I would probably want to drop ZFS. I will need to see how easy Xen is to use.

EDIT: Using Xen would be a complete pain. But, installing Ubuntu (and, thus, Mint) on ZFS is far easier than installing Fedora on ZFS. Therefore, I will use the Linux Workstation approach that I outlined in my first post. I will use Threadripper hardware on an ASRock motherboard with ECC RAM and RX Vega 56 graphics cards. The OS will be Ubuntu or a derivative of it installed onto a ZFS volume. I will use KVM to run a VM of Windows. When I need the extra GPU power, I will detach a GPU from my Linux OS, reboot, add the GPU to the VM, and boot the VM. Virt-manager should be able to handle this easily enough. The Windows VM image would be on a ZFS volume. General storage will be provided to the VM using NFS, and I will provide performance storage to the VM for temporary SolidWorks Simulation files by creating an iSCSI target backed by a zvol in a ZFS pool on a dedicated NVMe drive.
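Sketching out the storage side of that plan (every pool, dataset, address, and size below is a placeholder, and none of it is tested yet):

    # General storage shared to the Windows VM over NFS
    zfs create tank/shared
    zfs set sharenfs="rw=@192.168.122.0/24" tank/shared

    # Scratch space for SolidWorks Simulation on the dedicated NVMe pool, exported over iSCSI
    zpool create fastpool /dev/nvme0n1
    zfs create -V 400G fastpool/sw-scratch
    targetcli /backstores/block create sw-scratch /dev/zvol/fastpool/sw-scratch
    targetcli /iscsi create iqn.2017-09.local.workstation:sw-scratch
    targetcli /iscsi/iqn.2017-09.local.workstation:sw-scratch/tpg1/luns create /backstores/block/sw-scratch
    # (initiator ACLs and portal setup omitted)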

1 Like

That could be confusing. NFS itself is a network protocol. I can see why NFS could be a general term for network protocols, but yeah.

I believe you mean reslivered.

So, if you’re undoing the RAID 1 part of the array, you are undoing the mirroring part. Not the slivering part. RAID 0 is slivering. RAID 1 is mirroring.

You could keep the system running while it was mirroring the array.

Be sure to label which is drive A and which is drive B if you do create cold storage copies. You wouldn’t want to go back later and not know which one is which if you have more than 2 of these cold storage drives. i.e. picking two A’s and thinking your array is gone when you just picked 2 of the same half.

You started this thread by saying you wanted ease of use to maintain. You don’t want to put FreeNAS in a VM.

Too many “gotchas”.

1 Like

It is. I saw someone in a guide use NFS as an umbrella term for things like SMB, and he differentiated that from iSCSI. I do not know if that is technically correct.

Nope. For ZFS, the term is resilver. It is when the data is reconstructed onto a replacement disk. It could be for striping or for mirrors. The term is general for repairing or reconstructing the array.
EDIT: Actually, resilvering seems to be a general term for repairing a mirror of disks, based upon the antiquated term of resilvering an actual mirror to repair it. It is used in the RAID world and for ZFS. It seems that “rebuild” is the term for rebuilding a striped RAID array, but ZFS uses “resilver” for all disk array repairs.

"When a device is replaced, a resilvering operation is initiated to move data from the good copies to the new device. This action is a form of disk scrubbing. Therefore, only one such action can occur at a given time in the pool. If a scrubbing operation is in progress, a resilvering operation suspends the current scrubbing and restarts it after the resilvering is completed."
https://docs.oracle.com/cd/E19253-01/819-5461/gbbya/index.html
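In day-to-day terms, that just means something like this (pool name made up):

    zpool status -v tank   # reports scrub or resilver progress and any errors found
    zpool scrub tank       # kick off a manual scrub

    # e.g. scrub monthly from cron
    # 0 3 1 * * /usr/sbin/zpool scrub tank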

You are correct. I am used to the ZFS terms. Thankfully, I think you understood.

Oh, yeah, just like the Batcave (if you have seen the Batman with Adam West). :wink:

Thanks for the tip. I thought that I would be safe by passing through the storage controller, but I did not know about this from one of those links:

“However, PCI passthrough on most consumer and prosumer grade motherboards is unlikely to work reliably. VT-d for your storage controller is dangerous and risky to your pool. A few server manufacturers seem to have a handle on making this work correctly, but do NOT assume that your non-server-grade board will reliably support this (even if it appears to).”

1 Like

Technically, but it is also a bit confusing when there’s a specific form of Network Protocol called NFS.

https://wiki.archlinux.org/index.php/NFS

It’s the oldest one (1984) so it makes sense for it to be the “umbrella term” when it was the origin (essentially).

Oh that’s neat.

You’re welcome. That was from 2013/2014, but I believe it is still true. People do put FreeNAS in VMs successfully, but the amount of work to make that safe is a lot more significant than just using FreeNAS on bare metal.

Actually @far-bound has a point here: you’ve thrown together some impossible ideas and want our feedback on a wall of text that makes little sense. There’s a video somewhere on the L1 channel on how to ask questions and get help, and I think you’re mostly missing the mark here.

ESXi doesn’t have a GUI on the host machine so you won’t have a workstation.

I don’t know how you used Linux when you gained the insight of nonexistent stability, but this is hardly an accurate or competent description. Besides, if you opt for a platform like Threadripper, you might be pushed towards a distro such as Fedora, as they tend to be quicker to adopt new code, which may be useful to resolve issues or improve functionality.

If I were you I’d go with #3, but be prepared to dual-boot and get a NAS, as I’m not sure you’ll piece together a result you would be happy with. Also not sure about Vega at this point in time. For some compute tasks the Linux drivers fall behind Windows, but you’d have to check the benchmarks for your particular use case.

The feasibility was part of my question. And, some ideas require more than a few sentences (or paragraphs) to communicate. I actually did get a lot of help from Vitalius.

I was mixing largely enterprise architectures with consumer hardware and a consumer environment. Thanks to one of Vitalius’ links, I see that it would have been horribly unstable.

I had an install of Fedora where an update broke the file explorer. After the update, any time I selected a file larger than 1GB, the computer would hard freeze. I didn’t even have to try opening the file. In another fresh install, the framerate of the desktop magically dropped to a slideshow when I had not changed anything. I tried at least 6 fixes online, none of which worked. After a few months and many reboots, the issue magically corrected itself. I did not even update the system in that time. I also experienced issues where a fresh install had an error in the fstab, which seemed like quite an accomplishment for an install wizard. I was not even trying to do anything fancy. Regularly, something would randomly break for no reason.

To be fair, that was the computer that had a failing RAM stick that was silently corrupting bits. My issues with Linux were at least a year before Windows started crashing and I found corrupted data. But, it is possible that the RAM was noticeably corrupting Linux long before Windows exhibited noticeable signs of corruption. I did experience unexplained crashes in a certain Windows program around that time. But, there would be no way to know for sure. I tested the RAM with memtest overnight when I first built the machine.

I am aware that using Fedora 26 with btrFS would be a possible fallback.

This is a question by Vitalius about using Linux as a host for a Windows VM, where Linux would be largely transparent beneath Windows. This could easily be a Windows workstation, despite using a headless host. That is very similar to the hypervisor setups that I was asking about. I wish that I could have captured his post’s simplicity in my post, though.

I will try to simplify my original post. In the least, it could help more people understand what issue is being solved and be helped by it.

1 Like

I know I’m a bit late to the party here, but since I’ve got a lot of experience with your use case, I thought I’d give you my input.

First things first: You spent a lot of text explaining things that weren’t necessary for us to give you a suggestion. Next time if you could either shorten your post or add a TL;DR, that would help us out a lot.

Now on to actual advice.

I strongly suggest against an enterprise hypervisor, leaning towards QEMU/KVM instead. I’d recommend using the TR4 platform since it’s going to have massive memory capacities and could easily support what you’re going for.

Vega is not your friend when it comes to virtualization because it suffers from the bus reset bug. The best thing you can get is either a Quadro or an RX 580.

My OS recommendation is going to be Fedora. Miles ahead of Linux Mint as far as stability and Virtualization support goes.

When it comes to your Linux workstation, just make the Host OS your workstation. That way you don’t have any overhead of the hypervisor.

I’d recommend against BTRFS, instead opting for a properly tuned ZFS system because of the zvol feature that you can attach to your VMs directly, as opposed to putting a file in a CoW directory.
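What I mean by attaching a zvol directly, roughly (names, sizes, and paths are just examples):

    zfs create -V 120G tank/vm/win10    # exposes a block device at /dev/zvol/tank/vm/win10

    virt-install --name win10 --memory 16384 --vcpus 8 \
        --disk path=/dev/zvol/tank/vm/win10,bus=virtio \
        --cdrom /path/to/Win10.iso --os-variant win10
    # (you'll want the virtio driver ISO handy during Windows setup)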

2 Likes

So, uh, since the need for Windows arises, you have a few options.

Windows OS with VMware Workstation or Hyper-V to manage secondary Linux workstation. I used this setup for years. I had a Windows 10 host with Debian, Ubuntu, and Fedora guests with 8GB of RAM and 4 threads, about 120GB of disk space each. Perfect setup for a computer science student, QA engineer, and junior developer!

Fedora OS with VirtualBox (or VMware Workstation) to manage secondary Windows workstation. VirtualBox has great driver support, so if you need a Windows VM with full screen support that will do you good!

Fedora OS with KVM – Running the VM through virt-manager isn’t the best of options, in my experience. However, KVM is tiny and does a good job, so you could always install a RDP tool like Remmina on Fedora and RDP into the Windows box.
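For the RDP route, that part is just (IP and user are placeholders):

    sudo dnf install remmina    # GUI RDP/VNC client on Fedora
    # or straight from a terminal with FreeRDP:
    xfreerdp /v:192.168.122.50 /u:youruser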

Good luck!

1 Like

Let it go, I think all of AustinC’s problems are solved. You see, he was just looking for someone like him, and he found Vitalius. Kind of a roundabout way of doing things, but it warms my heart that the two found one another.

Also, mods, if you could message the individual in question and provide a reason as to why their post was deleted, that would be… I am wasting my time, aren’t I.

Yeah, I was rambling on while I was sorting out options in my head. I abridged the post to almost nothing. Normally I would not edit something so significantly, but it was really bad. And, I think the edit communicates the question well.

I will take your word for it. KVM certainly looks impressive, more efficient, and much easier to use, but I am curious. Are there other reasons why you do not suggest the enterprise hypervisors?

Wow, I missed that. That is a significant pain. It looks like the bug only affects the GPU being passed through. Perhaps I could pass through a Quadro or WX 7100 and only use the Vega GPUs in Fedora.
Following caprica’s suggestion, I looked for benchmarks comparing Vega’s compute performance on Linux versus Windows for what I am doing, but I have not been able to find any. CUDA on nVidia seems to be 10% slower on Ubuntu than on Windows 10 for what I do.

So, you say that the extra effort of installing ZFS on Fedora would be worth it. I can certainly see your logic. Fedora always was my favorite Linux distro (when it was working), so persuading me to use it would not be hard. Perhaps my Fedora problems really were mostly caused by silent RAM errors. It really is a pain how many RAM issues I have had. I have had RAM go bad more than any other component, and I don’t buy the cheapest RAM.

Thank you for your input.

Thank you for your suggestions. Virtualization truly is remarkable.

Do you have a better way to suggest?

1 Like

Sure does. Thanks!

To be clear, I wasn’t trying to come off as a dick, just trying to help you more effectively communicate your questions and goals.

They’re not really designed for tinkering, and if you’re going to want ZFS as a backing store, you’re going to have difficulty doing that on the same machine with enterprise solutions that aren’t from Red Hat. I also wouldn’t necessarily say that KVM is more efficient; all hypervisors are similar, KVM is just my favorite. Xen is definitely an honorable mention though.

I don’t have a solid answer on the WX7100, but a Quadro should work. The Vegas will work just fine on the host as long as your kernel is new enough.

As far as Vega compute goes, I can’t say for sure, check Phoronix?

CUDA is always going to be a bit slower on Linux. nVidia doesn’t like Linux.

Wendell likes the Vega on Linux though:

I haven’t personally installed ZFS on Fedora, but according to the ZFSonLinux wiki page, it looks very simple, as long as you don’t want ZFS on your Fedora root (which I could go either way on).

As far as the ram issues go, I can only recommend talking to people on the hardware category for that, especially if you go Threadripper/Ryzen. They’re a bit finicky with memory and you’ll want to make sure you get hardware that is known working with the AMD memory controller.

I completely understand, and that is the way that I saw it. Your constructive criticism was well placed and well taken.

I see. That makes sense.

I read on a pro-KVM site that it was closer to bare metal performance than Xen and other hypervisors (up to 99%). But, taking that with a grain of salt makes sense.

The WX 7100 is the workstation version of the RX 480. But, it is much more power efficient, has 8GB of vRAM, and is a single-slot card. Unlike most workstation cards, it does not have ECC vRAM, but that makes it a decently-priced workstation card for the performance (if you do not need ECC vRAM, which I do not). It being a workstation card is very important for SolidWorks performance. Vega FE would be a better choice because it has twice the performance for less than 50% more money than the WX 7100.

I was looking at the bus reset bug, and I may be able to live with it. To work around it, I would need to reboot my computer every time I turned off the Windows VM. That is not too bad. It is still better than a dual boot setup. And, a Windows VM without GPU passthrough would probably be adequate for most things. I do not know if I want to deal with the possible problems, though.

I have checked Phoronix. Blender Cycles rendering is what the Vega cards will be for, and finding reliable, comprehensive tests for it are difficult.

Yeah, and Wendell kept mentioning the issues and tinkering. But, he was talking about a fully open stack. He said that the proprietary AMD drivers were good. If “good” means that it does not need all the tinkering, Vega should work with the not-open drivers. AMD is supporting the upcoming WX 9100 (Vega for workstations) on Linux, so their drivers need to be fully polished soon. This is good to see since I have noticed that nVidia seems cold toward Linux.

I see. While I really want everything on ZFS, I could compromise on the root partition. I will have to see how much trouble I have with it.

Thank you for the tip. Yeah, I have found forums where people are testing the ECC support on Threadripper motherboards, and they tell which RAM they use.

1 Like

I think you underestimate how annoying it is. I thought the same thing and bought a Fury and now I hate that damn card. That said, it’s your call.

Yep. They hate it. That’s one of many reasons I don’t like to support them.

https://www.csparks.com/BootFedoraZFS/index.html

This guy talks about building a ZFS root install of Fedora. Haven’t followed it myself, but if you’re interested, here it is.

1 Like

You are probably right. And, I may not want the WX 7100 for SolidWorks. AMD did not get driver certification of the FirePro W7100 for SolidWorks 2016. But, the Radeon Pro WX 7100 did not get certification for SolidWorks until the end of 2016. And, it seems like AMD’s Pro drivers are a mess on Windows. Moreover, I am a bit concerned about the stability of Vega on Linux. Although the RX 580 is much slower and less efficient, several RX 580s seem like the most stable choice. (EDIT: But, I have heard that KVM has issues passing through a GPU that is the same as another GPU in the system.)

Yeah, mine as well. I really like what AMD has been doing for not just Linux but also for open source in general.

Thanks. I will bookmark that for when I need it.

1 Like