I work for a visual effects house. We run Renderman in our renderfarm and use our Linux workstations for rendering after hours as well. In the near future we’re deploying game engines, Unreal and Unity, so we’re going to be required to have a cluster of Windows workstations. At the moment I’m considering Windows VMs on Linux hosts using GPU passthrough. Have any of you had success with such a deployment in a production environment?
I mean, while you could probably do that, how valuable is having these servers available? How much downtime can you tolerate?
How big is your IT team? Does the person responsible for this have the time to switch to fixing any issues this could create, full time if needed? I am not suggesting you would have issues, it’s just that you have to plan for that, and since there is no support contract you are on your own if something major comes up.
We need to run the servers in the farm 24/7. The Unreal/Unity stations are the only boxes I’m considering the passthrough scheme for. I’m figuring they will eventually account for 20% - 25% of my workstation inventory by summer 2021. Our stock is pretty dated and we’re refreshing the renderfarm as well. I’ve been instructed to leverage every available proc hour for rendering, but going mixed mode with Renderman in an environment like ours would be a nightmare, so I want to keep my workstations as native Linux boxes where possible. We have a roadmap meeting scheduled Tuesday morning with Supermicro to lock down an action plan and timeline.
I would probably be a bit concerned about putting it on boxes that need to have near 100% uptime. If it’s just a VM on an end workstation that is used on the local box, and that box can have some downtime if you need to reboot to fix some passthrough issues, then you might be able to get away with it. I would probably suggest running a small pilot program to see how it goes, if that is a possibility. Have you done any GPU passthrough before? Are all the machines that are going to do it running the same OS with the same hardware?
I’m new to GPU passthrough but I have until May/June to start deploying. There’s sufficient time for the learning curve. I’m just trying to come up with a solution that maximizes our resources. This could allow us to leverage a substantial chunk of our stock for off-hours revenue so there’s a significant fiscal impact.
The hardware will be standardized. We’re meeting with vendors over the remainder of the month and will settle on a single hardware profile.
Sounds like it could work well, especially when working with vendors for hardware. Start messing around with it and get your feet wet if you have some spare hardware. I don’t see any reason you can’t make it happen. There are plenty of people here that have a lot of experience with doing these types of things (probably not tons in a full-blown production environment, but I would guess there are still some). If you run into any issues or need some reading material, dig around on the forum. GL
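If it helps as a first thing to try on a test box: passthrough usually depends on how the host groups devices under the IOMMU, so a quick look at those groups is a reasonable starting point. Below is a rough Python sketch, not a vetted tool. It assumes the Linux host exposes groups under `/sys/kernel/iommu_groups` (that directory only shows up when the IOMMU is enabled in firmware and the kernel), and the function name `list_iommu_groups` is just something made up for this example.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [device_address, ...]} read from sysfs.

    `root` is a parameter only so the function can be pointed at a
    different directory; the default is the usual sysfs location.
    """
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # Directory missing: IOMMU is likely disabled or not exposed.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        # Skip anything that is not a numbered group with a devices folder.
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("No IOMMU groups found; check firmware and kernel settings.")
    for number, devices in sorted(found.items()):
        print(f"group {number}: {', '.join(devices)}")
```

Run it on one of the spare machines and look for the group that holds the GPU. As a general rule of thumb, a GPU that shares its group with unrelated devices tends to be harder to pass through cleanly, so this is worth comparing across the candidate hardware profiles before you standardize.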
Thank you. I’ll talk about test machines tomorrow during the Supermicro meeting to go along with the eval server. There’s too much value to be had on this not to at least vet the concept a bit.