Windows passthrough VM won't boot

So this has happened before. Back in November to be exact. After an update (I am pretty sure) my VM won’t boot. This is what happens:

It pegs the cores assigned to it and the display attached to the video card that is being passed through does not come on. The console window shows stuff.

I think the last time this happened we (with the help of @GrayBoltWolf) deduced that the OVMF firmware became corrupted somehow, and the only way to fix it was to create a new VM with identical settings and devices passed through to it.

I’d like to figure out a fix for this, rather than create another VM. Anyone have experience with this?

So, first of all, you’re using i440 as the chipset. You should really use q35 instead. That alone is going to require a rebuild of the VM configuration. It’s also entirely possible that that is the problem. I’ve encountered issues with this in the past where i440 works fine for a few days/weeks/months, then shits the bed.

Correct me if I’m wrong, but doesn’t it default to i440fx? I remember reading/hearing that using q35 wasn’t recommended because it wasn’t as stable or something. Possibly that it was supposed to be used for Apple guests.

I mean, I can try it. Is there a way to change the VM to q35 easily, or is it better to just make a new one?

Edit: So far it seems to be working using the q35 chipset. I am going to try some games to see if the performance changed or anything. Thanks for the help.

I see that you’ve got it working, but I want to respond about Q35 vs i440fx.

For a normal VM, that’s the case, but it’s what we all use on Passthrough machines. Q35 works better with UEFI bios (Apple guests are included in this, so you may have read that) and also has better support for PCIe and things like that without an ISA bus adapter.

As far as changing it, you should rebuild the entire VM (there’s a reason libvirt doesn’t just let you change it). I forget exactly what changes, but something changes aside from simply the chipset.

Further reading:

http://wiki.qemu.org/Features/Q35
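For anyone unsure which chipset an existing VM uses, here is a rough sketch that reads the machine type out of a libvirt domain XML (the kind of text `virsh dumpxml` prints). The XML below is a made-up, trimmed-down example: the domain name and the machine version string are assumptions, not taken from anyone's config in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML; name and machine version are made up.
DOMAIN_XML = """
<domain type='kvm'>
  <name>win10</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.9'>hvm</type>
  </os>
</domain>
"""

def chipset(domain_xml):
    """Return 'q35', 'i440fx', or 'unknown' based on the machine attribute."""
    root = ET.fromstring(domain_xml)
    os_type = root.find("./os/type")
    machine = os_type.get("machine", "") if os_type is not None else ""
    if "q35" in machine:
        return "q35"
    if "i440fx" in machine:
        return "i440fx"
    return "unknown"

print(chipset(DOMAIN_XML))
```

This only reports what the config says. It does not convert anything, which fits the advice above to rebuild the VM rather than edit the chipset in place.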

Well, I have been using the default I440FX since I started PCI-passthrough a year and a half ago. It’s been mostly working that whole time, apart from the (now) two times it has just plain stopped working.

It makes total sense, though, why it wouldn’t work well. The I440FX looks like a very old system, so it’s no wonder that it wouldn’t work well with what we are trying to do.

I haven’t really had time yet to play games on it since I made the new VM with Q35 as the chipset. I started Crysis up for a bit and it felt choppy with frequent FPS drops. I will have to see if that’s really the case, maybe try a couple other games.

But now Windows deactivated because of the hardware change. Going to have to figure out how to activate it again. Don’t want to burn another of my spare Windows 7 keys.

Windows 7? Disable your NIC, re-enter the product key, call the number and enter the numbers. That’s a surefire way to activate any valid key on W7/8.

That’s probably because you’re not using hugepages.

Can you list your system specs so I have a better idea of what I’m dealing with?

Windows 10.

I’m not using hugepages. I remember seeing that in the guide I used to set up the passthrough, but didn’t enable that part. Not sure why. I guess I don’t know exactly what it does.

Host system is an i7-6700k, Asus Z170-A, 32GB, 512GB SSD for host, VM image storage is on a nasty old WD Green 1TB. Video card that is passed through is a GTX 970. Host uses integrated graphics. VFIO, if that makes any difference. Host OS is Fedora 26.

I was just sort of planning out how to upgrade VM image storage. First, I am out of space, and second, I know for a fact that the WD Green drive I am using is on its last legs.

I’m not a pro with Windows licensing, so I can’t say for sure what the best next step is on this. Usually, I just ragequit and install Linux, but that eliminates the point of the VM.

Probably because when you’re not using the VM, the ram still isn’t available.

So, basically the same setup as I have. :smiley:

How much storage do you need and what’s your budget? I’d be happy to give you my advice.

The Windows 10 VM has a 128GB main image, and then I had to make a 250GB storage image because of games. I don’t even have that many games installed (maybe four or five), but both images are nearly full.

I have it boiled down to two options: one relatively fast 4TB hard drive (probably a WD Black), or two 2TB drives with a SSD as cache for a ZFS pool. I feel like it would be weird to have a giant 1TB file sitting on the drive, though. Maybe just get a 1TB or 2TB drive and pass it through to the VM.

Not sure if VM image storage would benefit from a ZFS pool with an SSD cache.

Here’s the thing. If you go capacity, you’re going to want to avoid SSD caches for VM drives. It just doesn’t work as well as I’d originally hoped.

https://www.bestbuy.com/site/wd-easystore-8tb-external-usb-3-0-hard-drive-black/5792401.p?skuId=5792401

Grab this if you’re in the US. Basically, it’s a SATA WD Red, so if you rip the enclosure apart, you’ve got a great internal drive, and WD has a history of accepting RMA and warranty stuff on drives that have been shucked.

Your idea of getting multiple drives and passing the entire thing through to the VM is a good one. Just be aware that you need to play with UDEV rules to get it to work if QEMU is not running as root.

So, I was playing Crysis for a few minutes last night, and it was actually feeling pretty good. FPS was between 80 and 110, didn’t notice any major drops or glitches. Then the VM bluescreened. It said something about memory management, and the component that failed was win32kbase.sys. So I’m not sure if that’s a symptom of switching “chipsets” and Windows is freaking out, or what.

After that I tried to enable hugepages, but it’s not working right. The VM fails to boot when hugepages is enabled in the config file. I added hugepages=2048 to grub and regenerated it.

When I do cat /proc/meminfo | grep Huge I get

AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 2048
HugePages_Free: 2048
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB

So I think hugepages is enabled. Not sure.
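As a quick sanity check on those numbers, here is a minimal sketch that parses the output quoted above and works out how much memory the hugepage pool actually reserves. The sample text is copied from the post; the function names are just made up for illustration.

```python
# Sample text copied from the /proc/meminfo output quoted above.
MEMINFO = """\
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
HugePages_Total: 2048
HugePages_Free: 2048
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""

def parse_meminfo(text):
    """Return a dict mapping each field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            # Drop the trailing unit ("kB") where there is one.
            fields[name.strip()] = int(rest.split()[0])
    return fields

def hugepage_pool_mib(fields):
    """Reserved pool size in MiB: page count times page size (kB) / 1024."""
    return fields["HugePages_Total"] * fields["Hugepagesize"] // 1024

info = parse_meminfo(MEMINFO)
print(hugepage_pool_mib(info), "MiB reserved,", info["HugePages_Free"], "pages free")
```

With the values above, 2048 pages of 2048 kB each come to 4096 MiB, so the pool exists but is only 4GB in size. On a live host the same functions should work on the contents of /proc/meminfo read from disk.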

And yeah, I was thinking about getting an external HD. I kind of like the idea of stripping one for the drive, and giving the VM the entire drive. My ideal solution would be to use my fileserver as VM storage, but the gigabit network would be a bottleneck. I was fantasizing about 10Gb ethernet the other night, but the only cheap 10Gb NICs right now are fiber-based.

Well, that WD that I linked is an absolute steal. There’s a Red in it, guaranteed. (don’t tell McCarthy)

Hugepages is enabled, but you don’t have enough. Keep in mind that you’ve got 4GB of hugepages and you have 8GB of ram allocated to the VM. You’ll need to set hugepages=4096 to get it working properly. Once that’s done, you’re golden.
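The arithmetic behind that number, as a small sketch: the 2048 kB page size comes from the Hugepagesize line in the earlier output, and the 8GB guest is the figure mentioned above. The function name is made up, and this only counts the guest's RAM, nothing else.

```python
def hugepages_needed(guest_mib, page_kib=2048):
    """Number of hugepages needed to back guest_mib MiB of guest RAM, rounded up."""
    guest_kib = guest_mib * 1024
    return -(-guest_kib // page_kib)  # ceiling division without floats

# An 8GB guest with 2048 kB pages:
print(hugepages_needed(8192))
```

Going the other way, hugepages=2048 at 2048 kB per page only covers a 4GB guest, which is why the 8GB VM refused to start.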

I wonder if there’s something going on with the host memory. Run a memtest and have it go through an entire pass. Also, can you share your XMP settings and timings? I’ve never had this problem before.

Don’t buy 10G unless you need it. Trust me, using VM storage on the network is not ideal unless you’re doing some clustered model and need node failure tolerance.

You can also configure iSCSI and mount it on your VM (within windows). Not the best, but definitely one of your choices.

That hard drive is definitely a steal. I just bought one. Wish I had the money to buy four more for my backup server. That is exactly what I need for that.

I set the hugepages to 4096 and now the VM works. Sweet. Thanks for the help with that. We will see if it runs smooth.

Memory setup is a bit weird, actually. I have two 16GB Corsair 2666MHz kits. They don’t actually play nice together, even though they are identical SKUs. I had to set the speed to 2600MHz a year or so ago because of incredible instability in the host OS. Ever since then it has been running smoothly. Not sure why the VM would bluescreen like that, but the only change was the new VM system based on Q35 instead of the 440FX that windows image was installed on.

Edit: Nope. This time Crysis itself crashed. I then went to shut down Windows and got a bluescreen. This time it said “unexpected kernel mode trap” and what failed was “dxgkrnl.sys”. So now I am running memtest, just to see if it is the memory. But I suspect Windows is having driver issues with the new chipset.

Edit2: Memtest has already found errors. And locked up.

This is my current dilemma. I’ve got 6x2TB drives in my array right now and it’s hard to not pull the trigger on upgrading while 8TB is as cheap as it’s going to be for the foreseeable future.

Glad to hear it. I do hope this is the problem with your system.

This sounds like an issue.

RMA time.

Corsair RMA isn’t what I’d call easy, but make sure you take the time and do it. This sounds like you’ve got some major memory issues.

Any time matched memory units don’t play well together, but work fine on their own, you know there’s a problem. At that point, your only real option is to RMA them.

Well, the frustrating part is I have already RMAed memory for this system. I’m not super impressed with Corsair Vengeance LPX.

The two kits are different versions. The timings are the same. Not sure. The first kit (what I think is the original kit based off the lower version number) has no errors, and memtest has been going for over two hours. Going to try the second kit.

The second kit did fine with the memtest too. So I reinstalled all four modules, this time following the motherboard manual’s suggestion for channels. I put the older version modules in channel 1 (A2 and B2) and the newer version modules in channel 2 (A1 and B1). Memtest ran for a little over 12 hours (overnight) with no errors.

I just played Crysis for over an hour with no problems. Excellent performance, FPS stays around 100 pretty consistent.

So there we go. Memory channels and memory module versions are important. Hugepages seems to have made a bit of a difference in performance. As did using Q35 instead of I440FX. Now I need to figure out how to expand the VM image size.

I can’t even put into words how badly I wanted to figure out this new problem. I have been fighting with it for a few days now, and I haven’t made any progress.

So, I am hoping someone has seen this and knows what to do. Here’s what happened that got me to this point:

I installed my new 8TB hard drive. I had issues booting because this changed the mount points. So I edited my fstab so all the hard drives were mounted via UUID. I have two 1TB drives in addition to the new 8TB. The first 1TB originally held the VM images, and still does, mainly as a backup. I copied all the VM images to the new 8TB drive using rsync -avzhP. It took forever, but eventually all the files were on the new drive.

After this, when I try to start my Win10 VM, this happens:

At first it didn’t even get to this screen. It would bootloop only getting to the spinning circle.

So I thought it was a permissions problem. rsync changed the permissions during the copy. virt-manager and qemu run as root, which I don’t like. So I changed all the file permissions to match the old files. No joy. I tried to use the original files. Also no joy.

So I am at the point where the only other thing that seems to have changed is the drive mount points. Would this make a difference? Does anyone have an idea of what’s going on here? Suggestions?

I have no explanation for this.

Was there a filesystem change? I’m really grasping at straws here.

Have you tried booting a Linux ISO in the VM and seeing if all the partitions show up in the VM?

If so, you may need to remount the windows install ISO and try doing a boot repair.


I’m glad to hear that the memory issues have been resolved though.

Just tried booting the Win10 install ISO. Went to advanced options, chose startup repair. It said it was diagnosing, then came up and said it couldn’t repair the pc. This time I tried reset this PC, chose keep my files, and it immediately comes up with “the drive where windows is installed is locked. unlock the drive and try again.”

I vaguely remember having this problem once. Will do a bit of research.

Edit: Everything I try to run (chkdsk, diskpart, bootrec, whatever) gives an error that the disk either isn’t there or isn’t accessible.

I haven’t got a clue what that’s about. Good luck with the research.