I just received a new Sapphire Nitro+ 7800 XT to replace my aging 1080 Ti, and it’s been nothing but a headache. I’m looking for some help / suggestions.
The card seems to game fine on the host.
I can consistently crash VMs in both KVM and VirtualBox by playing YouTube videos, of all things, inside the guest. They seem stable if I don’t use any video. I am using just the kernel amdgpu driver.
I use a lot of VMs to separate internet activity, and YouTube lives in a Pop_OS guest on the Pop_OS host. This was never an issue on my old 1080 Ti.
I have the VirtualBox video memory set to 256 MB, and the guest has plenty of resources (8 GB of RAM, 100 GB of disk, etc.). I have tried almost every combination of settings I can think of: 3D on and off, various chipsets, etc. I am not passing through any GPUs (obviously not with VirtualBox); I just use the VMs for light stuff. I have Guest Additions installed.
To rule out something with my old install, I reinstalled Pop on a new drive and set up VirtualBox and KVM, and the same thing occurs, so I can rule out it being something specific to my old software config.
To rule out the PSU, I tried running each of the 8-pin connectors off two different rails on the PSU. Same issue.
I have monitored temps and they all seem fine too.
My Specs
psu: 850watt evga psu
mb: gigabyte aorus 570 pro wifi with latest bios
ram: 32 gigs of 3200 ddr 4 - memtest86 passed all
cpu: ryzen 3900x
nvme: 1tb samsung
os: Pop_OS 22.04 LTS - non nvidia edition
virtual software: virtual box and kvm
BIOS settings: SMT on, Resizable BAR on (tested on and off), Above 4G Decoding (tested on and off)
Kernel: 6.6.6-76060606.202312111
This is the error I’m seeing in kern.log in the guest right before it crashes.
[drm:vmw_msg_ioctl [vmwgfx]] ERROR Failed to open channel.
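For anyone who wants to pull these lines out of a long log, here is a minimal Python sketch. It assumes the guest writes kernel messages in the `[drm:<function> [<module>]] ERROR <message>` shape shown above; the helper names `find_drm_errors` and `scan_file` are made up for illustration. Going by the bracketed part of the message, the error comes from the guest-side vmwgfx module, so that is the default filter.

```python
import re

# Matches kernel log lines shaped like:
#   [drm:<function> [<module>]] ERROR <message>
DRM_ERROR = re.compile(
    r"\[drm:(?P<func>\w+) \[(?P<module>\w+)\]\] ERROR (?P<msg>.*)"
)


def find_drm_errors(lines, module="vmwgfx"):
    """Return (function, message) pairs for DRM errors logged by `module`."""
    hits = []
    for line in lines:
        match = DRM_ERROR.search(line)
        if match and match.group("module") == module:
            hits.append((match.group("func"), match.group("msg").strip()))
    return hits


def scan_file(path):
    """Scan a log file (hypothetical helper; the path is up to the caller)."""
    with open(path, errors="replace") as handle:
        return find_drm_errors(handle)
```

Inside the guest, something like `scan_file("/var/log/kern.log")` should list every occurrence, assuming that path exists and is readable by your user.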
Does anybody else have an idea what this could be, or is anyone seeing things like this? Should I just return the card?
Update: I think you may be correct. I revisited the Phoronix review of the 7800 and noticed this:
For those currently running Ubuntu 23.04, enabling the RX 7700 XT / RX 7800 XT support means just needing to run a newer Linux kernel version and also updating the AMDGPU firmware files from linux-firmware.git. AMD recommends Linux 6.4 and newer while the latest Linux 6.3 point releases work out too. Switching to a newer Mesa version isn’t required compared to what Ubuntu 23.04 ships but it’s generally recommended to use Mesa Git or the latest upstream release for the best RadeonSI OpenGL and RADV Vulkan performance. For those on Fedora 38, simply installing all available system updates will get you going with the necessary Linux kernel and firmware requirements.
So on my spare drive I installed Ubuntu 23.10, VirtualBox, etc… and now… well, it works. I really don’t like Ubuntu, but it seems like I just might have to wait for Pop_OS to come up with a new version.
Update: spoke too soon.
[drm:vmw_msg_ioctl [vmwgfx]] ERROR Failed to open channel.
In Ubuntu 23.10 it ran for about 5 hours before it happened.
Considering the 7000 series is very new, you might look towards a bleeding-edge OS like Arch Linux. Also see if the BIOS is up to date and check the BIOS settings.
The fact that it worked on Ubuntu for some time really makes me feel that the mobo is not liking it for some reason, rather than the kernel.
That is what I would check, at least. Maybe it is worth a try.
DISCLAIMER for my sanity: Updating the BIOS is very risky, do it only if you feel comfortable doing it.
What virtual GPU are you using? I’ve had issues on AMD with certain virtual GPUs not working properly and randomly crashing (or just not starting at all) on my 6900 XT. I think it’s #justamdthings
Definitely not the video BIOS, did not see that bit there
Adopting the bleeding edge is always a pain. Hope the next update fixes it.
You could check the Mesa driver too. It is most commonly used in graphics rendering, but for all I know the VM might use a pipeline in there to display stuff… Just an idea.
Without knowing the configuration for the VM, it’s hard to say what rendering methods it’s using. Different virtual GPU devices will use different methods. Some have a Mesa interface, some tie into the driver itself, some attempt to do GPU partitioning, but that only works on Intel GPUs.
The idea occurred to me when I was looking up whether the CPU has internal graphics, which it doesn’t, so I thought about software rendering, and afaik that is handled by the Mesa stack with a wide range of compatibility. The idea was that, since the thing invoking the error is drm, as in drm:vmw_msg_ioctl, which is a header in Mesa (mesa/drm), it has to do with the Mesa driver.
Been doing some digging about what @SgtAwesomesauce was saying regarding the VM configuration. Double-check your graphics controller, allocated video memory, and allocated resources in general; those seem to cause that issue for some people.
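If it helps, here is a rough Python sketch for dumping those settings from a VirtualBox VM so they can be pasted into the thread. It assumes `VBoxManage showvminfo <vm> --machinereadable` prints `key=value` lines with keys such as `graphicscontroller`, `vram`, `accelerate3d` and `memory`; that is how I remember the output, so verify it on your version. The function names and the VM name are hypothetical.

```python
import subprocess

# Keys assumed to appear in the --machinereadable output; verify locally.
KEYS = ("graphicscontroller", "vram", "accelerate3d", "memory")


def parse_vminfo(text):
    """Parse key=value lines, stripping optional double quotes."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip().strip('"')] = value.strip().strip('"')
    return info


def graphics_summary(text):
    """Keep only the graphics-related settings (None if a key is missing)."""
    info = parse_vminfo(text)
    return {key: info.get(key) for key in KEYS}


def show_vm(vm_name):
    """Query VirtualBox for one VM; needs VBoxManage on the PATH."""
    result = subprocess.run(
        ["VBoxManage", "showvminfo", vm_name, "--machinereadable"],
        capture_output=True, text=True, check=True,
    )
    return graphics_summary(result.stdout)
```

Calling `show_vm("youtube-guest")` (made-up VM name) on the host would return a small dict. Since the guest error comes from vmwgfx, I would guess the controller shows up as the VMware-style one (VMSVGA in VirtualBox terms), but that is an inference, not something confirmed in this thread.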
This reminds me… If you are serious about VMs instead of just having something to mess around in and try retro OSes of yore (or virtual distrohopping), do consider learning QEMU. Why? Because you can script it just the way you want it and need it, and if it comes crashing down you will have a ton of debugging options at your disposal.
I think Chris Titus Tech demonstrates it best:
This is about as optional as learning vim is though
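To give a flavour of what scripting QEMU can look like, here is a small Python sketch that only builds the command line. The disk image name and the defaults are made up; `-enable-kvm`, `-m`, `-smp`, `-drive` and `-vga` are standard `qemu-system-x86_64` options, but treat the exact combination as an assumption to adapt, not a tuned config.

```python
import shlex


def qemu_command(disk_image, memory_mb=8192, cpus=4, vga="virtio"):
    """Build a qemu-system-x86_64 argument list for a KVM-accelerated guest."""
    return [
        "qemu-system-x86_64",
        "-enable-kvm",                 # use KVM acceleration on the host
        "-m", str(memory_mb),          # guest RAM in MiB
        "-smp", str(cpus),             # number of virtual CPUs
        "-drive", f"file={disk_image},format=qcow2,if=virtio",
        "-vga", vga,                   # virtual GPU model, e.g. std, virtio, qxl
    ]


# Hypothetical image name; swap the vga value to compare virtual GPUs.
cmd = qemu_command("youtube-guest.qcow2", vga="std")
print(shlex.join(cmd))
```

The point of keeping it as a function is that switching the virtual GPU becomes a one-argument change, which makes it easier to test whether the crash follows one particular virtual GPU model.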
I am reading it as VBox over KVM, could be wrong though.
Depends… Do you want to take your Linux driver’s license, or are you happy with your crappy 20 mph scooter? Not judging, but the difference is about that big.
I’m not sure what DHCP means in this context. It’s not a network issue; it’s for sure something with either the GPU or the kernel and driver config. The DisplayPort cables being used are new and good.