Threadripper / Vega Reset Bug

Howdy. I’m a long-time viewer, but I’ve never participated in the community.

I have GPU passthrough working in a virtual machine using libvirt and vfio-pci. I can boot, shut down, and reboot the guest without any problems. However, if I shut down the guest and leave it off for roughly 10 minutes, the guest GPU’s fan spins up to full blast, and I can’t boot the guest again. I realize that the vfio-pci driver doesn’t control fan speed, so could that have something to do with it?

Relevant Specs

  • Gentoo Linux Kernel 4.16.9
  • Threadripper 1900X (CPU)
  • X399 Gaming Pro Carbon (mobo)
  • Vega 56 (guest GPU)
  • Windows 8.1 (guest)

Here are my relevant kernel arguments. I’m not sure what kvm_amd.avic is for, and I’m planning on testing without it. I heard kvm_amd.npt used to cause problems in older kernel versions, but it seems to work very well for me.

vfio-pci.ids=1002:687f,1002:aaf8 vfio-pci.disable_vga=1 vfio_iommu_type1.allow_unsafe_interrupts=1 kvm_amd.npt=1 kvm_amd.avic=1
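
To confirm that the IDs in vfio-pci.ids really end up bound to vfio-pci, something like the following could help. It is only a sketch: the function name vfio_ids is made up for illustration, and it assumes the IDs are passed on the kernel command line as shown above.

```shell
#!/usr/bin/env bash
# Print each vendor:device ID listed in vfio-pci.ids=... on a kernel command line.
# The command line is taken from $1, or from /proc/cmdline when no argument is given.
# (vfio_ids is a hypothetical helper name, not part of any package.)
vfio_ids() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    local arg
    for arg in $cmdline; do
        case "$arg" in
            vfio[-_]pci.ids=*)
                # Drop everything up to the first '=', then split the comma-separated list.
                echo "${arg#*=}" | tr ',' '\n'
                ;;
        esac
    done
}

# Usage (needs pciutils): show which kernel driver each listed device is bound to.
#   for id in $(vfio_ids); do lspci -nnk -d "$id"; done
```

If the "Kernel driver in use" line for either function of the card is not vfio-pci, the host driver grabbed the device first.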

Here’s a virsh dumpxml of my vm.

<domain type='kvm'>
  <name>win8.1-gamer_1</name>
  <uuid>c6b8fef9-4df6-4d7f-afd8-c674c2a8d786</uuid>
  <memory unit='KiB'>8192000</memory>
  <currentMemory unit='KiB'>4096000</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.11'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/edk2-ovmf/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win8.1-gamer_1_VARS.fd</nvram>
    <boot dev='hd'/>
    <bootmenu enable='no'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/win8.1-gamer_1.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:0d:17:d3'/>
      <source network='default'/>
      <model type='rtl8139'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <gl enable='no' rendernode='/dev/dri/by-path/pci-0000:42:00.0-render'/>
    </graphics>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='2' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0c' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </memballoon>
  </devices>
</domain>

In dmesg, I notice that PCI device 40:03.1 is having an issue, but that’s neither my host nor my guest GPU. The log is filled with these lines repeating. It might be a separate issue I need to deal with, since this is a new Gentoo setup.

[ 5276.318047] pcieport 0000:40:03.1: AER: Corrected error received: id=0000
[ 5276.318052] pcieport 0000:40:03.1: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=4019(Receiver ID)
[ 5276.318053] pcieport 0000:40:03.1:   device [1022:1453] error status/mask=00000040/00006000
[ 5276.318054] pcieport 0000:40:03.1:    [ 6] Bad TLP               
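
To see how often each port reports these corrected errors, the log can be summarised per device. This is a rough sketch that assumes the message format shown above ("AER: Corrected error received"); other kernel versions may word it differently, and aer_corrected_counts is just an illustrative name.

```shell
#!/usr/bin/env bash
# Count "Corrected error received" events per PCIe port in kernel log text on stdin.
# Output: one line per device, "<pci address> <count>".
aer_corrected_counts() {
    grep 'AER: Corrected error received' \
        | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage: dmesg | aer_corrected_counts
```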

I’ll leave the output of lspci -vvvs 40:03.1 in case that’s of interest though.

40:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1453 (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 49
	NUMA node: 1
	Bus: primary=40, secondary=42, subordinate=42, sec-latency=0
	I/O behind bridge: 0000f000-0000ffff
	Memory behind bridge: cc000000-cd0fffff
	Prefetchable memory behind bridge: 00000000c0000000-00000000c9ffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0
			ExtTag+ RBE+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <512ns, L1 <64us
			ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
		LnkCtl:	ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
			Slot #0, PowerLimit 0.000W; Interlock- NoCompl+
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
		RootCap: CRSVisible+
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
		DevCtl2: Completion Timeout: 260ms to 900ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee40004  Data: 4021
	Capabilities: [c0] Subsystem: Device 7b09:1462
	Capabilities: [c8] HyperTransport: MSI Mapping Enable+ Fixed+
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Capabilities: [150 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [270 v1] #19
	Capabilities: [2a0 v1] Access Control Services
		ACSCap:	SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
		ACSCtl:	SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
	Capabilities: [370 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
	Capabilities: [380 v1] #1d
	Kernel driver in use: pcieport

Any help would be great. Thanks.

My first post here also :slight_smile:

modinfo -p vfio_iommu_type1

allow_unsafe_interrupts:Enable VFIO IOMMU support for on platforms without interrupt remapping support. (bool)
disable_hugepages:Disable VFIO IOMMU support for IOMMU hugepages. (bool)

As your CPU has interrupt remapping support, delete vfio_iommu_type1.allow_unsafe_interrupts=1 from the kernel cmdline.

These are the kvm_amd module parameters (modinfo -p kvm_amd). You don’t have to set any of them on the kernel cmdline:

avic (default=1) -> advanced virtual interrupt controller, you want that enabled;

npt (default=1) -> nested page tables, you want that enabled;

nested (default=1) -> nested virtualization, i.e. running a virtual machine inside a virtual machine. It allows Hyper-V to run inside a Windows guest, and most people want it enabled for Windows backward compatibility.

In general, don’t set kernel module parameters unless you need to; they exist for quirks and specific situations. When using a Vega GPU, compile one of the latest kernels (4.17.0-rc7), as Vega support is better in newer kernels.
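
Rather than trusting documented defaults, the values a loaded module is actually running with can be read from sysfs. A minimal sketch, assuming the modules are loaded and expose a parameters directory under /sys/module; the helper name module_params and its optional second argument are illustrative only.

```shell
#!/usr/bin/env bash
# Print name=value for every readable parameter of a loaded kernel module.
# $1 is the module name; $2 optionally overrides the sysfs root (default /sys/module).
module_params() {
    local dir="${2:-/sys/module}/$1/parameters"
    local f
    if [ ! -d "$dir" ]; then
        echo "no parameters directory for $1 (module not loaded?)" >&2
        return 1
    fi
    for f in "$dir"/*; do
        if [ -r "$f" ]; then
            printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
        fi
    done
}

# Usage:
#   module_params kvm_amd
#   module_params vfio_iommu_type1
```

This makes it easy to compare the running values before and after removing a parameter from the kernel cmdline.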

With GPU passthrough it is advisable to give the virtual machine a copy of the GPU BIOS. If you still have a dual-boot setup with Windows, GPU-Z can extract the GPU BIOS. To enable the copied BIOS you have to edit the VM config:

virsh edit win8.1-gamer_1

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
  </source>
  <rom bar='on' file='path to GPU BIOS copy'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</hostdev>

I’m not familiar with Gentoo as I use Debian, but the kernel and libvirt related configurations should be the same. If any questions remain, come back and I’ll try to help resolve the issues.

Yeah, it seems that the reset bug in Vega and the NPT bug in Ryzen have been solved since kernel 4.16. The latest I can pull in from Gentoo’s package manager (portage) is 4.16.9, and it seems to work well. It just has an issue with turning off the guest and leaving it off for over 10 minutes.

I thought that was only necessary for Nvidia cards, or when passing through the primary GPU. What does copying the GPU BIOS accomplish for Radeon?

I never had a dual boot on this machine. I installed Gentoo and the virtual Windows right away. I could try adding an extra HDD and installing Windows to it just to use GPU-Z, but that’s a lot of effort. Would it help with my fan speed issue?

https://www.techpowerup.com/vgabios/ has the best collection of GPU BIOSes I know of. You can search for a BIOS that is very similar to your GPU and use that. Remember, Linux will (shadow) copy the BIOS and (possibly) modify it, so it is best to have a “virgin” copy for virtual machines.

As far as I know, vfio-pci puts devices into a minimal power configuration when they are delegated to it. Your fan-speed issue is, in my opinion, related to the virtual machine interacting with the kernel’s view of the (shadow copied) GPU BIOS.

You should compile your own kernel with new hardware. At this moment in time the AMD ecosystem is changing so fast that distributions can’t keep up with it. Do you know how to do that?

That’s a big help. I’ve already found one for my card. I wasn’t sure if it would be unique to my specific version of Windows 8.1, since the BIOS seems to be modified by the OS somehow. I need to study how that works better.

Gentoo is a source based distribution. Everything must be compiled from source. Configuring and compiling a kernel is part of the installation process. Some Gentoo packages have dependencies on the kernel, so I want to avoid installing one without the package manager.

I’m pretty sure I ticked all the right boxes when I used make menuconfig. I’d attach my .config, but it seems new users are not allowed to upload files.

Windows 8.1, hmm, that will break your back :smile: Just as an experiment install W10 in a virtual machine for testing and see where that brings you. There are options to switch Windows OS in VM’s.

https://www.microsoft.com/en-us/software-download/windows10ISO

It seems the latest installation ISO for Windows 10 blue-screens unless I set the virtual processor to be a core2duo. This appears to be a known issue, since I’ve found a few forum posts with the same problem: https://www.spinics.net/lists/kvm/msg128045.html. I’ve read that using a virtual architecture that’s different from the host’s real architecture can cause errors in some cases.

Regarding kernel compilation: as a Debian user, I can’t imagine that the “apps” a distribution ships would have dependencies on the kernel. Still, you would be well advised to compile your own kernel.

Wonders happen sometimes, don’t ask me why! I had two Gigabyte GA-AX370-Gaming K7 motherboards die, probably from their VRMs. Windows wouldn’t activate. With the exact same VM config on the new-generation X470 AORUS GAMING 7 WIFI motherboard, Windows activated!

Yeah. It all just seems like magic at some point. I feel lucky I got it working this well so far.

That lucky feeling, wonders happening, security by obscurity is the real windows experience! If you have the source code the magic disappears :slight_smile:

Are you saying Windows is like a box of chocolates?

Dunno. Ever had real Belgian pralines?

The chocolate encapsulates the deeper experience. It still amazes me to see all the devices Linux runs on.

Using this BIOS didn’t seem to make any difference at all.

I know it’s working, because of the following qemu command that libvirt executes.

LC_ALL=C PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/x86_64-pc-linux-gnu/gcc-bin/6.4.0:/usr/lib/llvm/5/bin:/opt/bin HOME=/root USER=root QEMU_AUDIO_DRV=spice /usr/bin/qemu-system-x86_64 -name guest=win8.1-gamer_1,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win8.1-gamer_1/master-key.aes -machine pc-i440fx-2.11,accel=kvm,usb=off,vmport=off,dump-guest-core=off -cpu EPYC,tsc-deadline=on,hypervisor=on,tsc_adjust=on,cmp_legacy=on,topoext=on,monitor=off,x2apic=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff -drive file=/usr/share/edk2-ovmf/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on -drive file=/var/lib/libvirt/qemu/nvram/win8.1-gamer_1_VARS.fd,if=pflash,format=raw,unit=1 -m 8000 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid c6b8fef9-4df6-4d7f-afd8-c674c2a8d786 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-win8.1-gamer_1/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot menu=off,strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/var/lib/libvirt/images/win8.1-gamer_1.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:0d:17:d3,bus=pci.0,addr=0x2 -device usb-tablet,id=input2,bus=usb.0,port=1 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=2,bus=pci.0,addr=0xa -device intel-hda,id=sound0,bus=pci.0,addr=0xb -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device vfio-pci,host=0c:00.0,id=hostdev0,bus=pci.0,addr=0x4,rombar=1,romfile=/var/lib/libvirt/images/MSI.RXVega56.8176.171101.rom -device vfio-pci,host=0c:00.1,id=hostdev1,bus=pci.0,addr=0x7 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

I’m only passing the BIOS to 0c:00.0. The HDMI audio is 0c:00.1, which is on the same device. Should I pass it to both?

The audio device doesn’t need the vBIOS file.

You should check that interrupt remapping, vAPIC and the AMD IOMMUv2 driver are enabled in the kernel.

Run “dmesg | fgrep -i -e amd”:

[ 0.255868] AMD-Vi: IOMMU performance counters supported
[ 0.258601] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 0.258603] AMD-Vi: Extended features (0xf77ef22294ada):
[ 0.258607] AMD-Vi: Interrupt remapping enabled
[ 0.258608] AMD-Vi: virtual APIC enabled
[ 0.258699] AMD-Vi: Lazy IO/TLB flushing enabled
[ 0.259597] amd_uncore: AMD NB counters detected
[ 0.259603] amd_uncore: AMD LLC counters detected
[ 0.259793] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 0.448873] AMD IOMMUv2 driver by Joerg Roedel [email protected]
[ 0.828228] QUIRK: Enable AMD PLL fix
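
For a quick yes/no on the three features mentioned above, the log can be scanned for the exact messages shown in this output. This is only a sketch: the strings are copied from the log above and may differ between kernel versions, and iommu_feature_report is an illustrative name.

```shell
#!/usr/bin/env bash
# Read kernel log text on stdin and report which AMD IOMMU messages are present.
# The patterns are the literal messages from the dmesg output quoted in this thread.
iommu_feature_report() {
    local log pattern
    log="$(cat)"
    for pattern in \
        'AMD-Vi: Interrupt remapping enabled' \
        'AMD-Vi: virtual APIC enabled' \
        'AMD IOMMUv2 driver'
    do
        if printf '%s\n' "$log" | grep -qF "$pattern"; then
            printf 'found:   %s\n' "$pattern"
        else
            printf 'missing: %s\n' "$pattern"
        fi
    done
}

# Usage: dmesg | iommu_feature_report
```

A "missing" line does not prove the hardware lacks the feature; it can also mean the kernel was built without the corresponding option.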

That’s not in my dmesg. I assume it’s from enabling CONFIG_IRQ_REMAP in the kernel. When I was first compiling my kernel for KVM, I found that my computer wouldn’t boot. Through process of elimination, I found that it was CONFIG_IRQ_REMAP that was causing problems. I’m surprised Qemu works at all without it.

Could you post your kernel config? Here’s mine.
config.txt (113.7 KB)
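
Since new users can’t attach files, a single option such as CONFIG_IRQ_REMAP can be quoted directly from the kernel config instead. A small sketch, assuming the config lives at /usr/src/linux/.config (the usual Gentoo location) or is exposed as /proc/config.gz; kconfig_get is a made-up helper name.

```shell
#!/usr/bin/env bash
# Print the line that sets (or explicitly unsets) a CONFIG_ option in a kernel config.
# $1 = option name, $2 = config file (plain text, or gzipped if it ends in .gz).
kconfig_get() {
    local opt="$1" file="$2"
    local reader=cat
    case "$file" in
        *.gz) reader=zcat ;;
    esac
    # Matches both "CONFIG_FOO=y" and "# CONFIG_FOO is not set".
    "$reader" "$file" | grep -E "^(# )?${opt}[= ]" \
        || echo "${opt} not found in ${file}"
}

# Usage:
#   kconfig_get CONFIG_IRQ_REMAP /usr/src/linux/.config
#   kconfig_get CONFIG_IRQ_REMAP /proc/config.gz    # needs CONFIG_IKCONFIG_PROC
```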

For future reference, you can easily extract the video bios of your card in linux like this:

Check the pci ID of the card:

[me@tiny ~]$ lspci|grep -i vga
45:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c1)

Then find that card’s video bios file like this:

[me@tiny ~]$ find /sys/devices -name rom
/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.1/rom
/sys/devices/pci0000:40/0000:40:03.1/0000:43:00.0/0000:44:00.0/0000:45:00.0/rom

The one you need has the pci ID we just found (45:00.0) in the pathname, it’s /sys/devices/pci0000:40/0000:40:03.1/0000:43:00.0/0000:44:00.0/0000:45:00.0/rom in this case.
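
As an alternative to searching with find, the same rom file should also be reachable through the flat /sys/bus/pci/devices directory, which links each PCI address to its device directory. A sketch of building that path from the lspci address (pci_rom_path is a hypothetical helper name; domain 0000 is assumed when none is given):

```shell
#!/usr/bin/env bash
# Print the sysfs ROM path for a PCI address such as 45:00.0 or 0000:45:00.0.
pci_rom_path() {
    local addr="$1"
    case "$addr" in
        ????:??:??.?) ;;                      # already has a domain prefix
        ??:??.?)      addr="0000:$addr" ;;    # assume the default PCI domain
        *)
            echo "unrecognised PCI address: $addr" >&2
            return 1
            ;;
    esac
    echo "/sys/bus/pci/devices/$addr/rom"
}

# Usage: ls -l "$(pci_rom_path 45:00.0)"
```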

Now do this:

[me@tiny ~]$ echo 1|sudo tee /sys/devices/pci0000:40/0000:40:03.1/0000:43:00.0/0000:44:00.0/0000:45:00.0/rom
[me@tiny ~]$ sudo cat /sys/devices/pci0000:40/0000:40:03.1/0000:43:00.0/0000:44:00.0/0000:45:00.0/rom > ~/vbios

You can use dd instead of cat; it doesn’t make a difference.
And you’re pretty much done. The first command makes it so you can actually read the rom file, and the second one writes it to a file called vbios in your home directory.

After this, you can do

[me@tiny ~]$ echo 0|sudo tee /sys/devices/pci0000:40/0000:40:03.1/0000:43:00.0/0000:44:00.0/0000:45:00.0/rom

to restore the read and write protection on the rom file.
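
Before pointing libvirt at the dumped file, it is worth a quick sanity check, since a failed read can leave an empty or truncated file. A PCI expansion ROM image starts with the two signature bytes 0x55 0xAA, so a minimal check could look like this (has_rom_signature is an illustrative name, and passing this check does not guarantee the rest of the image is intact):

```shell
#!/usr/bin/env bash
# Exit 0 if the file starts with the PCI expansion ROM signature bytes 0x55 0xAA.
has_rom_signature() {
    local sig
    # Dump the first two bytes as hex and strip od's padding.
    sig="$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')"
    [ "$sig" = "55aa" ]
}

# Usage:
#   if has_rom_signature ~/vbios; then echo "looks like a ROM"; else echo "bad dump"; fi
```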

It doesn’t make a difference if I copy the ROM using Windows or Linux? I was under the impression that it gets modified by the OS somehow.

No, it doesn’t matter at all if you do it on windows or linux, the ROM contains the BIOS that’s on your graphics card, the OS shouldn’t alter it in any way.

I just thought I’d mention a way to do it on linux; that way you don’t need a running windows installation to extract the bios, and you don’t have to hunt for it on a random website, just so you can run windows in a VM on linux with passthrough.

Edit: changed wording to make it less confusing

If you insist I will share my kernel configuration, nothing secret in there :slight_smile: It’s my opinion though that it will be confusing as I use a Ryzen 1700 CPU with the latest motherboard from Gigabyte and a RX580 GPU.

Then there are the Vega 56 GPU and the Threadripper 1900X CPU, both rather expensive parts, that you employ in your system! Part of your problem may simply be the interaction between motherboard, CPU, GPU and driver. Market penetration is abysmal for these parts, especially the GPU, due to coin mining. Small user base -> less testing and fewer bug reports -> early adopter problems.

A closer look at your OP shows:

Unknown interrupt pin routed (“Interrupt: pin ? routed to IRQ 49” in your lspci output)! I assume your BIOS is up to date? With a “new” platform, I can’t emphasize enough how important it is to keep the UEFI BIOS up to date and to compile the most recent kernel. The kernel developers are doing their best.

Specific kernel settings can f*ck up the data-structures in the UEFI NVRAM.

CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE:

Saying Y here will disable the use of efivars as a storage
backend for pstore by default. This setting can be overridden
using the efivars module's pstore_disable parameter.

The pstore (kernel persistent storage) can happily exhaust all the available NVRAM, to the point that the UEFI BIOS can’t update its data. With my previous Gigabyte MB (dual BIOS) the only remedy was a fresh install of the BIOS.
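
To get a feel for whether pstore has been filling the NVRAM, the crash-dump variables can be counted. This sketch rests on two assumptions: that efivarfs is mounted at /sys/firmware/efi/efivars, and that the efi-pstore backend names its variables with a dump- prefix. count_efi_dumps is an illustrative name.

```shell
#!/usr/bin/env bash
# Count EFI variables whose names start with "dump-" (assumed efi-pstore entries).
# $1 optionally overrides the efivars directory (default /sys/firmware/efi/efivars).
count_efi_dumps() {
    local dir="${1:-/sys/firmware/efi/efivars}"
    local n=0 f
    for f in "$dir"/dump-*; do
        # An unmatched glob stays literal, so make sure the entry exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Usage:
#   count_efi_dumps          # EFI-backed pstore records
#   ls /sys/fs/pstore        # the same records as seen through the pstore filesystem, if mounted
```

A large and growing count would be a hint to enable the option above or to clear old records.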

“Persistence of Misery” is a fundamental property of the Universe, a fundamental Law you neglect at your own peril :slight_smile: