Introduction
TODO Ubuntu 24.04 current install as of 2024-06-29.
Pre-setup
I love my Falcon Northwest 96 core Threadripper system! It is based on the ASUS WRX90.
Here is a video of the BIOS settings I changed on the WRX90. Note this BIOS is the ancient 2023 version, which has a lot of bugs and suffering. ASUS hasn’t updated the BIOS in all of 2024 at the time of this writing.
Step 1
Install Ubuntu 24.04 and make sure the installation is fully up to date. (One can use the GUI or use apt update && apt upgrade -y.)
Reboot.
Step 2
Identify the card(s) that will be passed through to a Windows VM using lspci. In the case of Quadro cards, such as the A6000 I am using for this how-to, there are sub-functions and audio functions. It is necessary to identify the PCI IDs, then update the kernel command line, then ensure that the vfio modules are available in the initial ramdisk so they can bind at boot time.
Identifying cards to pass through
lspci -vnn
gives a listing of devices and their [vendor:device] IDs.
In my case I have an RTX A6000 and an RTX A4000 in the same system.
c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102GL [RTX A6000] [10de:2230] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GA102GL [RTX A6000] [10de:1459]
c1:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
Subsystem: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1459]
e1:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104GL [RTX A4000] [10de:24b0] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation GA104GL [RTX A4000] [10de:14ad]
e1:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
Subsystem: NVIDIA Corporation GA104 High Definition Audio Controller [10de:14ad]
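The bracketed pairs such as 10de:2230 and 10de:228b are the PCI vendor:device IDs. The pairs on the Subsystem lines, such as 10de:1459 and 10de:14ad, are subsystem IDs. The vfio_pci.ids parameter matches on vendor:device, so the subsystem entry in the command line below is probably redundant, though harmless.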
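If you would rather not copy the IDs by hand, something like the following sketch can pull them out of lspci output. The function name vfio_ids_from_lspci is made up for this example, and it assumes plain lspci -nn output (one line per function, without -v), so that only the [vendor:device] pair on each line matches.

```shell
# Read `lspci -nn` output on stdin, pull out each [vendor:device] pair,
# and join the pairs with commas, ready for vfio_pci.ids=.
# The class code (e.g. [0300]) has no colon, so it is not matched.
vfio_ids_from_lspci() {
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]' | paste -sd, -
}

# Usage (the slot c1:00 is the A6000 from the listing above):
# lspci -nn -s c1:00 | vfio_ids_from_lspci
```

Feeding it -v output would also pick up the Subsystem pairs, so stick to -nn alone.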
I used these IDs to modify the file at /etc/default/grub
and update the GRUB_CMDLINE_LINUX_DEFAULT=
line to
GRUB_CMDLINE_LINUX_DEFAULT=" iommu=1 amd_iommu=on vfio_pci.ids=10de:2230,10de:1459,10de:1aef,10de:1459 vfio_iommu_type1.allow_unsafe_interrupts=1 "
Once that’s done, regenerate the GRUB configuration so the new command line takes effect:
# update-grub2
Initial Ramdisk
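The initial ramdisk has to include the vfio modules. Modify /etc/initramfs-tools/modules
to include the modules we need. On a fresh install this is what I came up with. (On kernels 6.2 and newer, which includes the 6.8 kernel here, vfio_virqfd has been folded into the vfio module, so that entry is likely unnecessary.)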
# List of modules that you want to include in your initramfs.
# They will be loaded at boot time in the order below.
#
# Syntax: module_name [args ...]
#
# You must run update-initramfs(8) to effect this change.
#
# Examples:
#
# raid1
# sd_mod
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
vhost-net
Once that’s done it is necessary to update the initial ramdisk.
root@wFNWtr:~# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-6.8.0-36-generic
… and then reboot.
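A quick sanity check, either before or after that reboot, is to look inside the new initramfs for the vfio modules. The sketch below assumes lsinitramfs (which ships with initramfs-tools) and that module files end in .ko with an optional compression suffix. The helper name list_vfio_modules is hypothetical.

```shell
# Read an initramfs file listing on stdin and print the file names of
# any vfio kernel modules found in it (e.g. vfio.ko.zst, vfio-pci.ko.zst).
list_vfio_modules() {
    grep -E '/vfio[^/]*\.ko(\.[a-z0-9]+)?$' | sed 's|.*/||'
}

# Usage:
# lsinitramfs "/boot/initrd.img-$(uname -r)" | list_vfio_modules
```

If nothing is printed, the modules did not make it into the image and update-initramfs -u is worth running again.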
Note: depending on your system it may be necessary to physically arrange the cards so that the card that is initialized first is the graphics card to be used by the host, and not the guest. Ideally, anyway. Sometimes it is possible to control this via the BIOS, sometimes the BIOS is buggy, way out of date and needs a lot of love… so you just have to re-arrange your GPUs.
To verify everything went according to plan try
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.0-36-generic root=UUID=7782b3f2-2f20-49bc-805e-1fa250000101 ro iommu=1 amd_iommu=on vfio_pci.ids=10de:2230,10de:1459,10de:1aef,10de:1459 vfio_iommu_type1.allow_unsafe_interrupts=1
This shows that the changes made earlier have taken effect.
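For a check that does not rely on eyeballing a long line, a small helper along these lines can test for one parameter at a time. The name cmdline_has is invented for this sketch. The optional second argument exists only so the function can be pointed at a saved copy of the command line instead of /proc/cmdline.

```shell
# Succeed if the kernel command line contains exactly the parameter in $1.
# $2 is an optional file to read instead of /proc/cmdline.
cmdline_has() {
    param=$1
    file=${2:-/proc/cmdline}
    # One parameter per line, then an exact fixed-string whole-line match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Usage:
# cmdline_has amd_iommu=on && echo "amd_iommu=on is active"
```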
lspci -vvvvn |less
can be used to verify that the “Kernel driver in use” line shows vfio-pci for the devices in question.
c1:00.1 0403: 10de:1aef (rev a1)
Subsystem: 10de:1459
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 49
IOMMU group: 19
Region 0: Memory at b3080000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s (downgraded), Width x16
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp+ 10BitTagReq+ OBFF Via message, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- 10BitTagReq+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO+ CmpltAbrt- UnxCmplt+ RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [160 v1] Data Link Feature <?>
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
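The same information is available from sysfs without wading through the full lspci dump. This is a rough sketch: pci_driver is a hypothetical helper name, the address assumes PCI domain 0000, and the optional second argument only exists so the function can be exercised against a fake sysfs tree.

```shell
# Print the name of the driver bound to a PCI device, e.g. vfio-pci.
# $1 is the full address (domain:bus:device.function); $2 is an optional
# sysfs root, defaulting to /sys. Returns non-zero if no driver is bound.
pci_driver() {
    dev=$1
    sysfs=${2:-/sys}
    link=$(readlink "$sysfs/bus/pci/devices/$dev/driver") || return 1
    basename "$link"
}

# Usage (addresses taken from the lspci listing earlier, domain assumed):
# pci_driver 0000:c1:00.0
# pci_driver 0000:c1:00.1
```

Both functions of the card should report vfio-pci if the binding worked.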
Configure virt-manager and set up the Windows VM
On my system I issued two commands:
apt install virt-manager
and
usermod -a -G libvirt username
The first installs virt-manager. Once virt-manager and its dependencies are installed, several new virtualization groups are added. The second adds my username to the libvirt group.
Relogin (or reboot) after installing these.
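Group changes only apply to new login sessions, so it can help to confirm the membership actually took. A minimal sketch follows. The has_group name is made up, and it takes the group list as a string so it works equally on the output of id -nG.

```shell
# Succeed if group $1 appears in the space-separated group list $2,
# matching whole names only (libvirt does not match libvirtd).
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
# has_group libvirt "$(id -nG)" && echo "libvirt group is active in this session"
```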
Install Windows
From here, create a virtual machine and install Windows. I chose Windows 11 Pro, set up from an ISO.
This step should normally be carefully considered: for best practices and maximum performance, choose a separate NVMe drive to pass through to the VM. I skipped those steps for now.
DO NOT pass through the GPU, yet.
Pass through GPU
Next, with the VM shut down, pick the GPU and its audio device under “Add hardware” in the virt-manager configuration GUI. Mine looks like:
From there, the device should show up in Device Manager in Windows. Install the normal Windows drivers. They should work immediately, with no Code 43 warning.
I did not have to touch `virsh edit` or do any of the usual configuration shenanigans – everything Just Worked. I suspect it was because I used vfio to bind to the root and sub devices and because I passed through both the graphics and audio devices.
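To see what libvirt itself thinks of a device from the host side, virsh nodedev-dumpxml takes a node device name derived from the PCI address. The sketch below does that conversion. The nodedev_name helper is hypothetical, and the address assumes PCI domain 0000.

```shell
# Convert a PCI address such as 0000:c1:00.0 into the node device name
# libvirt uses for it (pci_0000_c1_00_0) by replacing ':' and '.' with '_'.
nodedev_name() {
    printf 'pci_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

# Usage (A6000 address from the lspci listing, domain assumed):
# virsh nodedev-dumpxml "$(nodedev_name 0000:c1:00.0)"
```

The XML that comes back should include the bound driver and the IOMMU group for the device.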
Todo more in a sec