VFIO/Passthrough in 2023 - Call to Arms

Another use case for SR-IOV.

The “correct” GPU to use for AI is Nvidia. But I already have an AMD GPU, and a pretty good one too. The ROCm SDK is presently missing from my distro (Debian), and it requires its own kernel module. A Ubuntu VM with either half or all of my AMD GPU passed through would accelerate AI significantly over pure CPU work, and would hopefully fix a few other problems I’ve been having while I’m at it.

As I mentioned before, I also do gaming and Windows-based CAD on this same system. So now I have two use cases, which means I have to balance the GPU between them. Being able to wholly allocate the GPU to one VM, then turn that VM off and boot up the other, would be ideal. Or, to reach for the stars, some kind of dynamic allocation scheme, whereby whatever VMs try to use the GPU get an even share of what’s available.
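
For the boot-one-then-the-other scheme, a host reboot shouldn’t even be necessary: on many setups the card can be rebound between amdgpu and vfio-pci at runtime over sysfs. A minimal sketch, assuming a hypothetical PCI address of 0000:0c:00.0 (find yours with lspci) and that nothing on the host still holds the card open:

# hand the GPU to vfio-pci before booting the passthrough VM
echo vfio-pci > /sys/bus/pci/devices/0000:0c:00.0/driver_override
echo 0000:0c:00.0 > /sys/bus/pci/devices/0000:0c:00.0/driver/unbind
echo 0000:0c:00.0 > /sys/bus/pci/drivers_probe
# repeat for the card's HDMI-audio function (e.g. 0000:0c:00.1)

# return it to the host driver after the VM shuts down
echo "" > /sys/bus/pci/devices/0000:0c:00.0/driver_override
echo 0000:0c:00.0 > /sys/bus/pci/devices/0000:0c:00.0/driver/unbind
echo 0000:0c:00.0 > /sys/bus/pci/drivers_probe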

Oversubscription with virtual GPUs would be amazing. Just assign 100% of your GPU to each VM and have them dynamically adjust actual capacity based on load.

On your W11 instance, is the Hyper-V hypervisor enabled?

bcdedit /set hypervisorlaunchtype off with an admin-privileged CMD.
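
If you want to check the current state before flipping it (same elevated prompt), the setting shows up in the active boot entry:

bcdedit /enum {current}
rem look for the hypervisorlaunchtype line in the output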

I run with x2AVIC forced, host-passthrough CPU, and all Hyper-V enlightenments on.
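
For anyone wanting to replicate the “forced” part: as far as I know it comes down to kvm_amd module options. A sketch, with parameter names as found in recent upstream kernels; verify against modinfo kvm_amd on your kernel:

# /etc/modprobe.d/kvm.conf
options kvm_amd avic=1 force_avic=1
# after a reboot or module reload, confirm it took effect:
cat /sys/module/kvm_amd/parameters/avic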

With the Hyper-V hypervisor on, I get these in dmesg:
kvm_amd: kvm [2961]: vcpu0, guest rIP: 0xfffff8178329cc19 Unhandled WRMSR(0xc0010115) = 0x0

I’m not sure if it’s the AVIC hardware, some CPU bit falsely passing through, or something else screwing me over; I just haven’t had the time to isolate what’s up. Otherwise it’s very stable, and I always try to use AVIC to lower DPC latency. I do notice the improvement.
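
If those WRMSR lines turn out to be harmless noise rather than the real problem, KVM can also be told to swallow unhandled MSR accesses; this is an assumption on my part, not a confirmed fix:

# /etc/modprobe.d/kvm.conf: ignore unhandled MSR reads/writes instead of injecting #GP
options kvm ignore_msrs=1 report_ignored_msrs=0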

Also, I’m on a Zen 3 Ryzen 9 5950X. Just something to note, because I remember reading that anything prior had some sort of hardware bug involving AVIC and VM exits… I think?

Please do share or create something! All I know about AMD AVIC comes from reading random but related patch series on the Linux kernel and some r/VFIO threads.

I need to learn how to interact with the Linux Kernel Mailing List. Grr, evil outdated email.

@CodeDragon57 Not sure this fits your use case, but there’s a software solution for sharing keyboard and mouse (though not video) that does support Linux:

I used it a little, years ago; it worked well, and I think I only stopped using it because I dumped Windoze entirely. :rofl: (Yes, it works across different OSes too.) Well, also I didn’t want to keep running two machines all the time, power use and all.

I tried that once, but I ended up requesting a refund because it didn’t seem to work for me; perhaps I configured it wrong. At any rate, what I do now works. The beauty of VFIO is that I run both on one machine. Once HandBrake finally supports Intel Arc cards on Linux, I won’t have to run Windows anymore. I’ll never understand why OSS projects like that treat Linux as a second-class citizen.

No success, same issue. Did you get this config running?

I try it with every BIOS update, but the result is always the same: the system hangs during boot, shortly after GRUB.

Just asking out of curiosity; I don’t have a problem as such, but my system always logs this warning when starting a VM. Does anyone else see this on AM5?

[11885.106086] ------------[ cut here ]------------
[11885.106086] Trying to free already-free IRQ 44
[11885.106087] WARNING: CPU: 29 PID: 17092 at kernel/irq/manage.c:1893 free_irq+0x226/0x3b0
[11885.106090] Modules linked in: iscsi_tcp libiscsi_tcp bridge stp llc qrtr rpcrdma sunrpc rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd vfat snd_hda_codec_hdmi snd_usb_audio fat kvm snd_hda_intel joydev amdgpu snd_intel_dspcfg snd_usbmidi_lib crct10dif_pclmul snd_intel_sdw_acpi mousedev crc32_pclmul snd_ump amdxcp mlx5_core drm_buddy polyval_clmulni snd_hda_codec snd_rawmidi polyval_generic gpu_sched eeepc_wmi asus_nb_wmi snd_seq_device gf128mul snd_hda_core i2c_algo_bit asus_wmi ghash_clmulni_intel mc drm_suballoc_helper snd_hwdep ledtrig_audio drm_ttm_helper sha512_ssse3 i8042 sparse_keymap snd_pcm ttm aesni_intel serio platform_profile rfkill wmi_bmof usbhid drm_display_helper crypto_simd snd_timer r8169 cryptd mlxfw cec snd psample rapl ccp tls realtek video soundcore pcspkr k10temp sp5100_tco ucsi_acpi mdio_devres i2c_piix4 libphy typec_ucsi pci_hyperv_intf typec wmi gpio_amdpt roles gpio_generic
[11885.106111]  mac_hid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) dm_multipath tcp_dctcp crypto_user fuse dm_mod loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 nvme nvme_core crc32c_intel xhci_pci xhci_pci_renesas nvme_common vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
[11885.106119] CPU: 29 PID: 17092 Comm: rpc-libvirtd Tainted: P        W  OE      6.5.5-1-MANJARO #1 e9399f16590e7769efcdcd9f039e557ef90af6c1
[11885.106120] Hardware name: ASUS System Product Name/ProArt B650-CREATOR, BIOS 1710 10/05/2023
[11885.106121] RIP: 0010:free_irq+0x226/0x3b0
[11885.106123] Code: 95 02 00 49 8b 7f 30 e8 c8 54 1d 00 4c 89 ff 49 8b 5f 50 e8 bc 54 1d 00 eb 3b 8b 74 24 04 48 c7 c7 c8 dc e5 ac e8 2a f8 f5 ff <0f> 0b 48 89 ee 4c 89 ef e8 3d 5d c5 00 49 8b 86 80 00 00 00 48 8b
[11885.106124] RSP: 0018:ffffb76d8c1ebc58 EFLAGS: 00010086
[11885.106125] RAX: 0000000000000000 RBX: ffff9441c1fff828 RCX: 0000000000000027
[11885.106125] RDX: ffff94585ff616c8 RSI: 0000000000000001 RDI: ffff94585ff616c0
[11885.106126] RBP: 0000000000000246 R08: 0000000000000000 R09: ffffb76d8c1ebae8
[11885.106126] R10: 0000000000000003 R11: ffff94585f7fffe8 R12: ffff9441c8f6e1c0
[11885.106127] R13: ffff9441c8f6e0e4 R14: ffff9441c8f6e000 R15: ffff944e67acb9a0
[11885.106127] FS:  00007f27c37fe6c0(0000) GS:ffff94585ff40000(0000) knlGS:0000000000000000
[11885.106128] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11885.106129] CR2: 00007f27b805fa40 CR3: 0000000dcab68000 CR4: 0000000000750ee0
[11885.106130] PKRU: 55555554
[11885.106130] Call Trace:
[11885.106131]  <TASK>
[11885.106131]  ? free_irq+0x226/0x3b0
[11885.106133]  ? __warn+0x81/0x130
[11885.106134]  ? free_irq+0x226/0x3b0
[11885.106136]  ? report_bug+0x171/0x1a0
[11885.106137]  ? prb_read_valid+0x1b/0x30
[11885.106139]  ? handle_bug+0x3c/0x80
[11885.106140]  ? exc_invalid_op+0x17/0x70
[11885.106141]  ? asm_exc_invalid_op+0x1a/0x20
[11885.106143]  ? free_irq+0x226/0x3b0
[11885.106145]  ? free_irq+0x226/0x3b0
[11885.106147]  devm_free_irq+0x58/0x80
[11885.106148]  i2c_dw_pci_remove+0x59/0x70
[11885.106149]  pci_device_remove+0x37/0xa0
[11885.106151]  device_release_driver_internal+0x19f/0x200
[11885.106153]  unbind_store+0xa1/0xb0
[11885.106154]  kernfs_fop_write_iter+0x133/0x1d0
[11885.106155]  vfs_write+0x23b/0x420
[11885.106157]  ksys_write+0x6f/0xf0
[11885.106159]  do_syscall_64+0x5d/0x90
[11885.106161]  ? syscall_exit_to_user_mode+0x2b/0x40
[11885.106162]  ? do_syscall_64+0x6c/0x90
[11885.106164]  ? syscall_exit_to_user_mode+0x2b/0x40
[11885.106164]  ? do_syscall_64+0x6c/0x90
[11885.106166]  ? syscall_exit_to_user_mode+0x2b/0x40
[11885.106167]  ? do_syscall_64+0x6c/0x90
[11885.106168]  ? exc_page_fault+0x7f/0x180
[11885.106169]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[11885.106171] RIP: 0033:0x7f27cd3fd06f
[11885.106173] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 19 4d f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 6c 4d f8 ff 48
[11885.106174] RSP: 002b:00007f27c37fd460 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[11885.106175] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f27cd3fd06f
[11885.106175] RDX: 000000000000000c RSI: 00007f27b80725c0 RDI: 000000000000001d
[11885.106176] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000001
[11885.106176] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f27b80725c0
[11885.106177] R13: 000000000000001d R14: 0000000000000000 R15: 00007f27c07792d1
[11885.106178]  </TASK>
[11885.106178] ---[ end trace 0000000000000000 ]---

I can check if you tell me where to look.

dmesg --level=err,warn

Not sure what you mean.

You probably should just create a separate post.

I can’t remember much from the video, but was that one of the things Wendell was working on solving?

I have, and couldn’t get a solution, just a workaround.

Nope, the only line logged after I launch my VM is this:

[ 4530.216145] sched: RT throttling activated
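
For what it’s worth, that message is the kernel’s realtime bandwidth limiter kicking in, which is common when vCPU threads are given RT priority. If that’s intentional on your setup, the limiter can be relaxed, at the risk of a runaway RT task starving the host:

# disable RT throttling (default budget is 950000 us per 1000000 us period)
sysctl kernel.sched_rt_runtime_us=-1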

Just wanted to mention that the “static ReBAR” patch for QEMU I talked about in this post was merged into master in commit b2896c1b on May 10th, and while I’m not sure which exact release it first shipped in, it’s definitely in as of v8.1.0. That means anyone using a recent QEMU who boots their 7900XTX-based VM while the graphics card has the increased BAR size should have SAM recognized and available in the Radeon software (it can also be confirmed in Device Manager).
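
For anyone who still needs to set the larger BAR in the first place: on recent kernels it can be resized from sysfs before the card is handed to the VM. A rough sketch with a hypothetical PCI address, assuming no driver currently has the BAR mapped:

# supported sizes, reported as a bitmask of powers of two (in MB):
cat /sys/bus/pci/devices/0000:0c:00.0/resource0_resize
# request 2^15 MB = 32 GB for BAR 0:
echo 15 > /sys/bus/pci/devices/0000:0c:00.0/resource0_resize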

I have passthrough working on the ASRock B650M PG Riptide, Fedora/Nobara host, Windows guest. Setup was quick; I can post more details on which options were needed.

My 5700G/6900 XT ITX build is still going strong. I finally enabled Resizable BAR in November, and things went well.

Only complaint is that I wish AM4 had more memory bandwidth. :person_shrugging: Looking Glass has been super reliable for me, and the greatest surprise was when audio support landed. <3

As much a call to arms as a call to pass, though. We do need more PCIe lanes… I’m jelly of Threadripper.

I’m a bit late to the game (just watched the call-to-arms video yesterday). I’ve been doing VFIO in one manner or another at work for years now. Given the anti-consumer / anti-privacy direction Microsoft has been heading in, I’ve decided to make Linux my main OS at home and run Windows in a VM for any games that warrant it.

I’m mostly settled on:
AMD Ryzen 7 7800X3D 4.2 GHz 8-Core Processor
MSI MAG X670E TOMAHAWK WIFI ATX AM5 Motherboard
G.Skill Trident Z5 Neo 64 GB (2 x 32 GB) DDR5-6000 CL30 Memory
and reusing the rest of what I have in my old machine.

I’ve been strongly considering the 7950X3D, because it’d be nice to have the 8 lower-performing cores for the host while allocating the 8 higher-performing cores to the VM. However, there’s a lot of conflicting information about the host using core 0, which sits on the 3D-cache CCD. I feel like if I’m going to be stuck with 7 cores in the VM, I might as well just roll with a 7800X3D and save the $200.
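
Whichever way you go, the first step when pinning is finding out which logical CPUs actually share the 3D-cache CCD’s L3, and the kernel exposes that directly. A quick sketch, assuming nothing beyond util-linux:

# each unique line is one L3 domain; a 7950X3D should show two groups of 16 threads
cat /sys/devices/system/cpu/cpu*/cache/index3/shared_cpu_list | sort -u
# or cross-check each CPU against its cache domain:
lscpu -e=CPU,CORE,CACHE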

I don’t know if AMD is still talking to Level1Techs about VFIO, but this sort of thing has definitely muddied the waters on figuring out which part to purchase. The reports I’ve read on the IOMMU grouping (network and misc. I/O) for X670 are also a bit aggravating.

Been using a 7950X3D for months with zero issues.
Although I have my UEFI set to “Prefer Frequency cores”, Linux uses all of them no matter what (kernel 6.6.3).
I was afraid of the core 0 issue, but with or without core isolation* I don’t have any issues with either OS while using both.
My “Individual Core Usage” widget shows even 100% usage on all 3D cores while playing, and the host doesn’t seem to be bothered.

*I am using vfio-isolate with the “move” parameter, but even without it, I didn’t see any difference.
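
For anyone curious, the core of what vfio-isolate automates can be approximated with plain systemd cgroup properties. A rough sketch for a 16-core/32-thread part, fencing the host onto one CCD while the VM runs; the core numbering here is an assumption, so map your own topology first:

# confine host slices to CCD0 threads while the VM runs
systemctl set-property --runtime -- user.slice AllowedCPUs=0-7,16-23
systemctl set-property --runtime -- system.slice AllowedCPUs=0-7,16-23
systemctl set-property --runtime -- init.scope AllowedCPUs=0-7,16-23
# undo after VM shutdown
systemctl set-property --runtime -- user.slice AllowedCPUs=0-31
systemctl set-property --runtime -- system.slice AllowedCPUs=0-31
systemctl set-property --runtime -- init.scope AllowedCPUs=0-31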