AM5 SR-IOV issues with Mellanox - CPU 2/KVM conflicting memory types

Hi,

I found posts in other forums describing the same issue with the CX4 and CX6; it seems to affect everything above the CX3 on AM5.
I tried “pci=nocrs” and disabling “Above 4G Decoding”, but neither helped.
I have zero hope that AMD is interested in SR-IOV problems with enterprise hardware on AM5, and I have no idea how to solve this. Should I buy an Intel E810 and use the CX5 on the server side?

Host System
AM5 - ASUS ProArt B650-CREATOR, BIOS 1602 - AGESA 1.0.0.7c
Mellanox ConnectX-5 EX MCX516A-CDAT - firmware version: 16.35.2000

Host OS: Kubuntu 23.04
Kernel 6.2.0-32-generic
QEMU emulator version 7.2.0 (Debian 1:7.2+dfsg-5ubuntu2.2)
MLNX_OFED_LINUX-23.07-0.5.1.2-ubuntu23.04-x86_64

Tested guest systems
Kubuntu 23.04 6.2.0-32-generic
MLNX_OFED_LINUX-23.07-0.5.1.2-ubuntu23.04-x86_64

dmesg kubuntu guest

[ 21.188581] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 100s (0xffffffff)
[ 41.192590] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 79s (0xffffffff)
[ 61.196576] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 59s (0xffffffff)
[ 81.200595] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 39s (0xffffffff)
[ 101.204567] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 19s (0xffffffff)
[ 121.196566] mlx5_core 0000:07:00.0: mlx5_function_setup:1112:(pid 124): Firmware over 120000 MS in pre-initializing state, aborting
[ 121.196574] fbcon: Taking over console
[ 121.196579] mlx5_core 0000:07:00.0: probe_one:1741:(pid 124): mlx5_init_one failed with error code -16
[ 121.204608] mlx5_core: probe of 0000:07:00.0 failed with error -16

Windows 11 22H2
MLNX_WinOF2-3_10_52010_All_x64
Error - this device cannot start. code 10

SR-IOV is enabled in the mainboard UEFI and in the CX5 firmware. Everything is fine up to this point: four VFs were created and dmesg does not show any errors.

enp5s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 1c:34:da:71:b8:3a brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 1 link/ether 00:22:33:44:55:67 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 2 link/ether 00:22:33:44:55:68 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 3 link/ether 00:22:33:44:55:69 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
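
For anyone reproducing this: besides the `ip link show` output above, the kernel also exposes the VF counters of a PF in sysfs. A minimal Python sketch that reads them follows. The PCI address 0000:05:00.0 is taken from the host dmesg in this thread, and the function name `read_sriov_state` is made up for illustration.

```python
from pathlib import Path


def read_sriov_state(device_dir):
    """Return the SR-IOV counters of one PCI device directory in sysfs.

    device_dir is e.g. /sys/bus/pci/devices/0000:05:00.0 (the PF in this thread).
    Attributes that do not exist (no SR-IOV capability exposed) come back as None.
    """
    state = {}
    for name in ("sriov_totalvfs", "sriov_numvfs", "sriov_drivers_autoprobe"):
        attr = Path(device_dir) / name
        state[name] = int(attr.read_text().strip()) if attr.is_file() else None
    return state


if __name__ == "__main__":
    # Assumed PF address; adjust to the output of lspci on your own host.
    print(read_sriov_state("/sys/bus/pci/devices/0000:05:00.0"))
```

If `sriov_numvfs` reads 4 here, the host side of the VF creation worked and the problem is further down the line (VFIO or guest).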

dmesg host system before starting the VM

[ 2851.961100] mlx5_core 0000:05:00.0: E-Switch: Enable: mode(LEGACY), nvfs(4), necvfs(0), active vports(5)
[ 2852.068907] pci 0000:05:00.2: [15b3:101a] type 00 class 0x020000
[ 2852.068970] pci 0000:05:00.2: enabling Extended Tags
[ 2852.069733] pci 0000:05:00.2: Adding to iommu group 37
[ 2852.069917] mlx5_core 0000:05:00.2: enabling device (0000 -> 0002)
[ 2852.069990] mlx5_core 0000:05:00.2: firmware version: 16.35.2000
[ 2852.216760] mlx5_core 0000:05:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2852.230329] mlx5_core 0000:05:00.2: Assigned random MAC address 9e:12:f7:cd:1e:c0
[ 2852.384148] mlx5_core 0000:05:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2852.386033] mlx5_core 0000:05:00.2 enp5s0f2v0: renamed from eth0
[ 2852.444191] pci 0000:05:00.3: [15b3:101a] type 00 class 0x020000
[ 2852.444251] pci 0000:05:00.3: enabling Extended Tags
[ 2852.444995] pci 0000:05:00.3: Adding to iommu group 38
[ 2852.445094] mlx5_core 0000:05:00.3: enabling device (0000 -> 0002)
[ 2852.445154] mlx5_core 0000:05:00.3: firmware version: 16.35.2000
[ 2852.587542] mlx5_core 0000:05:00.2 enp5s0f2v0: Link up
[ 2852.589409] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f2v0: link becomes ready
[ 2852.601351] mlx5_core 0000:05:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2852.626958] mlx5_core 0000:05:00.3: Assigned random MAC address ca:47:ce:5d:f7:ce
[ 2852.782367] mlx5_core 0000:05:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2852.783905] mlx5_core 0000:05:00.3 enp5s0f3v1: renamed from eth0
[ 2852.841278] pci 0000:05:00.4: [15b3:101a] type 00 class 0x020000
[ 2852.841338] pci 0000:05:00.4: enabling Extended Tags
[ 2852.842083] pci 0000:05:00.4: Adding to iommu group 39
[ 2852.842201] mlx5_core 0000:05:00.4: enabling device (0000 -> 0002)
[ 2852.842264] mlx5_core 0000:05:00.4: firmware version: 16.35.2000
[ 2852.985360] mlx5_core 0000:05:00.3 enp5s0f3v1: Link up
[ 2852.999307] mlx5_core 0000:05:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2853.105314] mlx5_core 0000:05:00.4: Assigned random MAC address 16:8b:bc:81:48:03
[ 2853.257975] mlx5_core 0000:05:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2853.260030] mlx5_core 0000:05:00.4 enp5s0f4v2: renamed from eth0
[ 2853.319024] pci 0000:05:00.5: [15b3:101a] type 00 class 0x020000
[ 2853.319085] pci 0000:05:00.5: enabling Extended Tags
[ 2853.319839] pci 0000:05:00.5: Adding to iommu group 40
[ 2853.319960] mlx5_core 0000:05:00.5: enabling device (0000 -> 0002)
[ 2853.320022] mlx5_core 0000:05:00.5: firmware version: 16.35.2000
[ 2853.476959] mlx5_core 0000:05:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2853.479937] mlx5_core 0000:05:00.4 enp5s0f4v2: Link up
[ 2853.501051] mlx5_core 0000:05:00.5: Assigned random MAC address ea:63:7d:c2:1f:de
[ 2853.613981] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f3v1: link becomes ready
[ 2853.614190] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f4v2: link becomes ready
[ 2853.663106] mlx5_core 0000:05:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2853.664776] mlx5_core 0000:05:00.5 enp5s0f5v3: renamed from eth0
[ 2853.866614] mlx5_core 0000:05:00.5 enp5s0f5v3: Link up
[ 2854.641583] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f5v3: link becomes ready

But as soon as the VM is started with a VF, these errors appear in the kernel log:

[ 2987.068846] ioremap memtype_reserve failed -16
[ 2987.080846] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.080847] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.080848] ioremap memtype_reserve failed -16
[ 2987.092830] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.092831] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.092832] ioremap memtype_reserve failed -16
[ 2987.104843] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.104845] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.104845] ioremap memtype_reserve failed -16
[ 2987.116844] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.116846] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.116847] ioremap memtype_reserve failed -16
[ 2987.128840] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.128841] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.128842] ioremap memtype_reserve failed -16
[ 2987.140842] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.140844] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.140844] ioremap memtype_reserve failed -16
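
To make the repeated messages above easier to compare, here is a small Python sketch that pulls the address range and the two memory types out of an `x86/PAT: ... conflicting memory types` line and decodes the `-16` return code. The regular expression is written against the lines pasted in this thread only, so treat it as an assumption about the message format rather than a stable interface.

```python
import errno
import re

# Pattern for lines such as:
# x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
PAT_CONFLICT = re.compile(
    r"x86/PAT: (?P<task>.+?):(?P<pid>\d+) conflicting memory types "
    r"(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) (?P<a>[\w-]+)<->(?P<b>[\w-]+)"
)


def parse_pat_conflict(line):
    """Return task, pid, range and the two conflicting types, or None if no match."""
    m = PAT_CONFLICT.search(line)
    if m is None:
        return None
    start = int(m["start"], 16)
    end = int(m["end"], 16)  # exclusive end, as printed in the "conflicting" line
    return {
        "task": m["task"],
        "pid": int(m["pid"]),
        "start": start,
        "size": end - start,
        "types": (m["a"], m["b"]),
    }


line = ("[ 2987.080846] x86/PAT: CPU 2/KVM:102943 conflicting memory types "
        "fcf6c00000-fcf6e00000 uncached-minus<->write-combining")
info = parse_pat_conflict(line)
print(info["task"], hex(info["start"]), info["size"] // (1024 * 1024), "MiB", info["types"])
print(errno.errorcode[16])  # the -16 in "memtype_reserve failed -16"
```

For the log above this gives a 2 MiB range requested by the vCPU thread `CPU 2/KVM`, and error 16 is EBUSY: the kernel refused the mapping because the range is already tracked with a different cache attribute.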

I posted the issue twice on the AMD community Server Guru forum and both posts got deleted without explanation.
I contacted the moderator to ask why, but got no response.
I tried a third time with another account in a different subforum; same thing, deleted without explanation.
It was pretty much the same post as here, no clue what’s wrong with it…

I didn’t even get an answer to my question about what’s wrong with the post. That’s pretty much the worst experience with a product I’ve ever had.

I have a suggestion: maybe if you went into more detail about how your system is behaving, someone would have an idea what is wrong.

I should try this on an AM5 IPMI board I have.

I’m a stupid idiot: it was a layer 8 problem, at least partially.
In my defense, I went in the wrong direction because the board was so confused by the CX5 at the beginning that it stopped POSTing and got stuck with a memory error, even after I removed the card from the system, and I had to do a CMOS reset.
The problem was solved with a firmware update of the CX5 in another system.
I repeated this three times, so it was no coincidence.

It still doesn’t work in Kubuntu with MLNX_OFED_LINUX-23.07-0.5.1.2, but I have it running with Manjaro, kernel 6.4 and the inbox driver.
In Manjaro I had to create this udev rule:

ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x15b3", ATTRS{device}=="0x1019", ATTR{device/sriov_drivers_autoprobe}="0", ATTR{device/sriov_numvfs}="4"
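
For reference, the rule above boils down to two sysfs writes on the PF (device ID 0x1019; the VFs show up as 15b3:101a in the dmesg earlier in the thread). Below is a hedged Python sketch of the same writes, useful for testing by hand before committing to a udev rule. The function name `configure_vfs` and the example path are my own assumptions. It writes 0 first because the kernel does not accept changing `sriov_numvfs` directly from one non-zero value to another. On a real system this needs root.

```python
from pathlib import Path


def configure_vfs(device_dir, num_vfs, autoprobe=False):
    """Mirror the udev rule: set VF driver autoprobe, then create num_vfs VFs.

    device_dir is the PF directory, e.g. /sys/class/net/enp5s0f0np0/device.
    """
    dev = Path(device_dir)
    # 0 = do not bind the host driver to new VFs, so they stay free for VFIO.
    (dev / "sriov_drivers_autoprobe").write_text("1" if autoprobe else "0")

    numvfs = dev / "sriov_numvfs"
    if numvfs.read_text().strip() != "0":
        # Existing VFs have to be destroyed before a new count is accepted.
        numvfs.write_text("0")
    numvfs.write_text(str(num_vfs))
```

Example call, assuming the interface name from this thread: `configure_vfs("/sys/class/net/enp5s0f0np0/device", 4)`.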

Everything seems to be working now. I still get this error message, but so far I can’t see any symptoms:

[  787.827489]  </TASK>
[  787.827490] ---[ end trace 0000000000000000 ]---
[  787.827491] ------------[ cut here ]------------
[  787.827491] Trying to free already-free IRQ 44
[  787.827493] WARNING: CPU: 29 PID: 4322 at kernel/irq/manage.c:1893 free_irq+0x226/0x3b0
[  787.827496] Modules linked in: iscsi_tcp libiscsi_tcp bridge stp llc qrtr rpcrdma sunrpc rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core vfat intel_rapl_msr fat intel_rapl_common edac_mce_amd kvm_amd snd_usb_audio snd_hda_codec_hdmi asus_nb_wmi eeepc_wmi snd_usbmidi_lib asus_wmi snd_hda_intel snd_rawmidi ledtrig_audio snd_intel_dspcfg snd_seq_device snd_intel_sdw_acpi i8042 sparse_keymap mc mousedev joydev ccp amdgpu snd_hda_codec platform_profile serio kvm usbhid rfkill wmi_bmof crct10dif_pclmul mlx5_core crc32_pclmul snd_hda_core polyval_clmulni drm_buddy polyval_generic snd_hwdep gpu_sched gf128mul i2c_algo_bit snd_pcm r8169 ghash_clmulni_intel drm_suballoc_helper sha512_ssse3 aesni_intel realtek snd_timer drm_ttm_helper crypto_simd mdio_devres ttm sp5100_tco cryptd snd drm_display_helper mlxfw rapl acpi_cpufreq ucsi_ccg cec k10temp pcspkr i2c_piix4 psample soundcore ucsi_acpi libphy tls video typec_ucsi pci_hyperv_intf typec wmi roles gpio_amdpt
[  787.827519]  gpio_generic mac_hid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) dm_multipath fuse crypto_user dm_mod loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 nvme crc32c_intel nvme_core xhci_pci xhci_pci_renesas nvme_common vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
[  787.827529] CPU: 29 PID: 4322 Comm: rpc-libvirtd Tainted: P        W  OE      6.4.16-1-MANJARO #1 b75fe5796da2edc38c34cd1a3d5a0deee650c91e
[  787.827530] Hardware name: ASUS System Product Name/ProArt B650-CREATOR, BIOS 1602 08/15/2023
[  787.827531] RIP: 0010:free_irq+0x226/0x3b0
[  787.827533] Code: 8e 02 00 49 8b 7f 30 e8 f8 cf 1c 00 4c 89 ff 49 8b 5f 50 e8 ec cf 1c 00 eb 3b 8b 74 24 04 48 c7 c7 f8 ae c4 a8 e8 ba 11 f6 ff <0f> 0b 48 89 ee 4c 89 ef e8 5d 6a c4 00 49 8b 86 80 00 00 00 48 8b
[  787.827533] RSP: 0018:ffffad320140bc28 EFLAGS: 00010086
[  787.827534] RAX: 0000000000000000 RBX: ffff98cb8163d828 RCX: 0000000000000027
[  787.827535] RDX: ffff98e21a1616c8 RSI: 0000000000000001 RDI: ffff98e21a1616c0
[  787.827536] RBP: 0000000000000246 R08: 0000000000000000 R09: ffffad320140bab8
[  787.827536] R10: 0000000000000003 R11: ffff98e277fa46e8 R12: ffff98cb86223bc0
[  787.827537] R13: ffff98cb86223ae4 R14: ffff98cb86223a00 R15: ffff98d7d3709d60
[  787.827538] FS:  00007f7074dfe6c0(0000) GS:ffff98e21a140000(0000) knlGS:0000000000000000
[  787.827539] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  787.827539] CR2: 00007f706804baf8 CR3: 0000000df3ba4000 CR4: 0000000000750ee0
[  787.827540] PKRU: 55555554
[  787.827541] Call Trace:
[  787.827541]  <TASK>
[  787.827541]  ? free_irq+0x226/0x3b0
[  787.827543]  ? __warn+0x81/0x130
[  787.827545]  ? free_irq+0x226/0x3b0
[  787.827546]  ? report_bug+0x171/0x1a0
[  787.827548]  ? prb_read_valid+0x1b/0x30
[  787.827551]  ? handle_bug+0x3c/0x80
[  787.827552]  ? exc_invalid_op+0x17/0x70
[  787.827554]  ? asm_exc_invalid_op+0x1a/0x20
[  787.827556]  ? free_irq+0x226/0x3b0
[  787.827558]  ? free_irq+0x226/0x3b0
[  787.827560]  devm_free_irq+0x58/0x80
[  787.827561]  i2c_dw_pci_remove+0x59/0x70
[  787.827563]  pci_device_remove+0x37/0xa0
[  787.827565]  device_release_driver_internal+0x19f/0x200
[  787.827567]  unbind_store+0xa1/0xb0
[  787.827568]  kernfs_fop_write_iter+0x133/0x1d0
[  787.827570]  vfs_write+0x22b/0x3f0
[  787.827572]  ksys_write+0x6f/0xf0
[  787.827573]  do_syscall_64+0x5d/0x90
[  787.827575]  ? syscall_exit_to_user_mode+0x2b/0x40
[  787.827576]  ? do_syscall_64+0x6c/0x90
[  787.827577]  ? syscall_exit_to_user_mode+0x2b/0x40
[  787.827578]  ? do_syscall_64+0x6c/0x90
[  787.827579]  ? do_user_addr_fault+0x179/0x640
[  787.827582]  ? exc_page_fault+0x7f/0x180
[  787.827583]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  787.827585] RIP: 0033:0x7f707950469f
[  787.827588] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 46 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c 47 f8 ff 48
[  787.827588] RSP: 002b:00007f7074dfd460 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[  787.827589] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 00007f707950469f
[  787.827590] RDX: 000000000000000c RSI: 00007f7068034730 RDI: 000000000000001d
[  787.827591] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000001
[  787.827591] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f7068034730
[  787.827592] R13: 000000000000001d R14: 0000000000000000 R15: 00007f70743432d1
[  787.827593]  </TASK>
[  787.827593] ---[ end trace 0000000000000000 ]---
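
A note on reading the trace above: frames prefixed with `?` are addresses the unwinder found on the stack but could not verify, so only the unprefixed frames are reliable. A small Python sketch that filters them out follows; the regular expression is an assumption based on the dmesg format pasted here.

```python
import re

# Matches "[  787.827560]  devm_free_irq+0x58/0x80" and the "? symbol+off/len" variant.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<unreliable>\? )?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(dmesg_lines):
    """Return the symbols of call-trace frames that are not marked with '?'."""
    frames = []
    for line in dmesg_lines:
        m = FRAME.match(line)
        if m and not m["unreliable"]:
            frames.append(m["symbol"])
    return frames


trace = [
    "[  787.827558]  ? free_irq+0x226/0x3b0",
    "[  787.827560]  devm_free_irq+0x58/0x80",
    "[  787.827561]  i2c_dw_pci_remove+0x59/0x70",
    "[  787.827563]  pci_device_remove+0x37/0xa0",
]
print(reliable_frames(trace))
```

Applied to the warning above, the reliable frames start with `devm_free_irq` and `i2c_dw_pci_remove`, so the double free of IRQ 44 appears to come from the I2C DesignWare PCI driver being unbound, not from mlx5_core. That is only my reading of the trace.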

In Kubuntu I get this error when starting the VM, and libvirtd dies.
Autoprobe is disabled, so that can’t be the problem.

[ 1217.170043] BUG: unable to handle page fault for address: 000000000002d0b0
[ 1217.170047] #PF: supervisor read access in kernel mode
[ 1217.170048] #PF: error_code(0x0000) - not-present page
[ 1217.170050] PGD 0 P4D 0
[ 1217.170052] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1217.170054] CPU: 6 PID: 2628 Comm: rpc-libvirtd Tainted: P OE 6.2.0-33-generic #33-Ubuntu
[ 1217.170057] Hardware name: ASUS System Product Name/ProArt B650-CREATOR, BIOS 1602 08/15/2023
[ 1217.170058] RIP: 0010:remove_one+0x32/0x140 [mlx5_core]
[ 1217.170105] Code: e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 8b 9f 48 01 00 00 48 89 df e8 1c d1 bf dd 41 80 bc 24 43 08 00 00 00 49 89 c5 79 1c <44> 0f b6 b3 b0 d0 02 00 41 80 fe 01 0f 87 8c 34 13 00 41 83 e6 01
[ 1217.170106] RSP: 0018:ffffbb3bc1f8bca0 EFLAGS: 00010282
[ 1217.170108] RAX: fffffffffffffe20 RBX: 0000000000000000 RCX: 0000000000000000
[ 1217.170110] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1217.170110] RBP: ffffbb3bc1f8bcc8 R08: 0000000000000000 R09: 0000000000000000
[ 1217.170111] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9594b45a9000
[ 1217.170112] R13: fffffffffffffe20 R14: ffff9594b45a9150 R15: ffff959400bf2150
[ 1217.170113] FS: 00007f96acdfe6c0(0000) GS:ffff95aa99b80000(0000) knlGS:0000000000000000
[ 1217.170114] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1217.170116] CR2: 000000000002d0b0 CR3: 00000001214d4000 CR4: 0000000000750ee0
[ 1217.170117] PKRU: 55555554
[ 1217.170118] Call Trace:
[ 1217.170119] <TASK>
[ 1217.170121] ? show_regs+0x6d/0x80
[ 1217.170124] ? __die+0x24/0x80
[ 1217.170126] ? page_fault_oops+0x99/0x1b0
[ 1217.170130] ? do_user_addr_fault+0x2f3/0x620
[ 1217.170131] ? exc_page_fault+0x80/0x1b0
[ 1217.170134] ? asm_exc_page_fault+0x27/0x30
[ 1217.170138] ? remove_one+0x32/0x140 [mlx5_core]
[ 1217.170179] pci_device_remove+0x36/0xb0
[ 1217.170182] device_remove+0x40/0x80
[ 1217.170184] device_release_driver_internal+0x222/0x2a0
[ 1217.170187] device_driver_detach+0x14/0x20
[ 1217.170189] unbind_store+0x102/0x130
[ 1217.170190] drv_attr_store+0x21/0x50
[ 1217.170193] sysfs_kf_write+0x3b/0x60
[ 1217.170195] kernfs_fop_write_iter+0x130/0x210
[ 1217.170197] vfs_write+0x24e/0x410
[ 1217.170200] ksys_write+0x73/0x100
[ 1217.170202] __x64_sys_write+0x19/0x30
[ 1217.170203] do_syscall_64+0x58/0x90
[ 1217.170205] ? do_syscall_64+0x67/0x90
[ 1217.170207] ? syscall_exit_to_user_mode+0x37/0x60
[ 1217.170209] ? do_syscall_64+0x67/0x90
[ 1217.170211] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1217.170212] RIP: 0033:0x7f96b130ba1f
[ 1217.170214] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 69 f5 f7 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 bc f5 f7 ff 48
[ 1217.170215] RSP: 002b:00007f96acdfd340 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[ 1217.170217] RAX: ffffffffffffffda RBX: 000000000000001c RCX: 00007f96b130ba1f
[ 1217.170218] RDX: 000000000000000c RSI: 00007f969c063ee0 RDI: 000000000000001c
[ 1217.170219] RBP: 000000000000000c R08: 0000000000000000 R09: 00007f96acdfc670
[ 1217.170219] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f969c063ee0
[ 1217.170220] R13: 000000000000001c R14: 0000000000000000 R15: 00007f96b1b4f50b
[ 1217.170222] </TASK>
[ 1217.170223] Modules linked in: snd_seq_dummy snd_hrtimer xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c nfnetlink bridge stp llc vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd cuse rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) snd_hda_codec_hdmi zfs(PO) snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_usb_audio snd_hda_codec snd_hda_core snd_usbmidi_lib mc snd_hwdep zunicode(PO) snd_pcm intel_rapl_msr zzstd(O) intel_rapl_common snd_seq_midi snd_seq_midi_event edac_mce_amd zlua(O) snd_rawmidi zavl(PO) nls_iso8859_1 kvm_amd snd_seq icp(PO) mlx5_ib(OE) snd_seq_device kvm zcommon(PO) snd_timer irqbypass ib_uverbs(OE) znvpair(PO) snd wmi_bmof asus_nb_wmi rapl k10temp joydev spl(O) input_leds ib_core(OE) ccp soundcore mac_hid binfmt_misc knem(OE) msr parport_pc ppdev lp parport efi_pstore dmi_sysfs ip_tables x_tables autofs4 amdgpu
[ 1217.170265] mlx5_core(OE) iommu_v2 drm_buddy gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_display_helper cec mfd_aaeon rc_core crct10dif_pclmul hid_generic asus_wmi drm_kms_helper crc32_pclmul ledtrig_audio polyval_clmulni mlxdevm(OE) syscopyarea polyval_generic sparse_keymap sysfillrect ghash_clmulni_intel mlxfw(OE) usbhid hid sha512_ssse3 aesni_intel nvme psample sysimgblt crypto_simd ucsi_ccg platform_profile drm r8169 cryptd tls ahci nvme_core video xhci_pci i2c_designware_pci i2c_piix4 libahci realtek mlx_compat(OE) xhci_pci_renesas i2c_ccgx_ucsi nvme_common ucsi_acpi pci_hyperv_intf typec_ucsi wmi typec gpio_amdpt
[ 1217.170291] CR2: 000000000002d0b0
[ 1217.170293] ---[ end trace 0000000000000000 ]---
[ 1217.809695] RIP: 0010:remove_one+0x32/0x140 [mlx5_core]
[ 1217.809741] Code: e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 8b 9f 48 01 00 00 48 89 df e8 1c d1 bf dd 41 80 bc 24 43 08 00 00 00 49 89 c5 79 1c <44> 0f b6 b3 b0 d0 02 00 41 80 fe 01 0f 87 8c 34 13 00 41 83 e6 01
[ 1217.809743] RSP: 0018:ffffbb3bc1f8bca0 EFLAGS: 00010282
[ 1217.809745] RAX: fffffffffffffe20 RBX: 0000000000000000 RCX: 0000000000000000
[ 1217.809746] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1217.809747] RBP: ffffbb3bc1f8bcc8 R08: 0000000000000000 R09: 0000000000000000
[ 1217.809748] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9594b45a9000
[ 1217.809749] R13: fffffffffffffe20 R14: ffff9594b45a9150 R15: ffff959400bf2150
[ 1217.809750] FS: 00007f96acdfe6c0(0000) GS:ffff95aa99b80000(0000) knlGS:0000000000000000
[ 1217.809751] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1217.809752] CR2: 000000000002d0b0 CR3: 00000001214d4000 CR4: 0000000000750ee0
[ 1217.809754] PKRU: 55555554
[ 1217.809755] note: rpc-libvirtd[2628] exited with irqs disabled
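
One detail in the oops above: the faulting instruction is marked with `<44>` in the `Code:` line, and the fault address (CR2) can be checked against it. The Python sketch below extracts the bytes from the marker onward and reads the 32-bit displacement. It assumes my own decoding of `44 0f b6 b3` as a byte load from `disp32(%rbx)` into `%r14d`, which is worth double-checking with a disassembler; the `Code:` string is a shortened excerpt of the line in the log.

```python
def faulting_bytes(code_line):
    """Return the bytes from the <..> marker to the end of a dmesg 'Code:' line."""
    hexbytes = code_line.split("Code:", 1)[1].split()
    idx = next(i for i, b in enumerate(hexbytes) if b.startswith("<"))
    return bytes(int(b.strip("<>"), 16) for b in hexbytes[idx:])


# Shortened excerpt of the Code: line from the remove_one oops.
code = "Code: 49 89 c5 79 1c <44> 0f b6 b3 b0 d0 02 00 41 80 fe 01"
insn = faulting_bytes(code)

# Assumed layout: REX prefix, two opcode bytes, ModRM, then a little-endian disp32.
disp = int.from_bytes(insn[4:8], "little")
rbx = 0x0  # RBX value from the register dump
print(hex(rbx + disp))  # 0x2d0b0, the same value as CR2
```

With RBX being 0, the load from `0x2d0b0` looks like a NULL pointer dereference inside `remove_one`, i.e. the driver is asked to remove a device whose private data was never set up. I have not confirmed this against the MLNX_OFED sources.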

Performance seems OK on Manjaro even though the error occurs, but I need to test this more closely.
With Kubuntu I have to restart the system; it hangs when shutting down and I have to power it off hard.

Shouldn’t SR-IOV perform better than a simple Linux bridge?

That’s the CX5 with SR-IOV
CrystalDiskMark_sriov

And that’s my old CX3-Pro with just a Linux bridge
SMB_512K_HDD

Why did I buy the CX5 again? Ah yes, I remember: I was bored and there was no problem to solve…


That’s iSCSI. Strange that the RND4K Q1T1 value is exactly the same as over SMB, but in general something doesn’t seem to work with SR-IOV.

Libvirt iSCSI Backend
Libvirt_iSCSI_CX5_NVME

SR-IOV windows iSCSI-Initiator - MLNX_WinOF2-3_10_52010
Win_iSCSI_CX5_nvme

Just to document the current status:
I have changed a few things in the configuration, but performance is still bad.

GRUB_CMDLINE_LINUX_DEFAULT="verbose pci=nocrs iommu=pt amdgpu.sg_display=0 pcie_aspm=off mitigations=off udev.log_priority=3"
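
Since the GRUB line only takes effect after the GRUB configuration has been regenerated and the host rebooted, it can be worth checking that the parameters actually reached the running kernel. A minimal Python sketch that compares an expected parameter string with `/proc/cmdline` follows; the helper names are hypothetical, and repeated keys simply overwrite each other in this simple version.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def missing_params(expected, running):
    """Return the keys from expected that are absent or different in running."""
    want = parse_cmdline(expected)
    have = parse_cmdline(running)
    return [key for key, value in want.items() if have.get(key) != value]


if __name__ == "__main__":
    expected = ("verbose pci=nocrs iommu=pt amdgpu.sg_display=0 "
                "pcie_aspm=off mitigations=off udev.log_priority=3")
    proc = Path("/proc/cmdline")
    if proc.is_file():
        print("missing or different:", missing_params(expected, proc.read_text()))
```

An empty list means every parameter from the GRUB line is active on the running kernel.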

[manja-02 ~]# uname -a
Linux manja-02 6.4.16-1-MANJARO #1 SMP PREEMPT_DYNAMIC Wed Sep 13 12:21:42 UTC 2023 x86_64 GNU/Linux
[manja-02 ~]# /usr/bin/qemu-system-x86_64 --version
#QEMU emulator version 8.1.0
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers

VM config

[manja-02 ~]# virsh dumpxml win11-on
<domain type='kvm' id='1'>
  <name>win11-on</name>
  <uuid>edee61c5-a7ba-48a3-83b7-5915360f53b4</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/11"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>39108608</memory>
  <currentMemory unit='KiB'>39108608</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static'>16</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <vcpupin vcpu='4' cpuset='4'/>
    <vcpupin vcpu='5' cpuset='5'/>
    <vcpupin vcpu='6' cpuset='6'/>
    <vcpupin vcpu='7' cpuset='7'/>
    <vcpupin vcpu='8' cpuset='8'/>
    <vcpupin vcpu='9' cpuset='9'/>
    <vcpupin vcpu='10' cpuset='10'/>
    <vcpupin vcpu='11' cpuset='11'/>
    <vcpupin vcpu='12' cpuset='12'/>
    <vcpupin vcpu='13' cpuset='13'/>
    <vcpupin vcpu='14' cpuset='14'/>
    <vcpupin vcpu='15' cpuset='15'/>
    <iothreadpin iothread='1' cpuset='15'/>
    <iothreadpin iothread='2' cpuset='14'/>
  </cputune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os firmware='efi'>
    <type arch='x86_64' machine='pc-q35-8.0'>hvm</type>
    <firmware>
      <feature enabled='no' name='enrolled-keys'/>
      <feature enabled='yes' name='secure-boot'/>
    </firmware>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/x64/OVMF_CODE.secboot.4m.fd</loader>
    <nvram template='/usr/share/edk2/x64/OVMF_VARS.4m.fd'>/var/lib/libvirt/qemu/nvram/win11-on_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <runtime state='on'/>
      <synic state='on'/>
      <stimer state='on'>
        <direct state='on'/>
      </stimer>
      <reset state='on'/>
      <vendor_id state='on' value='AuthenticAMD'/>
      <frequencies state='on'/>
      <reenlightenment state='on'/>
      <tlbflush state='on'/>
      <ipi state='on'/>
      <evmcs state='off'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
    <smm state='on'/>
    <ioapic driver='kvm'/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-Milan</model>
    <vendor>AMD</vendor>
    <topology sockets='1' dies='2' cores='8' threads='1'/>
    <feature policy='require' name='x2apic'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='avx512f'/>
    <feature policy='require' name='avx512dq'/>
    <feature policy='require' name='avx512ifma'/>
    <feature policy='require' name='avx512cd'/>
    <feature policy='require' name='avx512bw'/>
    <feature policy='require' name='avx512vl'/>
    <feature policy='require' name='avx512vbmi'/>
    <feature policy='require' name='avx512vbmi2'/>
    <feature policy='require' name='gfni'/>
    <feature policy='require' name='vaes'/>
    <feature policy='require' name='vpclmulqdq'/>
    <feature policy='require' name='avx512vnni'/>
    <feature policy='require' name='avx512bitalg'/>
    <feature policy='require' name='avx512-vpopcntdq'/>
    <feature policy='require' name='spec-ctrl'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='flush-l1d'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='avx512-bf16'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='stibp-always-on'/>
    <feature policy='require' name='virt-ssbd'/>
    <feature policy='require' name='amd-psfd'/>
    <feature policy='disable' name='lbrv'/>
    <feature policy='disable' name='tsc-scale'/>
    <feature policy='disable' name='vmcb-clean'/>
    <feature policy='disable' name='pause-filter'/>
    <feature policy='disable' name='pfthreshold'/>
    <feature policy='disable' name='v-vmsave-vmload'/>
    <feature policy='disable' name='vgif'/>
    <feature policy='disable' name='vnmi'/>
    <feature policy='require' name='no-nested-data-bp'/>
    <feature policy='require' name='lfence-always-serializing'/>
    <feature policy='require' name='null-sel-clr-base'/>
    <feature policy='require' name='auto-ibrs'/>
    <feature policy='require' name='rdctl-no'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='require' name='mds-no'/>
    <feature policy='require' name='pschange-mc-no'/>
    <feature policy='disable' name='pcid'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='invtsc'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='npt'/>
    <feature policy='disable' name='nrip-save'/>
    <feature policy='disable' name='svme-addr-chk'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='kvmclock' present='no'/>
    <timer name='hypervclock' present='yes'/>
    <timer name='tsc' present='yes' mode='native'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/user/Downloads/iso/Win11_22H2_German_x64v1.iso' index='2'/>
      <backingStore/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>
      <source dev='/dev/zvol/tank02/win11-on' index='1'/>
      <backingStore/>
      <target dev='sdf' bus='scsi'/>
      <boot order='1'/>
      <alias name='scsi0-0-0-5'/>
      <address type='drive' controller='0' bus='0' target='0' unit='5'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x16'/>
      <alias name='pci.7'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x17'/>
      <alias name='pci.8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x18'/>
      <alias name='pci.9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x19'/>
      <alias name='pci.10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x1'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0x1a'/>
      <alias name='pci.11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x2'/>
    </controller>
    <controller type='pci' index='12' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='12' port='0x1b'/>
      <alias name='pci.12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x3'/>
    </controller>
    <controller type='pci' index='13' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='13' port='0x1c'/>
      <alias name='pci.13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x4'/>
    </controller>
    <controller type='pci' index='14' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='14' port='0x1d'/>
      <alias name='pci.14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x5'/>
    </controller>
    <controller type='pci' index='15' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='15' port='0x1e'/>
      <alias name='pci.15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x6'/>
    </controller>
    <controller type='pci' index='16' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <alias name='pci.16'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='8' iothread='1'/>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </controller>
    <controller type='scsi' index='1' model='virtio-scsi'>
      <driver queues='8' iothread='2'/>
      <alias name='scsi1'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:ad:e4:a9'/>
      <source bridge='br3'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </interface>
    <channel type='unix'>
      <source mode='bind' path='/run/libvirt/qemu/channel/1-win11-on/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'>
      <alias name='input0'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input1'/>
    </input>
    <tpm model='tpm-crb'>
      <backend type='emulator' version='2.0'/>
      <alias name='tpm0'/>
    </tpm>
    <audio id='1' type='none'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
      <rom file='/etc/libvirt/navi31.rom'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
      </source>
      <alias name='hostdev2'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x2'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x3'/>
      </source>
      <alias name='hostdev3'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x3'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev4'/>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x2'/>
      </source>
      <alias name='hostdev5'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x05' slot='0x00' function='0x6'/>
      </source>
      <alias name='hostdev6'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </hostdev>
    <watchdog model='itco' action='reset'>
      <alias name='watchdog0'/>
    </watchdog>
    <memballoon model='none'/>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+962:+962</label>
    <imagelabel>+962:+962</imagelabel>
  </seclabel>
</domain>


[  197.896686] ------------[ cut here ]------------
[  197.896686] Trying to free already-free IRQ 44
[  197.896688] WARNING: CPU: 31 PID: 2800 at kernel/irq/manage.c:1893 free_irq+0x226/0x3b0
[  197.896691] Modules linked in: iscsi_tcp libiscsi_tcp bridge stp llc qrtr rpcrdma sunrpc rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mousedev joydev amdgpu vfat fat intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd snd_usb_audio snd_hda_codec_hdmi snd_usbmidi_lib snd_hda_intel snd_rawmidi ccp asus_nb_wmi snd_intel_dspcfg eeepc_wmi snd_seq_device usbhid mc asus_wmi snd_intel_sdw_acpi drm_buddy kvm gpu_sched ledtrig_audio snd_hda_codec crct10dif_pclmul i8042 crc32_pclmul sparse_keymap polyval_clmulni i2c_algo_bit serio platform_profile polyval_generic rfkill mlx5_core wmi_bmof drm_suballoc_helper snd_hda_core gf128mul drm_ttm_helper r8169 snd_hwdep ghash_clmulni_intel snd_pcm sha512_ssse3 ttm aesni_intel mlxfw realtek crypto_simd snd_timer drm_display_helper psample mdio_devres sp5100_tco cryptd rapl tls snd ucsi_ccg pcspkr acpi_cpufreq cec k10temp i2c_piix4 ucsi_acpi libphy pci_hyperv_intf soundcore typec_ucsi video typec gpio_amdpt wmi roles
[  197.896714]  gpio_generic mac_hid zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) dm_multipath crypto_user fuse dm_mod loop bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 nvme nvme_core crc32c_intel xhci_pci xhci_pci_renesas nvme_common vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd
[  197.896723] CPU: 31 PID: 2800 Comm: rpc-libvirtd Tainted: P        W  OE      6.4.16-1-MANJARO #1 b75fe5796da2edc38c34cd1a3d5a0deee650c91e
[  197.896725] Hardware name: ASUS System Product Name/ProArt B650-CREATOR, BIOS 1602 08/15/2023
[  197.896725] RIP: 0010:free_irq+0x226/0x3b0
[  197.896727] Code: 8e 02 00 49 8b 7f 30 e8 f8 cf 1c 00 4c 89 ff 49 8b 5f 50 e8 ec cf 1c 00 eb 3b 8b 74 24 04 48 c7 c7 f8 ae 64 a7 e8 ba 11 f6 ff <0f> 0b 48 89 ee 4c 89 ef e8 5d 6a c4 00 49 8b 86 80 00 00 00 48 8b
[  197.896728] RSP: 0018:ffffafa8a176bc20 EFLAGS: 00010082
[  197.896729] RAX: 0000000000000000 RBX: ffff9636c87a5028 RCX: 0000000000000027
[  197.896730] RDX: ffff964d5a1e16c8 RSI: 0000000000000001 RDI: ffff964d5a1e16c0
[  197.896731] RBP: 0000000000000246 R08: 0000000000000000 R09: ffffafa8a176bab0
[  197.896731] R10: 0000000000000003 R11: ffff964db7fa47a8 R12: ffff9636c27cedc0
[  197.896732] R13: ffff9636c27cece4 R14: ffff9636c27cec00 R15: ffff9636c76deda0
[  197.896733] FS:  00007f8de2bff6c0(0000) GS:ffff964d5a1c0000(0000) knlGS:0000000000000000
[  197.896734] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  197.896734] CR2: 00007f8ddc078070 CR3: 0000000de0678000 CR4: 0000000000750ee0
[  197.896735] PKRU: 55555554
[  197.896736] Call Trace:
[  197.896736]  <TASK>
[  197.896736]  ? free_irq+0x226/0x3b0
[  197.896738]  ? __warn+0x81/0x130
[  197.896740]  ? free_irq+0x226/0x3b0
[  197.896741]  ? report_bug+0x171/0x1a0
[  197.896743]  ? prb_read_valid+0x1b/0x30
[  197.896746]  ? handle_bug+0x3c/0x80
[  197.896747]  ? exc_invalid_op+0x17/0x70
[  197.896749]  ? asm_exc_invalid_op+0x1a/0x20
[  197.896752]  ? free_irq+0x226/0x3b0
[  197.896754]  devm_free_irq+0x58/0x80
[  197.896755]  i2c_dw_pci_remove+0x59/0x70
[  197.896757]  pci_device_remove+0x37/0xa0
[  197.896759]  device_release_driver_internal+0x19f/0x200
[  197.896761]  unbind_store+0xa1/0xb0
[  197.896762]  kernfs_fop_write_iter+0x133/0x1d0
[  197.896764]  vfs_write+0x22b/0x3f0
[  197.896766]  ksys_write+0x6f/0xf0
[  197.896767]  do_syscall_64+0x5d/0x90
[  197.896769]  ? do_sys_openat2+0x9b/0x170
[  197.896771]  ? syscall_exit_to_user_mode+0x2b/0x40
[  197.896772]  ? do_syscall_64+0x6c/0x90
[  197.896773]  ? syscall_exit_to_user_mode+0x2b/0x40
[  197.896774]  ? do_syscall_64+0x6c/0x90
[  197.896775]  ? syscall_exit_to_user_mode+0x2b/0x40
[  197.896777]  ? do_syscall_64+0x6c/0x90
[  197.896778]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  197.896780] RIP: 0033:0x7f8de6b0469f
[  197.896782] Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 e9 46 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 3c 47 f8 ff 48
[  197.896783] RSP: 002b:00007f8de2bfe460 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[  197.896784] RAX: ffffffffffffffda RBX: 000000000000001e RCX: 00007f8de6b0469f
[  197.896785] RDX: 000000000000000c RSI: 00007f8ddc060790 RDI: 000000000000001e
[  197.896785] RBP: 000000000000000c R08: 0000000000000000 R09: 0000000000000001
[  197.896786] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f8ddc060790
[  197.896786] R13: 000000000000001e R14: 0000000000000000 R15: 00007f8dd977b2d1
[  197.896788]  </TASK>
[  197.896788] ---[ end trace 0000000000000000 ]---

@Wendell the “Trying to free already-free IRQ 44” warning is caused by the RX7900XTX.
I couldn’t believe it, but I removed the CX5 from the system, created a fresh VM with a basic configuration, and the message appears every time the VM starts.
I obviously didn’t pay enough attention to the fact that the error message changed somewhere along the way; or rather, I didn’t notice the second one because the first one was gone.
I think the “ioremap memtype_reserve failed -16” error was solved by switching to Manjaro and creating the correct udev rule. That doesn’t identify the cause, but I don’t care right now.
If the performance with SR-IOV and Windows VMs in Manjaro remains bad, I will have to take a closer look at the Kubuntu situation.
iperf3 shows almost 40 Gb/s with an Ubuntu 23.10 VM and SR-IOV. Windows 11 22H2, on the other hand, can barely manage 20 Gb/s and shows strange MTU problems, but that’s another problem that will be solved somehow.
What completely baffles me is the problem with the 7900XTX, although it seems to be more of a cosmetic problem; I can’t see any symptoms.
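
For anyone who wants to check the same thing on their own box, here is a rough Python sketch that pulls the interesting bits out of a trace like the one above: the IRQ number, the process that triggered it, and the real (non-"?") call frames. In my trace the first frames are devm_free_irq and i2c_dw_pci_remove, reached via unbind_store from rpc-libvirtd, so the warning seems to fire while libvirt unbinds the host driver from one of the passed-through functions. That this function belongs to the GPU is my assumption, not something the trace states. The names summarize and SAMPLE are made up for this example, and SAMPLE is an abridged copy of the trace lines.

```python
import re

# Abridged lines from the dmesg trace above (assumption: same format on other kernels).
SAMPLE = """\
[  197.896686] Trying to free already-free IRQ 44
[  197.896688] WARNING: CPU: 31 PID: 2800 at kernel/irq/manage.c:1893 free_irq+0x226/0x3b0
[  197.896723] CPU: 31 PID: 2800 Comm: rpc-libvirtd Tainted: P        W  OE      6.4.16-1-MANJARO #1
[  197.896736] Call Trace:
[  197.896736]  <TASK>
[  197.896736]  ? free_irq+0x226/0x3b0
[  197.896754]  devm_free_irq+0x58/0x80
[  197.896755]  i2c_dw_pci_remove+0x59/0x70
[  197.896757]  pci_device_remove+0x37/0xa0
[  197.896761]  unbind_store+0xa1/0xb0
[  197.896788]  </TASK>
"""

IRQ_RE = re.compile(r"Trying to free already-free IRQ (\d+)")
COMM_RE = re.compile(r"Comm: (\S+)")
# A call-trace frame: timestamp, optional "? " marker, symbol+offset/size.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize(text):
    """Return the IRQ number, triggering process and reliable call frames."""
    irq = None
    comm = None
    frames = []
    in_trace = False
    for line in text.splitlines():
        m = IRQ_RE.search(line)
        if m:
            irq = int(m.group(1))
        m = COMM_RE.search(line)
        if m and comm is None:
            comm = m.group(1)
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            m = FRAME_RE.match(line)
            # Frames marked with "?" are unreliable stack guesses, so skip them.
            if m and not m.group(1):
                frames.append(m.group(2))
    return {"irq": irq, "comm": comm, "frames": frames}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it against a live system, replace SAMPLE with the output of dmesg. The frame directly below devm_free_irq is the driver remove callback to look at.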