Hi,
I found posts in other forums describing the same issue with the CX4 and CX6; it seems to affect every ConnectX generation newer than the CX3 on AM5.
I tried the “pci=nocrs” kernel parameter and disabling “Above 4G Decoding”, but neither helped.
I have zero hope that AMD is interested in SR-IOV problems with enterprise hardware on AM5, and I have no idea how to solve this. Should I buy an Intel E810 and use the CX5 on the server side?
Host System
AM5 - ASUS B650 Creator, BIOS 1602 - AGESA 1.0.0.7c
Mellanox ConnectX-5 EX MCX516A-CDAT - firmware version: 16.35.2000
Host OS: Kubuntu 23.04
Kernel 6.2.0-32-generic
QEMU emulator version 7.2.0 (Debian 1:7.2+dfsg-5ubuntu2.2)
MLNX_OFED_LINUX-23.07-0.5.1.2-ubuntu23.04-x86_64
Tested Guest Systems
Kubuntu 23.04 6.2.0-32-generic
MLNX_OFED_LINUX-23.07-0.5.1.2-ubuntu23.04-x86_64
dmesg Kubuntu guest
[ 21.188581] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 100s (0xffffffff)
[ 41.192590] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 79s (0xffffffff)
[ 61.196576] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 59s (0xffffffff)
[ 81.200595] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 39s (0xffffffff)
[ 101.204567] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 19s (0xffffffff)
[ 121.196566] mlx5_core 0000:07:00.0: mlx5_function_setup:1112:(pid 124): Firmware over 120000 MS in pre-initializing state, aborting
[ 121.196574] fbcon: Taking over console
[ 121.196579] mlx5_core 0000:07:00.0: probe_one:1741:(pid 124): mlx5_init_one failed with error code -16
[ 121.204608] mlx5_core: probe of 0000:07:00.0 failed with error -16
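A note on the guest log above: as far as I can tell from the mlx5 driver source, the hex value at the end of each wait_fw_init line is the initialization register the driver keeps polling, and a read of all ones (0xffffffff) usually means the MMIO read never reached the device, rather than the firmware just being slow. A minimal Python sketch that pulls that value out of the log, assuming exactly the line format quoted above (parse_wait_lines and looks_unreachable are my own hypothetical helpers):

```python
import re

# Pattern taken from the guest-side wait_fw_init lines above (assumed format).
WAIT_RE = re.compile(
    r"wait_fw_init:\d+:\(pid \d+\): Waiting for FW initialization, "
    r"timeout abort in (\d+)s \((0x[0-9a-fA-F]+)\)"
)


def parse_wait_lines(log: str) -> list[tuple[int, int]]:
    """Return (seconds_left, register_value) for every wait_fw_init line."""
    return [(int(secs), int(value, 16)) for secs, value in WAIT_RE.findall(log)]


def looks_unreachable(value: int) -> bool:
    # A 32-bit MMIO read returning all ones usually means the read did not reach the device.
    return value == 0xFFFFFFFF


if __name__ == "__main__":
    sample = """\
[   21.188581] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 100s (0xffffffff)
[   41.192590] mlx5_core 0000:07:00.0: wait_fw_init:199:(pid 124): Waiting for FW initialization, timeout abort in 79s (0xffffffff)
"""
    for secs_left, value in parse_wait_lines(sample):
        print(secs_left, hex(value), looks_unreachable(value))
```

If every line reports 0xffffffff for the whole 120 s window, the problem is most likely the BAR mapping on the host side, not the VF firmware.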
Windows 11 22H2
MLNX_WinOF2-3_10_52010_All_x64
Error: This device cannot start. (Code 10)
SR-IOV is enabled in the mainboard UEFI and in the CX5 firmware. Up to this point everything is fine: four VFs were created and dmesg does not show any errors.
enp5s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 1c:34:da:71:b8:3a brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 00:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 1 link/ether 00:22:33:44:55:67 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 2 link/ether 00:22:33:44:55:68 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
vf 3 link/ether 00:22:33:44:55:69 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
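Since everything looks correct at this stage, it can help to verify the VF setup from a script before each VM start. A small Python sketch that parses the vf lines of ip link show output like the one above; the regex is an assumption built from this exact output and may need adjusting for other iproute2 versions (parse_vfs is a hypothetical helper):

```python
import re

# Pattern built from the "ip link show" output above; other iproute2 versions may differ.
VF_RE = re.compile(
    r"^\s*vf (\d+)\s+link/ether ([0-9a-f:]{17}).*?spoof checking (on|off)"
    r".*?trust (on|off)",
    re.MULTILINE,
)


def parse_vfs(ip_link_output: str) -> dict[int, dict]:
    """Return {vf_index: {"mac": str, "spoofchk": bool, "trust": bool}}."""
    return {
        int(idx): {"mac": mac, "spoofchk": spoof == "on", "trust": trust == "on"}
        for idx, mac, spoof, trust in VF_RE.findall(ip_link_output)
    }


if __name__ == "__main__":
    sample = """\
    vf 0     link/ether 00:22:33:44:55:66 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
    vf 1     link/ether 00:22:33:44:55:67 brd ff:ff:ff:ff:ff:ff, spoof checking off, link-state auto, trust off, query_rss off
"""
    for idx, info in sorted(parse_vfs(sample).items()):
        print(idx, info)
```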
dmesg host system before starting the VM
[ 2851.961100] mlx5_core 0000:05:00.0: E-Switch: Enable: mode(LEGACY), nvfs(4), necvfs(0), active vports(5)
[ 2852.068907] pci 0000:05:00.2: [15b3:101a] type 00 class 0x020000
[ 2852.068970] pci 0000:05:00.2: enabling Extended Tags
[ 2852.069733] pci 0000:05:00.2: Adding to iommu group 37
[ 2852.069917] mlx5_core 0000:05:00.2: enabling device (0000 -> 0002)
[ 2852.069990] mlx5_core 0000:05:00.2: firmware version: 16.35.2000
[ 2852.216760] mlx5_core 0000:05:00.2: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2852.230329] mlx5_core 0000:05:00.2: Assigned random MAC address 9e:12:f7:cd:1e:c0
[ 2852.384148] mlx5_core 0000:05:00.2: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2852.386033] mlx5_core 0000:05:00.2 enp5s0f2v0: renamed from eth0
[ 2852.444191] pci 0000:05:00.3: [15b3:101a] type 00 class 0x020000
[ 2852.444251] pci 0000:05:00.3: enabling Extended Tags
[ 2852.444995] pci 0000:05:00.3: Adding to iommu group 38
[ 2852.445094] mlx5_core 0000:05:00.3: enabling device (0000 -> 0002)
[ 2852.445154] mlx5_core 0000:05:00.3: firmware version: 16.35.2000
[ 2852.587542] mlx5_core 0000:05:00.2 enp5s0f2v0: Link up
[ 2852.589409] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f2v0: link becomes ready
[ 2852.601351] mlx5_core 0000:05:00.3: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2852.626958] mlx5_core 0000:05:00.3: Assigned random MAC address ca:47:ce:5d:f7:ce
[ 2852.782367] mlx5_core 0000:05:00.3: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2852.783905] mlx5_core 0000:05:00.3 enp5s0f3v1: renamed from eth0
[ 2852.841278] pci 0000:05:00.4: [15b3:101a] type 00 class 0x020000
[ 2852.841338] pci 0000:05:00.4: enabling Extended Tags
[ 2852.842083] pci 0000:05:00.4: Adding to iommu group 39
[ 2852.842201] mlx5_core 0000:05:00.4: enabling device (0000 -> 0002)
[ 2852.842264] mlx5_core 0000:05:00.4: firmware version: 16.35.2000
[ 2852.985360] mlx5_core 0000:05:00.3 enp5s0f3v1: Link up
[ 2852.999307] mlx5_core 0000:05:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2853.105314] mlx5_core 0000:05:00.4: Assigned random MAC address 16:8b:bc:81:48:03
[ 2853.257975] mlx5_core 0000:05:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2853.260030] mlx5_core 0000:05:00.4 enp5s0f4v2: renamed from eth0
[ 2853.319024] pci 0000:05:00.5: [15b3:101a] type 00 class 0x020000
[ 2853.319085] pci 0000:05:00.5: enabling Extended Tags
[ 2853.319839] pci 0000:05:00.5: Adding to iommu group 40
[ 2853.319960] mlx5_core 0000:05:00.5: enabling device (0000 -> 0002)
[ 2853.320022] mlx5_core 0000:05:00.5: firmware version: 16.35.2000
[ 2853.476959] mlx5_core 0000:05:00.5: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2853.479937] mlx5_core 0000:05:00.4 enp5s0f4v2: Link up
[ 2853.501051] mlx5_core 0000:05:00.5: Assigned random MAC address ea:63:7d:c2:1f:de
[ 2853.613981] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f3v1: link becomes ready
[ 2853.614190] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f4v2: link becomes ready
[ 2853.663106] mlx5_core 0000:05:00.5: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2853.664776] mlx5_core 0000:05:00.5 enp5s0f5v3: renamed from eth0
[ 2853.866614] mlx5_core 0000:05:00.5 enp5s0f5v3: Link up
[ 2854.641583] IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f5v3: link becomes ready
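Each VF in the host log above lands in its own IOMMU group (37 to 40), which is what PCI passthrough needs. To collect the PCI address, IOMMU group and interface name per VF from a saved dmesg, a minimal Python sketch, assuming the exact message format shown here (map_vfs is a hypothetical helper, not part of any tool):

```python
import re

# Patterns taken from the host dmesg excerpt above (assumed format).
GROUP_RE = re.compile(r"pci (\S+): Adding to iommu group (\d+)")
RENAME_RE = re.compile(r"mlx5_core (\S+) (\S+): renamed from eth\d+")


def map_vfs(dmesg: str) -> dict[str, dict]:
    """Map each PCI address to its IOMMU group and renamed netdev, if logged."""
    vfs: dict[str, dict] = {}
    for bdf, group in GROUP_RE.findall(dmesg):
        vfs.setdefault(bdf, {})["iommu_group"] = int(group)
    for bdf, ifname in RENAME_RE.findall(dmesg):
        vfs.setdefault(bdf, {})["netdev"] = ifname
    return vfs


if __name__ == "__main__":
    sample = """\
[ 2852.069733] pci 0000:05:00.2: Adding to iommu group 37
[ 2852.386033] mlx5_core 0000:05:00.2 enp5s0f2v0: renamed from eth0
[ 2852.444995] pci 0000:05:00.3: Adding to iommu group 38
[ 2852.783905] mlx5_core 0000:05:00.3 enp5s0f3v1: renamed from eth0
"""
    for bdf, info in sorted(map_vfs(sample).items()):
        print(bdf, info)
```

The PCI address is the value that goes into QEMU's vfio-pci host= option when the VF is passed to the guest.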
But as soon as the VM is started with a VF, these errors appear in the host kernel log:
[ 2987.068846] ioremap memtype_reserve failed -16
[ 2987.080846] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.080847] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.080848] ioremap memtype_reserve failed -16
[ 2987.092830] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.092831] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.092832] ioremap memtype_reserve failed -16
[ 2987.104843] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.104845] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.104845] ioremap memtype_reserve failed -16
[ 2987.116844] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.116846] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.116847] ioremap memtype_reserve failed -16
[ 2987.128840] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.128841] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.128842] ioremap memtype_reserve failed -16
[ 2987.140842] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.140844] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.140844] ioremap memtype_reserve failed -16
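Reading the PAT lines: the host kernel refuses the ioremap because the same 2 MiB window (0xfcf6c00000 to 0xfcf6dfffff, presumably part of the VF's BAR) is requested with a cache type that conflicts with the one already tracked, uncached-minus versus write-combining. The -16 is -EBUSY on Linux, the same numeric code the guest's probe_one reports. A small Python sketch that decodes one of those lines and the error code, assuming the log format quoted above (parse_pat_conflict and errno_name are hypothetical helpers):

```python
import errno
import re
from typing import Optional

# Pattern taken from the x86/PAT lines in the host log above (assumed format).
PAT_RE = re.compile(
    r"conflicting memory types ([0-9a-f]+)-([0-9a-f]+) ([\w-]+)<->([\w-]+)"
)


def parse_pat_conflict(line: str) -> Optional[dict]:
    """Return the conflicting range and the two cache types, or None."""
    m = PAT_RE.search(line)
    if m is None:
        return None
    start = int(m.group(1), 16)
    end = int(m.group(2), 16)  # exclusive; the memtype_reserve line prints end - 1
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "types": (m.group(3), m.group(4)),
    }


def errno_name(code: int) -> str:
    """Map a negative kernel return code such as -16 to its errno name."""
    return errno.errorcode.get(-code, "unknown")


if __name__ == "__main__":
    line = ("[ 2987.080846] x86/PAT: CPU 2/KVM:102943 conflicting memory types "
            "fcf6c00000-fcf6e00000 uncached-minus<->write-combining")
    info = parse_pat_conflict(line)
    print(hex(info["start"]), hex(info["size"]), info["types"])
    print(errno_name(-16))
```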