AM5 Mellanox VM Passthrough Memory Errors

I’m attempting to pass a Mellanox ConnectX-4 NIC through to a VM and getting memory errors in dmesg:

...
[32278.078269] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.078271] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.078272] ioremap memtype_reserve failed -16
[32278.082272] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.082275] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.082277] ioremap memtype_reserve failed -16
[32278.086270] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.086273] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.086274] ioremap memtype_reserve failed -16
[32278.090268] x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
[32278.090270] x86/PAT: memtype_reserve failed [mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus
[32278.090271] ioremap memtype_reserve failed -16
...

These errors repeat hundreds or thousands of times while the VM is starting. Eventually the VM boots properly and lspci in the guest shows the NIC, but the guest doesn’t load the driver for it, so it’s unusable.
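Since the same three lines repeat so many times, it can help to collapse them into one summary line per address range before comparing against the BARs shown by `lspci -vv`. The following is only a rough sketch: the function name `summarize_pat_conflicts` and the regular expression are my own, written against the log lines quoted above, and other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   x86/PAT: CPU 2/KVM:108165 conflicting memory types ea000000-ec000000 uncached-minus<->write-combining
# The pattern is an assumption based on the log excerpt in this thread.
CONFLICT_RE = re.compile(
    r"x86/PAT: (?P<task>.+?):(?P<pid>\d+) conflicting memory types "
    r"(?P<start>[0-9a-f]+)-(?P<end>[0-9a-f]+) "
    r"(?P<first>[\w-]+)<->(?P<second>[\w-]+)"
)


def summarize_pat_conflicts(lines):
    """Count PAT conflict messages per (start, end, first type, second type)."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            key = (
                int(match["start"], 16),
                int(match["end"], 16),
                match["first"],
                match["second"],
            )
            counts[key] += 1
    return counts


# Two lines copied from the excerpt above, used only as sample input.
SAMPLE = [
    "[32278.078269] x86/PAT: CPU 2/KVM:108165 conflicting memory types "
    "ea000000-ec000000 uncached-minus<->write-combining",
    "[32278.078271] x86/PAT: memtype_reserve failed "
    "[mem 0xea000000-0xebffffff], track uncached-minus, req uncached-minus",
]

if __name__ == "__main__":
    for (start, end, first, second), hits in summarize_pat_conflicts(SAMPLE).items():
        size_mib = (end - start) / (1024 * 1024)
        print(f"{start:#x}-{end:#x} ({size_mib:g} MiB) {first} vs {second}: {hits} hits")
```

To run it on a live system, pass the output of `dmesg` to `summarize_pat_conflicts` line by line instead of `SAMPLE`. The resulting range can then be checked against the memory regions of the passed-through device.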

I’m able to pass through other PCI devices like NVMe SSDs with no dmesg errors; it’s specifically the Mellanox NICs that have problems. I’ve tried multiple different ConnectX-4 cards, and the passthrough works fine on other machines, but not this one. I’ve tried using SR-IOV and passing through just one virtual function, and that causes the same errors. I’ve also tried the NIC in different PCIe slots, and the same thing happens. Each port of the NIC is in its own IOMMU group, and I’ve tried passing in each individual port, as well as both ports together, each time getting the same errors.
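For anyone wanting to compare their own grouping, here is a minimal sketch of how the IOMMU groups can be listed by walking `/sys/kernel/iommu_groups`, the standard sysfs location on Linux. The helper name `iommu_groups` is hypothetical, and the `root` parameter exists only so the function can be pointed at a different directory.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group number: sorted list of PCI addresses} found under root."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # IOMMU disabled or sysfs not available: nothing to report.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        # Each entry in devices/ is named after a PCI address, e.g. 0000:05:00.0.
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        print(f"group {number}: {', '.join(devices)}")
```

If both ports of the NIC show up under different group numbers, that matches the situation described above, where each port can be assigned independently.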

This is with a new MSI X670E ACE motherboard with a 7950X CPU. I’m running Fedora 37 with kernel 6.1.7.

Is this an incompatibility with the new AM5 platform, a hardware compatibility issue, a Linux configuration issue, or something else entirely? I wasn’t able to find any suggestions on Google.

Hi nofdak,
Have you found a solution for this?
If I go to AM5, the MSI X670E ACE would be a candidate for me. At the moment I still have a CX-3 Pro, but an upgrade to CX-5 is coming for sure.

in the KVM config add KVM_SET_USER_MEMORY_REGION

this happens with GPU passthrough sometimes too.

Hi @nofdak,

I’m curious if you had any issues getting your ConnectX-4 NIC recognized by your mobo at all? I have a similar situation with an ASRock X670E Taichi and I can’t even get the mobo to recognize that there is a PCIe card plugged in. No combination of BIOS options seems to change that.

@nofdak

Have you found a solution for this? I have the same problem now: AM5 system, ConnectX-5 and kernel 6.2.

[ 2987.032837] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.032838] ioremap memtype_reserve failed -16
[ 2987.044844] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.044845] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.044846] ioremap memtype_reserve failed -16
[ 2987.056839] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.056841] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.056842] ioremap memtype_reserve failed -16
[ 2987.068844] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.068845] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.068846] ioremap memtype_reserve failed -16
[ 2987.080846] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.080847] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.080848] ioremap memtype_reserve failed -16
[ 2987.092830] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.092831] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.092832] ioremap memtype_reserve failed -16
[ 2987.104843] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.104845] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.104845] ioremap memtype_reserve failed -16
[ 2987.116844] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.116846] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.116847] ioremap memtype_reserve failed -16
[ 2987.128840] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.128841] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.128842] ioremap memtype_reserve failed -16
[ 2987.140842] x86/PAT: CPU 2/KVM:102943 conflicting memory types fcf6c00000-fcf6e00000 uncached-minus<->write-combining
[ 2987.140844] x86/PAT: memtype_reserve failed [mem 0xfcf6c00000-0xfcf6dfffff], track uncached-minus, req uncached-minus
[ 2987.140844] ioremap memtype_reserve failed -16
[ 2987.237778] br3: port 2(vnet6) entered disabled state
[ 2987.238018] device vnet6 left promiscuous mode
[ 2987.238021] br3: port 2(vnet6) entered disabled state
[ 2987.745756] mlx5_core 0000:05:00.4: enabling device (0000 -> 0002)
[ 2987.745831] mlx5_core 0000:05:00.4: firmware version: 16.35.2000
[ 2987.888793] mlx5_core 0000:05:00.4: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
[ 2987.902296] mlx5_core 0000:05:00.4: Assigned random MAC address ea:14:39:f2:cc:75
[ 2988.058772] mlx5_core 0000:05:00.4: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0 basic)
[ 2988.060614] mlx5_core 0000:05:00.4 enp5s0f4v2: renamed from eth0
[ 2988.226715] mlx5_core 0000:05:00.4 enp5s0f4v2: Link up

Is this with the OFED driver or the open source kernel driver? Does the other variant work?

MLNX_OFED_LINUX-23.07-0.5.1.2-ubuntu23.04-x86_64
CX-5 firmware version: 16.35.2000
AM5 AGESA1.0.0.7c

I get error code 10 for the VF in Windows with MLNX_OFED and mlx5_core