[SOLVED] MSI X470 Carbon Gaming Pro SR-IOV woes

I have a Dell-branded I350-T4 that I want to create VFs on and pass through to my pfSense VM.

IOMMU is enabled on the Linux kernel command line, along with the other usual settings. I used this guide as a reference: [Tutorial] Enabling SR-IOV for Intel NIC (X550-T2) on Proxmox 6 : Proxmox
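To double-check what the running kernel was actually booted with, a minimal Python sketch like the one below can pull the IOMMU-related parameters out of a command-line string. The list of parameter names is an assumption; the post does not say which ones were set.

```python
# Assumed set of IOMMU-related kernel parameters; the post does not
# state which ones were actually used on this host.
IOMMU_KEYS = ("amd_iommu", "intel_iommu", "iommu")


def iommu_params(cmdline: str) -> dict[str, str]:
    """Return the IOMMU-related key=value tokens found in a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in IOMMU_KEYS:
            found[key] = value
    return found


# On the host itself, feed it the live command line:
#   iommu_params(open("/proc/cmdline").read())
```

An empty result only means none of the assumed parameter names appear on the command line; it does not by itself prove the IOMMU is off.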

TL;DR: The card freezes when I attempt to enable VFs.

Backstory:
This motherboard was sold as bricked and revived by hardware BIOS flashing. I thought it would work nicely as a replacement for my old ASRock B350M Pro4. This one has a newer chipset and 8 SATA ports instead of 4, which is nice for future NASing.

To my disappointment, neither SR-IOV nor AMD CBS was visible in the BIOS (both are present on the cheaper B350M).

Being a certified destroyer of electronics and kinda bored, I took to the internet to mod the BIOS to enable the SR-IOV menu. Forum user genius239 at win-raid.com helped me out with enabling the hidden “Advanced” menu in the BIOS, but I still couldn’t find PCIe ARI, which is under AMD CBS.
I fired up a hex editor, looked at some AMD CBS enablement mods 1usmus did for X370 boards, and repeated the step for this one. After nulling 32 bytes in what I assume is a “hidden items” list, my BIOS now has both CBS and PBS.

The problem at hand:
I enabled PCIe ARI, IOMMU, and SR-IOV, but the card just freezes when I attempt to enable VFs. dmesg prints a stack dump from the igb driver complaining about “resetting PF”.

I tried enabling 2 VFs; that left the card with only 1 VF and a broken network connection that required a restart.
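For reference, VFs are requested through the PF's `sriov_numvfs` sysfs attribute (the `sriov_numvfs_store` frame in the trace further down is that write). Here is a rough Python sketch of reading and setting the count. The helper names are made up for illustration, and `0000:26:00.0` is the PF address taken from the log.

```python
from pathlib import Path

SYSFS_PCI = "/sys/bus/pci/devices"


def read_vf_counts(pci_addr: str, sysfs_root: str = SYSFS_PCI) -> tuple[int, int]:
    """Return (sriov_numvfs, sriov_totalvfs) for the given PF."""
    dev = Path(sysfs_root) / pci_addr
    numvfs = int((dev / "sriov_numvfs").read_text())
    totalvfs = int((dev / "sriov_totalvfs").read_text())
    return numvfs, totalvfs


def set_vf_count(pci_addr: str, wanted: int, sysfs_root: str = SYSFS_PCI) -> None:
    """Request `wanted` VFs on the PF (needs root on a real system)."""
    current, total = read_vf_counts(pci_addr, sysfs_root)
    if not 0 <= wanted <= total:
        raise ValueError(f"requested {wanted} VFs, device supports 0..{total}")
    numvfs = Path(sysfs_root) / pci_addr / "sriov_numvfs"
    # The kernel refuses to go from one nonzero count straight to another,
    # so drop to 0 first if VFs are already enabled.
    if current not in (0, wanted):
        numvfs.write_text("0")
    numvfs.write_text(str(wanted))


# Example on the host (as root): set_vf_count("0000:26:00.0", 2)
```

Reading the counts back afterwards with `read_vf_counts` shows whether the driver ended up with the number that was asked for, which is how a "requested 2, got 1" state would show up.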

So: Am I missing some settings somewhere or is this just a case of it not being supported by the motherboard and I should just use the old B350M instead?

I tried again and the igb driver dumped the stack into syslog:

Apr 15 13:18:23 pve kernel: [   27.009364] ------------[ cut here ]------------
Apr 15 13:18:23 pve kernel: [   27.009655] igb: Failed to read reg 0x8!
Apr 15 13:18:23 pve kernel: [   27.009991] WARNING: CPU: 4 PID: 2243 at drivers/net/ethernet/intel/igb/igb_main.c:757 igb_rd32.cold.105+0x40/0x4c [igb]
Apr 15 13:18:23 pve kernel: [   27.010264] Modules linked in: veth bluetooth ecdh_generic ecc ebtable_filter ebtables ip_set ip6table_raw iptable_raw softdog nfnetlink_log binfmt_misc nfnetlink nf_log_ipv6 xt_hl ip6t_rt input_leds ip6t_REJECT nf_reject_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_addrtype xt_tcpudp snd_hda_codec_hdmi zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) xt_conntrack icp(PO) ipt_REJECT nf_reject_ipv4 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio edac_mce_amd zcommon(PO) nouveau znvpair(PO) kvm_amd spl(O) snd_hda_intel video vhost_net snd_intel_dspcfg ttm kvm snd_hda_codec vhost crct10dif_pclmul tap crc32_pclmul drm_kms_helper snd_hda_core ghash_clmulni_intel ib_iser snd_hwdep drm aesni_intel rdma_cm snd_pcm fb_sys_fops iw_cm crypto_simd syscopyarea snd_timer cryptd glue_helper pcspkr wmi_bmof mxm_wmi k10temp sysfillrect ib_cm snd ccp sysimgblt soundcore ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_virqfd mac_hid irqbypass vfio_iommu_type1 vfio msr
Apr 15 13:18:23 pve kernel: [   27.010285]  ip6table_filter ip6_tables nct6775 hwmon_vid nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 lm75 sunrpc iptable_filter bpfilter ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c hid_generic usbkbd usbhid hid i2c_piix4 igb xhci_pci i2c_algo_bit ahci dca xhci_hcd libahci gpio_amdpt wmi gpio_generic
Apr 15 13:18:23 pve kernel: [   27.014157] pcieport 0000:00:03.1: AER: Uncorrected (Non-Fatal) error received: 0000:26:00.0
Apr 15 13:18:23 pve kernel: [   27.014674] CPU: 4 PID: 2243 Comm: zsh Tainted: P           O      5.4.106-1-pve #1
Apr 15 13:18:23 pve kernel: [   27.015209] igb 0000:26:00.0: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Apr 15 13:18:23 pve kernel: [   27.015733] Hardware name: Micro-Star International Co., Ltd. MS-7B78/X470 GAMING PRO CARBON (MS-7B78), BIOS 2.E0 06/10/2020
Apr 15 13:18:23 pve kernel: [   27.016306] igb 0000:26:00.0: AER:   device [8086:1521] error status/mask=00100000/00000000
Apr 15 13:18:23 pve kernel: [   27.016861] RIP: 0010:igb_rd32.cold.105+0x40/0x4c [igb]
Apr 15 13:18:23 pve kernel: [   27.016863] Code: 45 08 00 00 00 00 e8 6a 47 9c d0 49 8b bd 30 ff ff ff e8 df 71 47 d0 84 c0 74 16 44 89 e6 48 c7 c7 f0 fd 2e c0 e8 68 a0 97 d0 <0f> 0b e9 48 3c fe ff e9 5b 3c fe ff 8b b3 14 18 00 00 49 8d bc 24
Apr 15 13:18:23 pve kernel: [   27.017423] igb 0000:26:00.0: AER:    [20] UnsupReq               (First)
Apr 15 13:18:23 pve kernel: [   27.017990] RSP: 0018:ffffb6ee81743d28 EFLAGS: 00010282
Apr 15 13:18:23 pve kernel: [   27.019198] igb 0000:26:00.0: AER:   TLP Header: 40001001 0000000f f7505558 f7505558
Apr 15 13:18:23 pve kernel: [   27.019797] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 0000000000000006
Apr 15 13:18:23 pve kernel: [   27.019798] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff95808e9178c0
Apr 15 13:18:23 pve kernel: [   27.022270] RBP: ffffb6ee81743d40 R08: 000000000000048f R09: 00000000ffffffff
Apr 15 13:18:23 pve kernel: [   27.022893] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000008
Apr 15 13:18:23 pve kernel: [   27.023520] R13: ffff95807dd50e08 R14: ffff95807dd50e08 R15: 0000000000000000
Apr 15 13:18:23 pve kernel: [   27.024144] FS:  00007f8bc0d47740(0000) GS:ffff95808e900000(0000) knlGS:0000000000000000
Apr 15 13:18:23 pve kernel: [   27.024764] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 15 13:18:23 pve kernel: [   27.025386] CR2: ffffb822406d8000 CR3: 00000003f0226000 CR4: 00000000003406e0
Apr 15 13:18:23 pve kernel: [   27.026000] Call Trace:
Apr 15 13:18:23 pve kernel: [   27.026612]  igb_rar_set_index+0xa4/0x120 [igb]
Apr 15 13:18:23 pve kernel: [   27.027219]  igb_set_vf_mac+0x7a/0x80 [igb]
Apr 15 13:18:23 pve kernel: [   27.027827]  igb_enable_sriov.cold.106+0x4a/0xa9 [igb]
Apr 15 13:18:23 pve kernel: [   27.028435]  igb_pci_sriov_configure+0x28/0x50 [igb]
Apr 15 13:18:23 pve kernel: [   27.029150]  sriov_numvfs_store+0xb8/0x130
Apr 15 13:18:23 pve kernel: [   27.029746]  dev_attr_store+0x17/0x30
Apr 15 13:18:23 pve kernel: [   27.030334]  sysfs_kf_write+0x3b/0x40
Apr 15 13:18:23 pve kernel: [   27.030925]  kernfs_fop_write+0xda/0x1c0
Apr 15 13:18:23 pve kernel: [   27.031508]  __vfs_write+0x1b/0x40
Apr 15 13:18:23 pve kernel: [   27.032090]  vfs_write+0xab/0x1b0
Apr 15 13:18:23 pve kernel: [   27.032676]  ksys_write+0x61/0xe0
Apr 15 13:18:23 pve kernel: [   27.033257]  __x64_sys_write+0x1a/0x20
Apr 15 13:18:23 pve kernel: [   27.033837]  do_syscall_64+0x57/0x190
Apr 15 13:18:23 pve kernel: [   27.034417]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 15 13:18:23 pve kernel: [   27.034986] RIP: 0033:0x7f8bc0e57504
Apr 15 13:18:23 pve kernel: [   27.035537] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
Apr 15 13:18:23 pve kernel: [   27.036674] RSP: 002b:00007ffebdacee98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Apr 15 13:18:23 pve kernel: [   27.037208] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f8bc0e57504
Apr 15 13:18:23 pve kernel: [   27.037732] RDX: 0000000000000002 RSI: 00005572fde9b1e0 RDI: 0000000000000001
Apr 15 13:18:23 pve kernel: [   27.038239] RBP: 00005572fde9b1e0 R08: 00007f8bc0f2a8c0 R09: 00007f8bc0d47740
Apr 15 13:18:23 pve kernel: [   27.038731] R10: 000000000000000a R11: 0000000000000246 R12: 00007f8bc0f29760
Apr 15 13:18:23 pve kernel: [   27.039186] R13: 0000000000000002 R14: 00007f8bc0f24760 R15: 0000000000000002
Apr 15 13:18:23 pve kernel: [   27.039643] ---[ end trace bb1b4a13d9389a55 ]---
Apr 15 13:18:23 pve kernel: [   27.040141] pci 0000:26:10.0: Removing from iommu group 27
Apr 15 13:18:23 pve kernel: [   27.040615] igbvf: Intel(R) Gigabit Virtual Function Network Driver - version 2.4.0-k
Apr 15 13:18:23 pve kernel: [   27.040917] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
Apr 15 13:18:23 pve kernel: [   27.041356] igbvf 0000:26:10.4: enabling device (0000 -> 0002)
Apr 15 13:18:23 pve kernel: [   27.043070] igbvf 0000:26:10.4: PF still in reset state. Is the PF interface up?
Apr 15 13:18:23 pve kernel: [   27.043778] igbvf 0000:26:10.4: Assigning random MAC address.
Apr 15 13:18:23 pve kernel: [   27.045238] igbvf 0000:26:10.4: PF still resetting
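The AER lines interleaved with the stack trace report `error status/mask=00100000/00000000` and `[20] UnsupReq`. As a small sketch of how that status word maps to error names, the table below lists the bit positions of the PCIe AER Uncorrectable Error Status register with the short names the Linux AER driver prints. The names other than `UnsupReq` do not appear in this log and are filled in from memory of the kernel source, so treat them as an assumption.

```python
# Bit position -> short name, as printed by the Linux AER driver.
# Only bit 20 (UnsupReq) is confirmed by the log above.
AER_UNCOR_BITS = {
    4: "DLP",         # Data Link Protocol error
    5: "SDES",        # Surprise Down error
    12: "TLP",        # Poisoned TLP
    13: "FCP",        # Flow Control Protocol error
    14: "CmpltTO",    # Completion Timeout
    15: "CmpltAbrt",  # Completer Abort
    16: "UnxCmplt",   # Unexpected Completion
    17: "RxOF",       # Receiver Overflow
    18: "MalfTLP",    # Malformed TLP
    19: "ECRC",       # ECRC error
    20: "UnsupReq",   # Unsupported Request
    21: "ACSViol",    # ACS Violation
}


def decode_aer_uncor(status_hex: str) -> list[str]:
    """Return the names of all bits set in an AER uncorrectable status word."""
    status = int(status_hex, 16)
    return [name for bit, name in sorted(AER_UNCOR_BITS.items())
            if status & (1 << bit)]


print(decode_aer_uncor("00100000"))  # the status value from the log
```

For the value in this log, only bit 20 is set, which matches the `UnsupReq` line: a PCIe request was rejected as unsupported at the moment igb failed to read its register.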

It ended up working after I set the first PCIe slot to x4/x4/x4/x4 bifurcation. I'm not sure why.
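One thing that can be compared before and after a slot setting change like this is the link width the card actually negotiated, which the kernel exposes in sysfs. The sketch below is only a way to observe that; it does not explain why the change helped. The function name is hypothetical and `0000:26:00.0` is the PF address from the log.

```python
from pathlib import Path


def link_widths(pci_addr: str,
                sysfs_root: str = "/sys/bus/pci/devices") -> tuple[int, int]:
    """Return (current_link_width, max_link_width) in lanes for a PCI device."""
    dev = Path(sysfs_root) / pci_addr
    current = int((dev / "current_link_width").read_text())
    maximum = int((dev / "max_link_width").read_text())
    return current, maximum


# Example on the host: link_widths("0000:26:00.0")
```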
