ROMED8-2T Tons of PCIe Errors with NVMe drives installed

Hi everyone,

I’m receiving tons of PCIe errors on a ROMED8-2T motherboard:

[ +0.178773] nvme 0000:81:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[ +0.072284] nvme 0000:82:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update
[ +0.132053] nvme 0000:a4:00.0: VPD access failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update

The server is running the latest Unraid (6.12.8), and these errors are reported for some of the NVMe drives, not all. I have 10 drives, some PCIe 3.0 and some PCIe 4.0. They are installed on Asus Hyper cards, plus 2x 4.0 drives onboard the ROMED8-2T.

The motherboard is running BIOS 3.70.

These are the errors listed by the BIOS (via IPMI):

|0030|Sunday, February 25th 2024, 3:18:08 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|13h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_19; Event Data3 N/A|
|0029|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|feh|critical_interrupt|13h|6fh|abh|47h|01h|Bus Degraded - Asserted; Event Data1 Bus Degraded; Event Data2 Bus number: 0x47; Event Data3 Link speed degraded|
|0028|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|feh|critical_interrupt|13h|6fh|abh|47h|01h|Bus Degraded - Asserted; Event Data1 Bus Degraded; Event Data2 Bus number: 0x47; Event Data3 Link speed degraded|
|0027|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0026|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0025|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0024|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0023|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0022|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0021|Sunday, February 25th 2024, 3:17:25 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0020|Sunday, February 25th 2024, 3:17:24 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0019|Sunday, February 25th 2024, 3:17:24 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0018|Sunday, February 25th 2024, 3:17:24 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0017|Sunday, February 25th 2024, 3:17:23 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0016|Sunday, February 25th 2024, 3:17:23 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0015|Sunday, February 25th 2024, 3:17:22 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0014|Sunday, February 25th 2024, 3:17:22 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|02h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_2; Event Data3 N/A|
|0013|Sunday, February 25th 2024, 3:17:14 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|0ch|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_12; Event Data3 N/A|
|0012|Sunday, February 25th 2024, 3:17:07 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|06h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_6; Event Data3 N/A|
|0011|Sunday, February 25th 2024, 3:17:05 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|0ch|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_12; Event Data3 N/A|
|0010|Sunday, February 25th 2024, 3:17:04 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|06h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_6; Event Data3 N/A|
|0009|Sunday, February 25th 2024, 3:17:01 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|0ch|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_12; Event Data3 N/A|
|0008|Sunday, February 25th 2024, 3:17:01 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|06h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_6; Event Data3 N/A|
|0007|Sunday, February 25th 2024, 3:17:01 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|0ch|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_12; Event Data3 N/A|
|0006|Sunday, February 25th 2024, 3:16:58 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|09h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_9; Event Data3 N/A|
|0005|Sunday, February 25th 2024, 3:16:44 pm|0021h|BIOS|00h|critical_interrupt|13h|6fh|a4h|80h|0bh|PCI PERR - Asserted; Event Data1 PCI PERR; Event Data2 PCI Bus number: 128; Event Data3 PCI Device number: 1, PCI Function number: 3|
|0004|Sunday, February 25th 2024, 3:16:44 pm|0021h|BIOS|00h|critical_interrupt|13h|6fh|a4h|80h|0ah|PCI PERR - Asserted; Event Data1 PCI PERR; Event Data2 PCI Bus number: 128; Event Data3 PCI Device number: 1, PCI Function number: 2|
|0003|Sunday, February 25th 2024, 3:16:43 pm|0021h|BIOS|00h|critical_interrupt|13h|6fh|a4h|80h|09h|PCI PERR - Asserted; Event Data1 PCI PERR; Event Data2 PCI Bus number: 128; Event Data3 PCI Device number: 1, PCI Function number: 1|
|0002|Sunday, February 25th 2024, 3:16:43 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|07h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_7; Event Data3 N/A|
|0001|Sunday, February 25th 2024, 3:16:32 pm|0001h|BIOS|00h|bios_post_progress|0fh|6fh|c2h|06h|ffh|Progress - Asserted; Event Data1 Progress; Event Data2 progress_6; Event Data3 N/A|
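To make the PERR entries easier to compare with the kernel log, the bus/device/function numbers can be converted to the address format that dmesg and lspci print (for example `0000:81:00.0`). This is only a rough sketch: it assumes the IPMI log gives the bus number in decimal (128, with no `0x` prefix, unlike the `0x47` in the Bus Degraded entries) and that everything is in PCI domain 0. The helper name `pci_address` is made up for illustration.

```python
def pci_address(bus: int, device: int, function: int, domain: int = 0) -> str:
    """Format PCI numbers as the dddd:bb:dd.f string used by lspci and dmesg."""
    # Domain is four hex digits, bus and device two each, function one digit.
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"


# PERR entries from the IPMI log: bus 128 (assumed decimal), device 1, functions 1-3.
perr_addresses = [pci_address(128, 1, function) for function in (1, 2, 3)]

for address in perr_addresses:
    print(address)
```

If the decimal assumption holds, these are addresses on bus 0x80, so `lspci -tv` would be the place to check which devices sit behind them and whether that includes the drives at 81:00.0 and 82:00.0.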

I’ve changed “PCIe Aer Reporting Mechanism” to “OS First”, which suppressed some of the errors so that the IPMI log doesn’t get flooded.

Any idea what could be wrong?
