Asus Pro WS WRX80E-Sage, dmesg is full of corrected PCIe and/or AER errors

So I got an Asus Pro WS WRX80E-Sage board that hosts a bunch of PCIe 4.0 NVMe drives. As I kept adding drives, the dmesg log kept filling up with errors like the ones below.

Kernel 5.12.14-arch1-1

They go through all the drives at random, and don’t appear to depend on load, system temperature, whether the drive is in the expansion card or directly on the board, or anything else I noticed.

The message says it was corrected by hardware and no action is required, and Google says these are safe to ignore, but the problem is that the log is so full of them that nothing else is visible.

So, is that a sign of some problem, and if not is there a way to turn these off?

I’ve grepped the kernel code for these lines, but I can’t make enough sense of it to find out if there is a way to disable them, or to feel safe enough to comment them out. They do use a different set of words for uncorrected errors, which are missing in my logs, thankfully.

[94781.661358] {18945}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
[94781.661363] {18945}[Hardware Error]: It has been corrected by h/w and requires no further action
[94781.661364] {18945}[Hardware Error]: event severity: corrected
[94781.661366] {18945}[Hardware Error]:  Error 0, type: corrected
[94781.661368] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661369] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661370] {18945}[Hardware Error]:   version: 0.2
[94781.661371] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661373] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661375] {18945}[Hardware Error]:   slot: 0
[94781.661376] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661376] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661378] {18945}[Hardware Error]:   class_code: 010802
[94781.661379] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661380] {18945}[Hardware Error]:  Error 1, type: corrected
[94781.661381] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661382] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661383] {18945}[Hardware Error]:   version: 0.2
[94781.661384] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661385] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661387] {18945}[Hardware Error]:   slot: 0
[94781.661387] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661388] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661389] {18945}[Hardware Error]:   class_code: 010802
[94781.661390] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661391] {18945}[Hardware Error]:  Error 2, type: corrected
[94781.661392] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661393] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661394] {18945}[Hardware Error]:   version: 0.2
[94781.661394] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661395] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661397] {18945}[Hardware Error]:   slot: 0
[94781.661397] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661398] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661399] {18945}[Hardware Error]:   class_code: 010802
[94781.661400] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661401] {18945}[Hardware Error]:  Error 3, type: corrected
[94781.661402] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661402] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661403] {18945}[Hardware Error]:   version: 0.2
[94781.661404] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661405] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661406] {18945}[Hardware Error]:   slot: 0
[94781.661407] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661408] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661409] {18945}[Hardware Error]:   class_code: 010802
[94781.661409] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661410] {18945}[Hardware Error]:  Error 4, type: corrected
[94781.661411] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661412] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661413] {18945}[Hardware Error]:   version: 0.2
[94781.661414] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661415] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661416] {18945}[Hardware Error]:   slot: 0
[94781.661416] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661417] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661418] {18945}[Hardware Error]:   class_code: 010802
[94781.661419] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661420] {18945}[Hardware Error]:  Error 5, type: corrected
[94781.661421] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661422] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661423] {18945}[Hardware Error]:   version: 0.2
[94781.661423] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661424] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661425] {18945}[Hardware Error]:   slot: 0
[94781.661426] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661427] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661428] {18945}[Hardware Error]:   class_code: 010802
[94781.661429] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661430] {18945}[Hardware Error]:  Error 6, type: corrected
[94781.661431] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661431] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661432] {18945}[Hardware Error]:   version: 0.2
[94781.661433] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661434] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661435] {18945}[Hardware Error]:   slot: 0
[94781.661436] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661437] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661437] {18945}[Hardware Error]:   class_code: 010802
[94781.661438] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661439] {18945}[Hardware Error]:  Error 7, type: corrected
[94781.661440] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661441] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661442] {18945}[Hardware Error]:   version: 0.2
[94781.661443] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661444] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661445] {18945}[Hardware Error]:   slot: 0
[94781.661445] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661446] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661447] {18945}[Hardware Error]:   class_code: 010802
[94781.661448] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661449] {18945}[Hardware Error]:  Error 8, type: corrected
[94781.661450] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661450] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661451] {18945}[Hardware Error]:   version: 0.2
[94781.661452] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661453] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661454] {18945}[Hardware Error]:   slot: 0
[94781.661455] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661456] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661457] {18945}[Hardware Error]:   class_code: 010802
[94781.661458] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661459] {18945}[Hardware Error]:  Error 9, type: corrected
[94781.661459] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661460] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661461] {18945}[Hardware Error]:   version: 0.2
[94781.661462] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661463] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661464] {18945}[Hardware Error]:   slot: 0
[94781.661465] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661465] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661466] {18945}[Hardware Error]:   class_code: 010802
[94781.661467] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661468] {18945}[Hardware Error]:  Error 10, type: corrected
[94781.661469] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661470] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661471] {18945}[Hardware Error]:   version: 0.2
[94781.661472] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661472] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661474] {18945}[Hardware Error]:   slot: 0
[94781.661474] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661475] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661476] {18945}[Hardware Error]:   class_code: 010802
[94781.661477] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661478] {18945}[Hardware Error]:  Error 11, type: corrected
[94781.661479] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661480] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661480] {18945}[Hardware Error]:   version: 0.2
[94781.661481] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661482] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661483] {18945}[Hardware Error]:   slot: 0
[94781.661484] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661485] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661486] {18945}[Hardware Error]:   class_code: 010802
[94781.661487] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661487] {18945}[Hardware Error]:  Error 12, type: corrected
[94781.661488] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661489] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661490] {18945}[Hardware Error]:   version: 0.2
[94781.661491] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661492] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661493] {18945}[Hardware Error]:   slot: 0
[94781.661493] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661494] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661495] {18945}[Hardware Error]:   class_code: 010802
[94781.661496] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661497] {18945}[Hardware Error]:  Error 13, type: corrected
[94781.661498] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661499] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661499] {18945}[Hardware Error]:   version: 0.2
[94781.661500] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661501] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661502] {18945}[Hardware Error]:   slot: 0
[94781.661503] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661504] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661505] {18945}[Hardware Error]:   class_code: 010802
[94781.661506] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661534] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661536] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661539] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661545] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661546] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661548] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661553] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661554] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661556] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661561] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661562] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661563] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661568] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661570] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661571] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661576] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661577] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661579] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661584] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661585] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661586] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661591] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661592] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661594] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661599] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661600] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661601] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661606] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661608] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661609] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661614] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661615] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661617] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661621] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661623] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661624] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661629] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661630] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661632] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661637] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661638] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661639] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
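
With this much output it can be hard to see which drives are affected and how often. A minimal sketch of one way to summarize it: the Python below scans dmesg text for the `nvme ...: AER: aer_status:` lines shown above and counts events per device and per error bit. The function names `decode_status` and `tally_aer` are made up for illustration, and the bit table assumes the status value belongs to a corrected error (as every entry in this log does), using the short labels the kernel prints, such as `RxErr`.

```python
import re
from collections import Counter

# Bits of the PCIe AER Correctable Error Status register, labelled with
# the short names the Linux kernel prints (e.g. "RxErr" for bit 0).
# Assumption: the status value comes from a *corrected* error report.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

# Matches lines such as:
# nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
AER_LINE = re.compile(
    r"nvme (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: "
    r"aer_status: 0x(?P<status>[0-9a-f]{8})"
)


def decode_status(status):
    """Return the names of all correctable-error bits set in status."""
    return [name for bit, name in CORRECTABLE_BITS.items() if status & (1 << bit)]


def tally_aer(log_text):
    """Count (device, error name) pairs found in dmesg text."""
    counts = Counter()
    for match in AER_LINE.finditer(log_text):
        status = int(match.group("status"), 16)
        for name in decode_status(status):
            counts[(match.group("dev"), name)] += 1
    return counts
```

To use it, save the output of `dmesg` to a file, read the file into a string, and pass that string to `tally_aer`. The resulting counter shows at a glance whether the errors are spread evenly across drives or concentrated on one link.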

Are you adding NVMe through a carrier board? If yes, which one? Do you still get errors if you drop down to PCIe 3.0?

There was a discussion at some point about how not all boards designed for PCIe 3 are good enough at preserving signal integrity at the clocks required for PCIe 4.

Yes, using the included ASUS HYPER M.2 card which is supposed to be 4.0 rated.

But also the problem shows up even with only the boot SSD present, in the motherboard slot.

Setting the drives to PCIe 3.0 did get rid of the errors…
Which is suboptimal as ways of getting rid of annoying warnings go.
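
To confirm what speed a drive actually negotiated after changing the BIOS setting, Linux exposes the link state in sysfs. The sketch below reads the `current_link_speed`, `max_link_speed`, `current_link_width` and `max_link_width` attributes for one device; PCIe 3.0 runs at 8 GT/s per lane and PCIe 4.0 at 16 GT/s. The helper name `link_info` is hypothetical, and the address `0000:43:00.0` is simply one of the drives from the log above.

```python
from pathlib import Path


def link_info(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read PCIe link speed/width attributes for a device address
    such as "0000:43:00.0". Missing attributes are returned as None."""
    dev = Path(sysfs_root) / bdf
    info = {}
    for attr in (
        "current_link_speed",
        "max_link_speed",
        "current_link_width",
        "max_link_width",
    ):
        path = dev / attr
        info[attr] = path.read_text().strip() if path.exists() else None
    return info
```

Calling `link_info("0000:43:00.0")` on the affected machine and comparing `current_link_speed` with `max_link_speed` shows whether the drive is running below the speed it is capable of.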

Short of getting another NVMe riser / carrier board, perhaps putting the card into another slot, re-seating the NVMe drives, or re-seating the CPU might help.

I’m getting these too. It says “slot 0” in the error message (in mine too)… shouldn’t slot 0 be the top slot, usually used for the GPU? I’ve got two Hyper M.2s now and neither is in the slot closest to the CPU.

Thus far, the RAIDs (one per card) appear OK…

Any update on how things have gone since first post? I’m thinking of just letting this one slide I dunno…

@artlav dumb question: did you configure your drives as a RAID array in BIOS? I did that and at least I see no dmesg errors for that array. The errors I see are for another array I’m testing where I didn’t bother (yet) to make it an array in BIOS.

(Not that making it a RAID array in BIOS appears to do anything? At least for me, after I made it an array in BIOS I still just used mdadm to set up software RAID0).

No, they were set as individual disks.

@artlav I’ve got the same issue. You can turn off AER for the NVMe disks on the card. This works for me; follow the link:

[Look here](https://gist.github.com/zekome/35db528b33206e68f18439ad7fabfcd5)