Asus Pro WS WRX80E-Sage, dmesg is full of corrected PCIe and/or AER errors

So I got an Asus Pro WS WRX80E-Sage board that hosts a bunch of PCIe 4.0 NVMe drives. As I kept adding drives, the dmesg log kept filling up with errors like the ones below.

Kernel 5.12.14-arch1-1

They hit all the drives at random, and don’t appear to depend on load, system temperature, whether the drive is in the expansion card or directly on the board, or anything else I noticed.

The message says the error was corrected by hardware and no action is required, and Google says these are safe to ignore, but the log is so full of them that nothing else is visible.

So, is that a sign of some problem, and if not is there a way to turn these off?

I’ve grepped the kernel code for these lines, but I can’t make enough sense of it to find out whether there is a way to disable them, or to feel safe enough commenting them out. Uncorrected errors use a different set of words, which are thankfully missing from my logs.

[94781.661358] {18945}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
[94781.661363] {18945}[Hardware Error]: It has been corrected by h/w and requires no further action
[94781.661364] {18945}[Hardware Error]: event severity: corrected
[94781.661366] {18945}[Hardware Error]:  Error 0, type: corrected
[94781.661368] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661369] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661370] {18945}[Hardware Error]:   version: 0.2
[94781.661371] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661373] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661375] {18945}[Hardware Error]:   slot: 0
[94781.661376] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661376] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661378] {18945}[Hardware Error]:   class_code: 010802
[94781.661379] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661380] {18945}[Hardware Error]:  Error 1, type: corrected
[94781.661381] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661382] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661383] {18945}[Hardware Error]:   version: 0.2
[94781.661384] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661385] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661387] {18945}[Hardware Error]:   slot: 0
[94781.661387] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661388] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661389] {18945}[Hardware Error]:   class_code: 010802
[94781.661390] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661391] {18945}[Hardware Error]:  Error 2, type: corrected
[94781.661392] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661393] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661394] {18945}[Hardware Error]:   version: 0.2
[94781.661394] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661395] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661397] {18945}[Hardware Error]:   slot: 0
[94781.661397] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661398] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661399] {18945}[Hardware Error]:   class_code: 010802
[94781.661400] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661401] {18945}[Hardware Error]:  Error 3, type: corrected
[94781.661402] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661402] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661403] {18945}[Hardware Error]:   version: 0.2
[94781.661404] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661405] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661406] {18945}[Hardware Error]:   slot: 0
[94781.661407] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661408] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661409] {18945}[Hardware Error]:   class_code: 010802
[94781.661409] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661410] {18945}[Hardware Error]:  Error 4, type: corrected
[94781.661411] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661412] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661413] {18945}[Hardware Error]:   version: 0.2
[94781.661414] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661415] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661416] {18945}[Hardware Error]:   slot: 0
[94781.661416] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661417] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661418] {18945}[Hardware Error]:   class_code: 010802
[94781.661419] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661420] {18945}[Hardware Error]:  Error 5, type: corrected
[94781.661421] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661422] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661423] {18945}[Hardware Error]:   version: 0.2
[94781.661423] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661424] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661425] {18945}[Hardware Error]:   slot: 0
[94781.661426] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661427] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661428] {18945}[Hardware Error]:   class_code: 010802
[94781.661429] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661430] {18945}[Hardware Error]:  Error 6, type: corrected
[94781.661431] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661431] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661432] {18945}[Hardware Error]:   version: 0.2
[94781.661433] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661434] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661435] {18945}[Hardware Error]:   slot: 0
[94781.661436] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661437] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661437] {18945}[Hardware Error]:   class_code: 010802
[94781.661438] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661439] {18945}[Hardware Error]:  Error 7, type: corrected
[94781.661440] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661441] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661442] {18945}[Hardware Error]:   version: 0.2
[94781.661443] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661444] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661445] {18945}[Hardware Error]:   slot: 0
[94781.661445] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661446] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661447] {18945}[Hardware Error]:   class_code: 010802
[94781.661448] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661449] {18945}[Hardware Error]:  Error 8, type: corrected
[94781.661450] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661450] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661451] {18945}[Hardware Error]:   version: 0.2
[94781.661452] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661453] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661454] {18945}[Hardware Error]:   slot: 0
[94781.661455] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661456] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661457] {18945}[Hardware Error]:   class_code: 010802
[94781.661458] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661459] {18945}[Hardware Error]:  Error 9, type: corrected
[94781.661459] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661460] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661461] {18945}[Hardware Error]:   version: 0.2
[94781.661462] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661463] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661464] {18945}[Hardware Error]:   slot: 0
[94781.661465] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661465] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661466] {18945}[Hardware Error]:   class_code: 010802
[94781.661467] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661468] {18945}[Hardware Error]:  Error 10, type: corrected
[94781.661469] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661470] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661471] {18945}[Hardware Error]:   version: 0.2
[94781.661472] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661472] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661474] {18945}[Hardware Error]:   slot: 0
[94781.661474] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661475] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661476] {18945}[Hardware Error]:   class_code: 010802
[94781.661477] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661478] {18945}[Hardware Error]:  Error 11, type: corrected
[94781.661479] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661480] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661480] {18945}[Hardware Error]:   version: 0.2
[94781.661481] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661482] {18945}[Hardware Error]:   device_id: 0000:43:00.0
[94781.661483] {18945}[Hardware Error]:   slot: 0
[94781.661484] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661485] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661486] {18945}[Hardware Error]:   class_code: 010802
[94781.661487] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661487] {18945}[Hardware Error]:  Error 12, type: corrected
[94781.661488] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661489] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661490] {18945}[Hardware Error]:   version: 0.2
[94781.661491] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661492] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661493] {18945}[Hardware Error]:   slot: 0
[94781.661493] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661494] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661495] {18945}[Hardware Error]:   class_code: 010802
[94781.661496] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661497] {18945}[Hardware Error]:  Error 13, type: corrected
[94781.661498] {18945}[Hardware Error]:   section_type: PCIe error
[94781.661499] {18945}[Hardware Error]:   port_type: 0, PCIe end point
[94781.661499] {18945}[Hardware Error]:   version: 0.2
[94781.661500] {18945}[Hardware Error]:   command: 0x0406, status: 0x0010
[94781.661501] {18945}[Hardware Error]:   device_id: 0000:44:00.0
[94781.661502] {18945}[Hardware Error]:   slot: 0
[94781.661503] {18945}[Hardware Error]:   secondary_bus: 0x00
[94781.661504] {18945}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80a
[94781.661505] {18945}[Hardware Error]:   class_code: 010802
[94781.661506] {18945}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[94781.661534] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661536] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661539] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661545] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661546] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661548] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661553] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661554] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661556] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661561] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661562] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661563] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661568] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661570] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661571] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661576] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661577] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661579] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661584] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661585] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661586] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661591] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661592] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661594] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661599] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661600] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661601] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661606] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661608] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661609] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661614] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661615] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661617] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661621] nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661623] nvme 0000:43:00.0:    [ 0] RxErr                  (First)
[94781.661624] nvme 0000:43:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661629] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661630] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661632] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
[94781.661637] nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
[94781.661638] nvme 0000:44:00.0:    [ 0] RxErr                  (First)
[94781.661639] nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
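
With this much repetition it can help to collapse the log into a per-device count. Below is a minimal Python sketch that does that for the `AER: aer_status:` lines above. The function names (`decode_status`, `summarize`) are made up for illustration, the bit table follows the Correctable Error Status register layout of the PCIe AER capability with the short names the kernel prints, and the code assumes every matched line describes a corrected error, as is the case in this log.

```python
import re
from collections import Counter

# Correctable Error Status bits of the PCIe AER capability, using the
# short names seen in the kernel log (e.g. "RxErr" for bit 0).
# Assumption: matched lines describe *corrected* errors only.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

# Matches e.g. "nvme 0000:43:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000"
AER_LINE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: "
    r"aer_status: 0x(?P<status>[0-9a-f]{8}), aer_mask: 0x(?P<mask>[0-9a-f]{8})"
)


def decode_status(status):
    """Return the names of all known bits set in a correctable status word."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if status & (1 << bit)]


def summarize(lines):
    """Count (device, error name) pairs over an iterable of dmesg lines."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match is None:
            continue  # not an aer_status line
        for name in decode_status(int(match.group("status"), 16)):
            counts[(match.group("bdf"), name)] += 1
    return counts
```

Saving the dmesg output to a file and calling `summarize(open("dmesg.txt"))` gives the totals; for the excerpt above that should come out as seven `RxErr` entries each for 0000:43:00.0 and 0000:44:00.0.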

Are you adding NVMe drives through a carrier board? If yes, which one? Do you still get errors if you drop down to PCIe 3.0?

There was a discussion at some point on how not all boards designed for PCIe 3 are good enough at preserving signal integrity at clocks required for PCIe 4.

Yes, using the included ASUS HYPER M.2 card which is supposed to be 4.0 rated.

But also the problem shows up even with only the boot SSD present, in the motherboard slot.

Setting the drives to PCIe 3.0 did get rid of the errors…
Which is suboptimal as ways of getting rid of annoying warnings go.


Perhaps trying to put the card into another slot, re-socketing the nvme drives, or re-socketing the CPU might help.

Short of getting another nvme riser / carrier board.

I’m getting these too. It says “slot 0” in the error message (in mine too)… shouldn’t slot 0 be the top slot, usually used for the GPU? I’ve got two Hyper M.2s now and neither is in the slot closest to the CPU.

Thus far, the RAIDs (one per card) appear OK…

Any update on how things have gone since first post? I’m thinking of just letting this one slide I dunno…

@artlav dumb question: did you configure your drives as a RAID array in BIOS? I did that and at least I see no dmesg errors for that array. The errors I see are for another array I’m testing where I didn’t bother (yet) to make it an array in BIOS.

(Not that making it a RAID array in BIOS appears to do anything? At least for me, after I made it an array in BIOS I still just used mdadm to set up software RAID0).

No, they were set as individual disks.

@artlav I’ve got the same issue. You can turn off AER for NVMe disks on the card. This works for me, follow the link:

[Look here](https://gist.github.com/zekome/35db528b33206e68f18439ad7fabfcd5)
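
I haven’t reproduced the gist’s contents here, so treat the following as a separate, hedged sketch of one way to silence a specific corrected error: setting the matching bit in the device’s AER Correctable Error Mask register (offset 0x14 within the AER extended capability) with `setpci` from pciutils. The Python below only builds the command line; `mask_rxerr_command` is a made-up helper name, the two device addresses are the ones from the log in this thread, and whether the mask sticks (or is even honored when firmware handles the errors first) is an assumption to verify on your own system.

```python
import shlex

# Offset of the Correctable Error Mask register inside the AER capability.
AER_CORRECTABLE_MASK_OFFSET = 0x14
# Bit 0 of that register is Receiver Error ("RxErr" in the log).
RECEIVER_ERROR_BIT = 1 << 0


def mask_rxerr_command(bdf):
    """Build a setpci argv that sets only the Receiver Error mask bit.

    Uses setpci's "bits:mask" value form so the other mask bits are
    left unchanged. Hypothetical helper, not taken from the gist.
    """
    register = f"ECAP_AER+{AER_CORRECTABLE_MASK_OFFSET:#x}.L"
    value = f"{RECEIVER_ERROR_BIT:08x}:{RECEIVER_ERROR_BIT:08x}"
    return ["setpci", "-s", bdf, f"{register}={value}"]


# Print the commands for review instead of running them.
for bdf in ("0000:43:00.0", "0000:44:00.0"):
    print(shlex.join(mask_rxerr_command(bdf)))
```

The printed commands need root to run, and a mask set this way does not survive a reboot, which is presumably why people wrap it in a service file. It also only hides the messages; the link errors themselves still happen.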

Did you do it with the kernel parameter method or did you create the .service file?

I had the same issue with Samsung 980 Pro drives on this board, on the Hyper M.2 card. I ended up replacing the drives with 990 Pro drives and the problem went away.

I’m currently talking with Asus support about this though! Perhaps they’ll fix it.

I wrote up my experiences with the board here on my blog: Build log: Threadripper Pro 5975WX Linux workstation on the Asus Pro WS WRX80E-SAGE SE WIFI – Interesting things

Did you get a reply or a fix from Asus?

I just built a system using the same board, upgrading from a Threadripper 3975X.
All the hardware works, and my regular Threadripper board (also an Asus, the Zenith II Extreme Alpha) didn’t show these errors.

I found this hack that seems to work around the issue.
I’m not allowed to post links, so here’s the URL:
https://gist.github.com/zekome/35db528b33206e68f18439ad7fabfcd5


Clickable version of the GitHub gist link: [gist](https://gist.github.com/zekome/35db528b33206e68f18439ad7fabfcd5)
