Threadripper & PCIe Bus Errors

Additional info: going back to PCIe 2.0 in the BIOS seems to solve the issue, but of course that comes with a performance loss that leaves me with an under-used 1080 Ti :(

Have you also tried limiting PCIe to 2.0 instead of 3.0 in the BIOS?

The only reason this works is that PCIe 2.0 doesn’t support ASPM; the better option is to pass the pcie_aspm=off kernel flag. I am not sure why it’s not working for you on Ubuntu though. I am running a custom 4.15 kernel on Debian and this is working fine for me.

Well, I suspected something like that (that it works only because ASPM is not an issue in PCIe 2.0). However, that doesn’t explain why I’m still having the problem with PCIe 3.0 and “pcie_aspm=off” in grub. Unless there is some difference in the way grub is parsed between Ubuntu 17.10 and 18.04… or I’m missing something else…

“pcie_aspm=off” works fine for me in Ubuntu 18.04. Running a 1950X and Zenith Extreme. Maybe a dumb question, but you did rebuild your grub config (update-grub) after making the changes, right?
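One way to rule out a grub config that never got rebuilt is to look at what the running kernel was actually booted with. Here is a minimal Python sketch of that check, assuming a Linux system where /proc/cmdline is readable; the function name `has_kernel_param` is made up for illustration.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if `name=value` appears as a whole token in a kernel command line."""
    # Parameters are whitespace-separated, so compare whole tokens
    # instead of doing a substring search.
    return f"{name}={value}" in cmdline.split()


def running_cmdline() -> str:
    """Read the command line of the currently running kernel (Linux only)."""
    return Path("/proc/cmdline").read_text()
```

Calling `has_kernel_param(running_cmdline(), "pcie_aspm", "off")` after a reboot should return True if the flag made it through. If it returns False, the change probably never reached the generated grub config.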

Posting this here too as it is pertinent to this thread.

Edit: I logged out so I could at least follow the forum and read what was being discussed. About an hour later I was banned by IP address. Well done Anandtech, so “Professional”. What a joke of a company.

I’m not at all surprised. Anandtech has always been known for a bit of shilling and scumbaggery.

I managed to get in touch with a “Forum Director” by creating another account via a VPN.

Your brand-new account seemed to have the makings of spam.
It was littered with hot links to other websites

The rule of asking questions was such that
“No tech support questions, as these require in-depth personal follow-up and diagnostics”
You actually asked three tech support questions, so the post was removed.

  1. Obviously nobody bothered to look at the post and see what the links were. One was a link to this thread, another to patchwork.kernel.org, one to the NPT bug & fix, and one to LookingGlass.
  2. I did not ask for tech support; I asked for a status update on the problems AMD is well aware of but seems to be keeping silent on.
  3. Apparently I broke the rules by creating this additional account simply so I could ask why I was banned.

The AMA rules state as said above “No tech support questions, as these require in-depth personal follow-up and diagnostics”, but right below that it states:

What: Ask me Anything - AMD

So when Anandtech hosts an AMA, on a technical/computer forum, about technical stuff, you better not ask technical stuff, you will get an insta ban with no warning.

This fixed the problem for me too on my X370 Taichi.

I’m very dismayed that this has barely had any attention from AMD or tech websites.

This has been pretty much swept under the rug and ignored as a non-issue.

However, it’s not a non-issue, as it appears to be a hardware issue of some sort or, worse, an issue with the CPUs themselves.

But who knows, because no one can seem to get an answer as to why it occurs or when we will see a fix.

Just simply frustrating.

Kinda? The ASPM disable option has been 100% stable for me since always. I suspect it’s actually an ASMedia/chipset issue…

Some errors are resolved on TR as of AGESA 1005.

Not sure when this was published, but the errata sheet for the Ryzen processors has been released.

Erratum 1080, “PCIe Link Exit to L0 in Gen1 Mode May Incorrectly Trigger NAKs” (page 54), and erratum 1083 (page 56) sound like what may be causing the issue.

Unfortunately no fix is planned.

Good thing it’s not serious… I still don’t like seeing it, especially when trying to troubleshoot something.

This is the errata PDF for those curious.

New developments are coming ahead of the Threadripper 2 launch. BIOSes are being pushed for Threadripper 2 compatibility, and they are said to include out-of-the-box fixes for these issues.

https://www.reddit.com/r/Amd/comments/7gp1z7/threadripper_kvm_gpu_passthru_testers_needed/ (Scroll down to Update 8)

Most board vendors are now pushing out official (non-BETA) BIOS updates with AGESA “ThreadRipperPI-SP3r2 1.1.0.0” including the proper fix for this issue. After updating you no longer need to use any of the temporary fixes from this thread. The BIOS updates come as part of the preparations for supporting the Threadripper 2 CPUs, which are due to be released a few weeks from now.

I gave this a try on my new build and it resulted in a lot of issues: tons of systemd-udevd timeouts. Posted about it in another thread.

I also still get the PCIe errors from the start of this thread, even on a slightly older BIOS. Still not entirely sure what the fix is.

Add pcie_aspm=off to your Linux kernel line in grub?
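For anyone unsure what that edit looks like, here is a rough Python sketch of it. It assumes a Debian/Ubuntu-style /etc/default/grub where the kernel parameters live in a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; the helper name `add_kernel_param` is hypothetical, and it only transforms text rather than writing to the file.

```python
import re


def add_kernel_param(grub_defaults: str, param: str) -> str:
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""
    # Assumption: the value is wrapped in double quotes, as in the stock file.
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def _append(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    # Only the first matching line is rewritten; everything else is untouched.
    return pattern.sub(_append, grub_defaults, count=1)
```

After writing the result back to the file (as root), the grub config still has to be regenerated with update-grub and the machine rebooted before the flag takes effect.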

I can confirm that AGESA 1.1.0.0 fixes the PCI FLR reset bug on my system.

The boot line seems to do the trick; I must have messed it up before.

Has anyone seen the Gigabyte board fail to restart with boot code “0d” on the readout? Restarting seems to be hit or miss. The newer BIOS didn’t seem to exhibit the behavior, but then I was plagued by the systemd-udevd issues.

So, it took a little while, but I’m now happy to report that here too the PCIe errors have disappeared on the Gigabyte Aorus X399 after updating the BIOS to F10 (the one preparing for second-generation Threadripper :))

Yay!

The PCI errors are not what this BIOS fixes; it corrects the bus reset problem. The PCI errors are a known erratum and are harmless.

See https://support.amd.com/TechDocs/55449_Fam_17h_M_00h-0Fh_Rev_Guide.pdf

Simply disable ASPM (pcie_aspm=off) to avoid the spurious error reports when running in Gen3.
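To see whether the spurious reports really stop after disabling ASPM, the corrected-error lines in the kernel log can be tallied per device before and after. This is a small Python sketch of that, assuming the log uses the kernel's usual "PCIe Bus Error: severity=Corrected" wording; the function name is made up and the log text is expected to come from something like the output of dmesg.

```python
import re
from collections import Counter

# Captures the device address that precedes the corrected-error message,
# e.g. "0000:00:01.1" in "pcieport 0000:00:01.1: PCIe Bus Error: ...".
_CORRECTED = re.compile(r"(\S+): PCIe Bus Error: severity=Corrected")


def count_corrected_errors(kernel_log: str) -> Counter:
    """Count corrected PCIe bus error lines per reporting device."""
    return Counter(m.group(1) for m in _CORRECTED.finditer(kernel_log))
```

An empty counter on a boot with pcie_aspm=off, compared against a non-empty one without it, would be consistent with the reports being tied to ASPM.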

Okay, I jumped back to the F10 BIOS update for Gigabyte. There were no options to disable PSP or the PSP mailbox in the BIOS, at least on the Designare motherboard.

I was able to compile a custom kernel with CONFIG_CRYPTO_DEV_SP_PSP=n set. With that config option and pcie_aspm=off, things are looking much, much better on my build.
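If you want to double-check that a custom kernel really ended up with the option disabled, the kernel config text can be inspected. Here is a minimal Python sketch, assuming the config is available as plain text (on Debian/Ubuntu usually a config file under /boot matching the kernel version); the function name `config_option_state` is hypothetical. Note that a disabled option normally shows up as a "# ... is not set" comment rather than "=n".

```python
import re


def config_option_state(config_text: str, option: str) -> str:
    """Return the option's value ('y', 'm', ...), 'n' if not set, or 'absent'."""
    name = re.escape(option)
    set_line = re.search(rf"^{name}=(.*)$", config_text, re.MULTILINE)
    if set_line:
        return set_line.group(1)
    # Kconfig writes disabled options as a comment line.
    if re.search(rf"^# {name} is not set$", config_text, re.MULTILINE):
        return "n"
    return "absent"
```

For the build described above, `config_option_state(text, "CONFIG_CRYPTO_DEV_SP_PSP")` would be expected to return "n".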