ASRock ROMED8-NL - PCIe error

We need some help. About three months ago we started building our servers.
Our setup:

  • ASRock ROMED8-NL motherboard (7 PCIe slots)
  • 512 GB ECC RAM
  • AMD EPYC 7502 32-core processor
  • 7x RTX 3090
  • 1 TB M.2 drive

We are currently running two versions of Ubuntu:

  • Ubuntu 18.04
  • Ubuntu 20.04

All 7 GPUs are connected through PCIe 3.0 extender (riser) cables. Previously we also used a PCIe 4.0 extender in the shared M.2 PCIe port, and later suspected it was the cause of the error.
About three months and many experiments later, we have tried:

  • Using 6 GPUs and leaving the shared M.2 port as is
  • Changing the BIOS setting for that port from x16 to x8x8
  • Adding pcie_aspm=off pci=nommconf to the kernel command line in GRUB
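A GRUB change only takes effect after regenerating the config (on Ubuntu, editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then running sudo update-grub) and rebooting, so it is worth confirming the parameters actually reached the kernel. A minimal sketch, assuming the canonical spellings pcie_aspm=off and pci=nommconf; adjust WANTED to whatever you configured:

```python
# Sketch: check that kernel parameters set in GRUB are active on the running kernel.
# Assumption: the canonical spellings below; edit WANTED to match your own config.
from pathlib import Path
from typing import Iterable, List

WANTED = ("pcie_aspm=off", "pci=nommconf")


def missing_params(cmdline: str, wanted: Iterable[str] = WANTED) -> List[str]:
    """Return the entries of `wanted` that are not tokens of `cmdline`."""
    tokens = set(cmdline.split())  # kernel parameters are whitespace-separated
    return [p for p in wanted if p not in tokens]


def main() -> None:
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError as exc:
        print("cannot read /proc/cmdline:", exc)
        return
    missing = missing_params(cmdline)
    if missing:
        print("not active (ran update-grub and rebooted?):", " ".join(missing))
    else:
        print("all requested parameters are active")


if __name__ == "__main__":
    main()
```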

Despite all this, we are still struggling with the same issue.
It happens at random, and we can't figure out what is causing it.

Is there an expert in the field who is willing to take a look and help us out?


I am not sure I can help directly with that motherboard.

When I had that issue, it was directly related to the hardware: link training wasn't completing correctly or was dropping out during use, and the GPU would drop off the bus.
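One way to look for marginal link training is to compare each GPU's negotiated link with what it supports; Linux exposes this in sysfs (the same data lspci -vv shows as LnkSta/LnkCap). A sketch, assuming NVIDIA's vendor ID 0x10de and display-class devices only. Width is the more telling field: an idle GPU may legitimately drop its link speed to save power, and a slot deliberately bifurcated to x8x8 will of course show x8.

```python
# Sketch: list NVIDIA GPUs with their negotiated PCIe link speed/width from sysfs.
# A link that trained narrower than the device supports (e.g. x8 instead of x16)
# can hint at a marginal connection such as a riser cable.
from pathlib import Path
from typing import Dict, List

NVIDIA_VENDOR = "0x10de"
PCI_DEVICES = Path("/sys/bus/pci/devices")


def read_attr(dev: Path, name: str) -> str:
    """Read one sysfs attribute, returning '?' if it is missing or unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return "?"


def gpu_links(root: Path = PCI_DEVICES) -> List[Dict[str, str]]:
    rows = []  # type: List[Dict[str, str]]
    if not root.is_dir():
        return rows
    for dev in sorted(root.iterdir()):
        # Class 0x03xxxx is a display controller; this skips the GPU's audio function.
        if read_attr(dev, "vendor") != NVIDIA_VENDOR or not read_attr(dev, "class").startswith("0x03"):
            continue
        rows.append({
            "address": dev.name,
            "speed": read_attr(dev, "current_link_speed"),
            "max_speed": read_attr(dev, "max_link_speed"),
            "width": read_attr(dev, "current_link_width"),
            "max_width": read_attr(dev, "max_link_width"),
        })
    return rows


def width_downtrained(row: Dict[str, str]) -> bool:
    """True if the negotiated lane count is below what the device supports."""
    if not (row["width"].isdigit() and row["max_width"].isdigit()):
        return False
    return int(row["width"]) < int(row["max_width"])


if __name__ == "__main__":
    for r in gpu_links():
        flag = "  <-- narrower than max" if width_downtrained(r) else ""
        print(f"{r['address']}: {r['speed']} x{r['width']} "
              f"(max {r['max_speed']} x{r['max_width']}){flag}")
```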

Could you expand on this a little? Have you drilled down on a specific GPU and motherboard port?
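To drill down to a physical port, the PCI address in an error message has to be matched to a slot. Linux lists slots it knows about under /sys/bus/pci/slots/<name>/address; whether this board's firmware populates that directory is an assumption, and if it is empty the board manual's slot-to-bus layout is the fallback. A minimal sketch:

```python
# Sketch: translate a PCI address (e.g. from an error message or lspci) to a
# physical slot name via /sys/bus/pci/slots. Assumes the firmware exposes slots;
# if the directory is missing or empty, every lookup returns "unknown".
from pathlib import Path
from typing import Dict

SLOTS = Path("/sys/bus/pci/slots")


def slot_map(root: Path = SLOTS) -> Dict[str, str]:
    """Map lower-case 'domain:bus:device' addresses to the kernel's slot names."""
    mapping = {}  # type: Dict[str, str]
    if not root.is_dir():
        return mapping
    for slot in sorted(root.iterdir()):
        addr = slot / "address"
        if addr.is_file():
            mapping[addr.read_text().strip().lower()] = slot.name
    return mapping


def slot_for(pci_addr: str, mapping: Dict[str, str]) -> str:
    """Look up a slot, ignoring the '.function' suffix that slot addresses omit."""
    return mapping.get(pci_addr.lower().split(".")[0], "unknown")


if __name__ == "__main__":
    for addr, name in sorted(slot_map().items()):
        print(addr, "-> slot", name)
```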


Thanks for the reply!

That's exactly what's happening: “GPU has fallen off the bus”.
The error isn't tied to a specific port; it happens completely at random.
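Even if the failures feel random, tallying them per PCI address over several crashes can show whether they cluster on certain risers or slots. A sketch, assuming the NVIDIA driver logs these events as Xid 79 in the form "NVRM: Xid (PCI:0000:41:00): 79, ..."; compare the pattern with your own dmesg before relying on it. The file name xid79.py is hypothetical. A saved log can come from /var/log/kern.log or from journalctl -k -b -1 > kern.log (the previous boot, if persistent journaling is enabled).

```python
# Sketch: count "GPU has fallen off the bus" events (assumed to be NVIDIA Xid 79)
# per PCI address in a saved kernel log. The matched line format is an assumption.
import re
import sys
from collections import Counter
from typing import Iterable

XID79 = re.compile(
    r"NVRM: Xid \(PCI:([0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2})\): 79\b")


def count_xid79(lines: Iterable[str]) -> Counter:
    """Count Xid 79 lines per lower-case 'domain:bus:device' address."""
    hits = Counter()
    for line in lines:
        m = XID79.search(line)
        if m:
            hits[m.group(1).lower()] += 1
    return hits


def main(argv) -> None:
    if len(argv) < 2:
        print("usage: python3 xid79.py KERNEL_LOG_FILE")
        return
    with open(argv[1], errors="replace") as f:
        for addr, n in count_xid79(f).most_common():
            print(addr, n)


if __name__ == "__main__":
    main(sys.argv)
```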

So far we have one test server with the same specs. The only difference is that it uses no PCIe 3.0 extension cables: its three 3090s are plugged directly into the slots on the board.
This one is stable.

My advice is to switch the server with the riser issues to PCIe Gen 3 in the BIOS. For me, that stabilized the link over the longer transmission distance that risers add. PCIe 4.0 is very sensitive to this kind of signal loss.
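After changing the BIOS, the negotiated speed can be decoded to confirm that no GPU still trains above Gen 3 (8 GT/s). In sysfs, max_link_speed reports what the GPU itself supports, so a BIOS cap only shows up in current_link_speed, and that should be read while the GPUs are busy because idle GPUs may lower the link speed to save power. A sketch, assuming NVIDIA's vendor ID 0x10de; it accepts both the newer "8.0 GT/s PCIe" and older "8 GT/s" string forms:

```python
# Sketch: check that no NVIDIA GPU negotiated a PCIe link above Gen 3.
# Read under load: idle GPUs may drop to a lower speed, hiding a Gen 4 link.
import re
from pathlib import Path
from typing import Dict, List, Optional

GT_PER_S_TO_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def pcie_gen(speed: str) -> Optional[int]:
    """Map a sysfs speed string such as '8.0 GT/s PCIe' to a PCIe generation."""
    m = re.match(r"\s*(\d+(?:\.\d+)?)\s*GT/s", speed)
    if not m:
        return None  # e.g. "Unknown speed"
    return GT_PER_S_TO_GEN.get(float(m.group(1)))


def above_gen(speeds: Dict[str, str], limit: int = 3) -> List[str]:
    """Return addresses whose negotiated generation exceeds `limit`."""
    return sorted(a for a, s in speeds.items() if (pcie_gen(s) or 0) > limit)


def nvidia_speeds(root: Path = Path("/sys/bus/pci/devices")) -> Dict[str, str]:
    """Collect current_link_speed for NVIDIA display-class devices."""
    speeds = {}  # type: Dict[str, str]
    if not root.is_dir():
        return speeds
    for dev in sorted(root.iterdir()):
        try:
            if ((dev / "vendor").read_text().strip() == "0x10de"
                    and (dev / "class").read_text().startswith("0x03")):
                speeds[dev.name] = (dev / "current_link_speed").read_text().strip()
        except OSError:
            continue
    return speeds


if __name__ == "__main__":
    over = above_gen(nvidia_speeds())
    print("links above Gen 3:", ", ".join(over) if over else "none")
```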

