YATPRO : Yet Another Threadripper Pro Build!

Got yet another new failure mode : disabled the BMC but kept the VGA since this board hates the 970, and now my display just has an unblinking cursor on it. Numlock / Capslock show that the processor clearly isn’t stuck or anything. The 7-segment display on the board is stuck on “02” which tells me sweet f***-all (“AP initialization before microcode loading”)

Starting to think this board is defective.

Or try to reseat the CPU

Indeed, I got familiar with that problem while using an EPYC motherboard in a workstation. I’ve tried disabling it, but that doesn’t help.

Returning this board is going to be a pain. Due to the chip shortage I had to get it from Germany (I’m in France). And I do need a working motherboard. And it’s not like anyone has the Asrock Creator in stock…

This is not gonna be a pleasant week-end over here, I can tell already :cry:

The blinking cursor with VGA disabled is somewhat normal. Understand that the onboard VGA isn’t really onboard the main PC; instead it’s the console of the management controller, which may or may not pass through to the host.

Try putting the GPU in the slot closest to the CPU.

What’s the memory kit? I’d try just one DIMM with onboard VGA enabled and no GPU, then work up from there.

All 8 pin power connected?

I would further hard-set the PCIe slot mode to Gen3 or Gen2 in the BIOS with onboard VGA, then add the add-in GPU.


Hi Wendell, glad you could drop in ! If only I had a few Pentium Pros to sacrifice so that I can summon you through my server rack :grin:

I’ve connected all power supply inputs to the motherboard. The PSU is a Corsair HX1500i; it has enough PCIe outputs that I could route individual cables from the PSU to the motherboard and GPU.

The RAM I’m using is a pair of Crucial DDR4-2666 sticks. They are known-good (they come from an old NAS of mine). 2 x 16 GB, non-ECC. I had to cannibalize. This new machine will use Samsung 64 GB DDR4-3200 ECC sticks that are on the board’s QVL, unfortunately I won’t receive them for at least a week. But I’m confident the RAM isn’t the problem : I did manage to install Windows and run some stress tests already.

I tried the second closest slot to the CPU. This motherboard is a bit cramped around the CPU socket, I was afraid I’d have a really hard time removing the GPU from the first slot. The system behaved exactly the same. I was afraid the PCIe redrivers were to blame but this doesn’t appear to be the cause.

On your suggestion I forced the GPU’s slot to PCIe 3.0, that didn’t change anything either.

For sh*ts and giggles I even tried a Radeon X1900XTX from my personal museum. Didn’t work either. I’m trying to locate a Sunix PCIe x1 GPU I have somewhere; it’s a generic graphics card for servers, but it’s so tiny it got lost.

Regarding the cursor thing : it doesn’t blink. Also, it’s with VGA enabled and BMC disabled. The Asus board has separate switches to disable the VGA and BMC. I’ve tried all four combinations :

  • BMC and VGA disabled : no joy, no signal ever comes out of the GeForce.
  • BMC disabled / VGA enabled : several different outcomes in the range “nothing on any display” to “everything works”. Doesn’t appear deterministic.
  • BMC enabled / VGA disabled : nothing out of the GeForce.
  • BMC and VGA enabled : seems to always work on the VGA, but GeForce drops out after a reboot, with Windows showing me this :


Googling “error 43” is yielding a ton of different things. It’s clearly something that can happen to PCIe cards. I just wish programmers everywhere were taught that it’s perfectly OK to use “printf” to display plain-English error messages that other people can understand, instead of cryptic numbers. As they say, “there should be a law”.

EDIT : after yet another GPU drop-out Windows gave me a little bit more cryptic data to Google :

Error 43 is typically down to having the wrong driver installed.
download the latest gtx 970 driver from nvidia (just install the driver and physx).

for windows 10/11 64bit.

download and use ddu from guru3d

after cleaning with ddu (only uninstall nvidia drivers) reboot and install the freshly downloaded nvidia display driver.

if you get error 43 after that then the card has an issue.

lastly make sure you’re in PEG 1 mode in bios/uefi if you have the gpu in pci-e slot 1…
or PEG 2 for slot 2 and so on.

I bet that 970 is missing a firmware update. Google “displayport 1.4 firmware” and see?
Maybe also force PCIe 2.0.

When you change bmc vga mode you must kill power to the psu for 1 min then plug it in and wait 1 min before powering on the system.

CSM may also be needed, as I don’t know if that card’s firmware supports modern EFI.

I did download the latest drivers for the 970, since I knew that’s what I was going to use initially. And this 970 has been with me for years, in fact it was working just fine in another PC the day before, no reason to doubt it.

I’m still experimenting, and I’ve finally noticed a pattern. It’s a bit strange but it is deterministic. At this moment I’m using two monitors :

  • A VGA monitor plugged into the on-board graphics
  • An HDMI monitor plugged into the GTX 970 through a KVM switch.

If I start the PC with the KVM switched to another computer, then the PC boots normally and the 970 works. I can switch the KVM to this PC and everything’s fine.

If I start this PC with the KVM switched to this PC, then only the VGA display works and when I get to Windows, I can see the Code 43 error in the Device Manager.

It’s as if I can’t have a monitor connected to the 970 prior to booting. No idea what mechanism is involved, but it’s what I’m observing. Real weird.

I could also test this tomorrow on my GTX 970, to see if I get the same behaviour.
I don’t have any monitor plugged in, because I only use it via looking-glass.
BMC VGA is plugged into an old Eizo Flexscan S2100 and the RTX A5000 is plugged into a Samsung G7


That would be much appreciated ! Meanwhile, I’ve kept investigating, and it is clear that the “trigger” for the odd behavior is having a monitor connected to the 970 when I start the computer. Here’s the sequence I’ve just tried. “VGA” means on-board graphics, “HDMI” means GTX 970 :

  • Turn the PC off and on at the wall
  • VGA and HDMI both connected : PC starts on VGA only, 970 is “Code 43”.
  • Shutdown, unplug HDMI, turn back on : PC starts on VGA, but 970 is fine. Plug the HDMI and both monitors work.
  • Disconnect HDMI, reboot PC, reconnect HDMI afterwards : the 970 works.
  • Keep HDMI connected, hibernate PC, restart PC : everything works, but it feels like it took longer to come out of hibernation than to boot.
  • Keep HDMI connected, hibernate again, turn off power at the wall, restart PC : everything works, but it seems to take longer than booting the PC again. It’s only seconds, so I could be imagining it.
  • Keep HDMI connected, reboot the PC : the PC boots on VGA only, the 970 is “Code 43”.
  • Booting on HDMI alone does not work : the PC boots on VGA and the 970 goes “Code 43”.

So it’s very clear that there’s something this configuration doesn’t like about a GTX 970 being plugged into an HDMI monitor during boot. I’ve tried half a dozen times, same results I’ve just listed.

Note that when I wake up from hibernation, which is the only case where the PC boots Windows with the 970 connected, I do not see the Asus splash-screen and the invite to enter the BIOS.

When hibernating, a PC goes into ACPI state S4. For some reason, waking up from S4 makes the board “tolerant” to the GTX 970 feeding a monitor, whereas during boot (sort of waking up from state S5) this is a deal-breaker.
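The runs in the list above can be checked against that hypothesis mechanically. Below is a small Python truth table of the seven observed outcomes; the field names and the `hypothesis` function are my own labels, not anything reported by the firmware. It only shows that “monitor on the 970 during a boot, and not a resume from S4” is consistent with every run listed; it says nothing about the underlying mechanism.

```python
from typing import NamedTuple


class Run(NamedTuple):
    hdmi_at_start: bool   # monitor connected to the GTX 970 when the PC starts
    resume_from_s4: bool  # waking from hibernation rather than booting
    code43: bool          # observed: the 970 ended up in "Code 43"


# Transcribed from the sequence of runs listed above.
RUNS = [
    Run(True, False, True),    # off/on at the wall, VGA + HDMI connected
    Run(False, False, False),  # shutdown, HDMI unplugged, turned back on
    Run(False, False, False),  # HDMI disconnected across a reboot
    Run(True, True, False),    # hibernate, then restart
    Run(True, True, False),    # hibernate, power off at the wall, restart
    Run(True, False, True),    # reboot with HDMI connected
    Run(True, False, True),    # booting with the HDMI monitor alone
]


def hypothesis(run: Run) -> bool:
    """Predict Code 43 iff a monitor is on the 970 during a boot (not an S4 resume)."""
    return run.hdmi_at_start and not run.resume_from_s4


mismatches = [run for run in RUNS if hypothesis(run) != run.code43]
print("hypothesis holds" if not mismatches else f"counterexamples: {mismatches}")
```

Any new run that contradicts the prediction would show up in `mismatches`, which makes it easy to see whether a later observation breaks the pattern.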

Quick note : my 970 is a Gigabyte Windforce, I never updated its BIOS. It works very well on X58 and X99 motherboards, as well as an Asrock EPYC server motherboard. I’m hoping its BIOS is somehow causing this issue, because it’s clearly a software problem.

This is the weirdest boot behavior I’ve ever seen from a PC, going back to the late 80’s.

It’s getting really late here. Tomorrow I’ll experiment with an RTX 3090, see if that works better.

As a matter of fact, I never did update it. I’ll try an RTX 3090 tomorrow, if it works better then it might be the reason. But I used that 970, as is, on an Asrock Rack ROMED8-2T motherboard last year and had no problem whatsoever. It was even in the same case, with the same PCIe riser. I didn’t have to force the PCIe slot speed.

I’m a bit annoyed that I don’t know exactly how the Asus board is architected. Every other WRX80 board manual has a schematic telling how everything is connected to the processor and chipset. I wish Asus did the same.

I’m afraid I haven’t been that patient. I only waited until the LEDs on the board went dark, usually a sign that all capacitors have discharged below VDDmin. But I’ll keep it in mind.

It’s a brand new day and I have news, some good, some dodgy.

Using a 3090 instead of a 970, I get better results. I can finally boot reliably (so far) using only the discrete graphics. I’ve disabled both the onboard VGA and the BMC. However, I have to force the PCIe slot to PCIe 3.0 : if I leave it on “auto”, the strangest things happen.

For example, I managed to boot to Windows only for the device manager to give me the same “Code 43” error as before. According to Windows it had to stop the 3090… and it was displaying that message on the 3090 itself. Kind of a “mission failed successfully” moment :thinking:

Clearly, there’s some signal integrity issue on this board, running a PCIe riser on the slot farthest from the CPU. You would think the board’s redrivers exist specifically to compensate for that, but maybe Asus doesn’t know how to use them.

For a comparison, the Asrock ROMED8-2T had no trouble in this same configuration. Same riser, same 3090, using the last slot, it was happily running in PCIe 4.0 all day long… and it has no redrivers on the PCIe slots. The only redrivers on the Asrock are where you’d expect : near every connector that can bring PCIe lanes to cables, for example the Oculink and U.2 connectors.

Speaking as an electrical engineer and board designer, it’s painfully obvious that Asus is inferior to Asrock, at least on this type of high-end hardware. For example, the steel plate “reinforcing” the motherboard has everything to do with Asus using a cheaper PCB than Asrock. Asrock’s looks like a 14-layer, very thick and rigid. Asus’ looks like an 8-layer. I didn’t break out the caliper but it really looks like 1.6 mm thickness. It also explains why Asrock can make this kind of hardware fit on an ATX-size board while Asus needs to go 5 mm bigger than full E-ATX.

The BIOS contains interesting options for manually configuring the redrivers for slots 5, 6 and 7. I’m going to put my engineering degrees to use, find the datasheet for those redrivers and see if I can go all Thanos and do it myself. I paid for PCIe 4.0, not PCIe 3.0.
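To put a number on “I paid for PCIe 4.0, not PCIe 3.0”, here is a back-of-the-envelope sketch of per-direction link bandwidth. It uses the published per-lane transfer rates and line encodings (8b/10b for Gen1/Gen2, 128b/130b for Gen3/Gen4) and deliberately ignores packet and protocol overhead, so real-world throughput will be somewhat lower than these figures.

```python
# Per-generation raw transfer rate (GT/s per lane) and line-code efficiency.
# Gen1/Gen2 use 8b/10b encoding; Gen3/Gen4 use 128b/130b.
PCIE_GENS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
}


def link_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s after line encoding,
    before TLP/DLLP protocol overhead (which lowers it further)."""
    rate_gt, efficiency = PCIE_GENS[gen]
    # One bit per transfer per lane, 8 bits per byte.
    return rate_gt * efficiency * lanes / 8


for gen in (2, 3, 4):
    print(f"Gen{gen}: x16 = {link_bandwidth_gbs(gen, 16):5.2f} GB/s, "
          f"x4 = {link_bandwidth_gbs(gen, 4):5.2f} GB/s")
```

For an x16 slot this works out to roughly 15.75 GB/s at Gen3 versus 31.5 GB/s at Gen4 in each direction, i.e. forcing the slot to PCIe 3.0 halves the available link bandwidth. Whether a given GPU workload actually saturates that is a separate question.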

This being Europe, I have a couple of weeks to decide whether I want to keep this board or return it for a refund. Not gonna lie, if I can get my hands on an Asrock Creator (rev 1) then I’m sending the Asus back.


This schematic was posted here some time back. Might help with your investigation?


Thanks, nnunn ! However, this isn’t as detailed as one would hope. For example, this is what Asrock gives you for their latest WRX80 board :

Anyway, I’m just rambling. I wouldn’t be happy unless Asus gave me the CAD files for their motherboards.


Earlier today I received an NVMe carrier. This led to further investigation of PCIe on the Asus WRX80 :

Note that this came with a heatsink covering all four positions, but for quick testing it was simpler to leave it off. You might be wondering why I got this puppy when the Asus motherboard ships with a similar gadget. Two reasons : the Asus one is twice as large for no valid reason, and I had originally ordered the Asrock motherboard, which doesn’t come with one.

The two drives I’ve tested with are PCIe 4.0 (Sabrent) and PCIe 3.0 (Samsung). I wanted to test if, after bifurcating an x16 slot, you could mix devices with different bus speeds.

Short answer ? Nyes.

It works, in that the drives are recognized by the BIOS and Windows. And then you start transferring some files and one (or maybe both) drives hang, forcing you to hard-reset the PC.

Conclusion : it doesn’t seem like a good idea to mix M.2 drives of different PCIe generations on the same x16 slot.
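When a slot is bifurcated, each drive trains its own link, so it can help to see what every device actually negotiated. On a Linux host (an assumption; the tests in this thread were done under Windows), the kernel exposes this in sysfs as `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` under `/sys/bus/pci/devices/<address>/`. A minimal sketch that reads them:

```python
from pathlib import Path

LINK_ATTRS = ("current_link_speed", "current_link_width",
              "max_link_speed", "max_link_width")


def pcie_links(sysfs_root: str = "/sys/bus/pci/devices") -> dict[str, tuple[str, ...]]:
    """Map each PCI address to (current speed, current width, max speed, max width),
    skipping devices that do not expose link attributes."""
    links: dict[str, tuple[str, ...]] = {}
    root = Path(sysfs_root)
    if not root.is_dir():  # not Linux, or sysfs not mounted
        return links
    for dev in sorted(root.iterdir()):
        paths = [dev / attr for attr in LINK_ATTRS]
        if not all(p.is_file() for p in paths):
            continue
        try:
            links[dev.name] = tuple(p.read_text().strip() for p in paths)
        except OSError:  # skip devices whose attributes cannot be read
            continue
    return links


if __name__ == "__main__":
    for addr, (cur_speed, cur_width, max_speed, max_width) in pcie_links().items():
        note = "" if (cur_speed, cur_width) == (max_speed, max_width) else "  <- below max"
        print(f"{addr}: x{cur_width} @ {cur_speed} (max x{max_width} @ {max_speed}){note}")
```

A link reported “below max” is not automatically a fault: GPUs in particular drop their link speed at idle to save power, so it is worth checking again under load before blaming the slot or riser.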

So I removed the Samsung and continued with just the PCIe 4.0 drive.

I also wanted to test if an NVMe drive would struggle to run in PCIe 4.0 if it was plugged in one of the last slots of the board. I saw no such problem there : the Sabrent behaves the same whether it’s slot 2 or slot 6 :


Incidentally : I’m not an expert in SSD testing… can someone tell me why the CrystalDiskMark speeds (which I also checked in the task manager) are so much faster than what I see when I copy large ISO files ? The best I can get out of the file explorer is 2.5 GB/s, transferring four files of 5-6 GB each. My system drive is an even faster SN850X.


Great to see you have progress, despite the setbacks!

It’s likely related to queue depth (number of IO commands lined up at a time) and thread count (number of parallel operations). I believe copying an ISO file to or from the drive should be the equivalent of what CrystalDiskMark calls SEQ1M Q1T1. I see you have it there, does it give comparable enough numbers?

It’s possible that Windows Explorer somehow splits the file in smaller chunks than 1M when copying sequentially, though.
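A rough way to measure the Q1T1 pattern yourself is to copy a file with a single thread and only one 1 MiB request in flight at a time. This is a sketch, not how Explorer copies files internally; the 1 MiB block size is borrowed from the SEQ1M test name. Note that the source file may be served from the OS cache on a second run, which inflates the result.

```python
import os
import time

BLOCK_SIZE = 1024 * 1024  # 1 MiB, matching the block size named in SEQ1M


def sequential_copy_gbs(src: str, dst: str, block_size: int = BLOCK_SIZE) -> float:
    """Copy src to dst one block at a time from a single thread (roughly the
    Q1T1 access pattern) and return the throughput in GB/s."""
    copied = 0
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(block_size):
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # include the time to reach the drive, not just the page cache
    elapsed = time.perf_counter() - start
    return copied / elapsed / 1e9
```

Running `sequential_copy_gbs("D:/big.iso", "E:/big.iso")` (hypothetical paths) should land in the same ballpark as the SEQ1M Q1T1 figure rather than the deep-queue numbers, because nothing here keeps more than one request outstanding.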


Hello there ! Yes, the numbers I see when I copy files are close to SEQ1M Q1T1. Your explanation makes sense.

Let’s call that “growing pains”. As usual with this type of motherboard, there’s a learning curve and some effort required to get the most out of it. I’m not giving up. I’m digging deep into the BIOS and looking at PCIe redriver settings. I’m not the only one having issues with PCIe 4.0 on this board when using risers; in fact, someone just created an account and started a thread with that specific problem !

Help with WRX80E-Sage SE Render server

I’m going to post my research on this topic in their thread, I feel it’s something a few Asus users will be interested in.

EDIT : it’s been a few evenings of research and experimentation and the result is that my 3090 now works in PCIe 4.0 all the way to the end of the motherboard and riser. A lot went into getting this result, you may want to check it out as I’ve added a lot of background info that could help you no matter what platform you’re having PCIe trouble with. END OF EDIT

That being said, even in its incomplete state (still missing its actual RAM) this machine is very promising.


F Yeah ! The first 256 GB of RAM have arrived, just in time for a three-day weekend !

Samsung even added a touch of humor by printing a UKCA logo on their sticks :rofl:

EDIT : it’s been a few days. My silence can be taken as an indication that this new machine is working really well. Nothing quite like the feeling of a brand new, much more powerful workstation ! And it has a lot of room for growth.

Now I need to spend days, maybe weeks, migrating my many different tools, projects and workloads to this new monster. I’ll be running the old and new machines side by side for a while.

I’m still waiting on fun components, notably a 6-port Gigabit Ethernet NIC based on two Intel i350 controllers and a PCIe x8 bridge. Why I’d need so much Ethernet is a long story.


The build continues. I’m holding off a little on buying the final 256 GB of RAM as the price of 64 GB sticks is currently dropping a little. This is a 10 K€ machine but hey, if I can save 100 € I’m not going to say no.

Speaking of savings, Black Friday on Amazon was good. I managed to score two 4 TB SN850X SSDs for 365 € each, which is half-price. So now all three M.2 slots on the motherboard carry the same drive, 12 TB total. I’m not going to run out of space for a little while.

The 6-port NIC just arrived today. It’s a funny thing :

Cost me 100 € on eBay. The “catch” is that it has this weird non-standard PCI bracket. It does fit in a normal PCIe slot but you can’t screw it in place. I’ve zip-tied it for now while I brainstorm how to fabricate a suitable bracket.

This card works right away, no drivers needed; it is seen as 6 Intel i350 Gigabit Ethernet ports and has SR-IOV support. Here’s the product page if you’re interested :

There is no kill like overkill :

This Asus motherboard wasn’t my first choice but it has grown on me. Three M.2 drives, a 3090, 8 Ethernet ports in total, and I still have 5 spare PCIe x16 slots. It’s also pretty quiet.
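For anyone wanting to use the SR-IOV support of a card like this, a quick first check is which ports actually expose virtual functions. This sketch assumes a Linux host (the thread doesn’t say which OS will drive the NIC) and reads the kernel’s standard `sriov_numvfs` / `sriov_totalvfs` sysfs attributes; the maximum VF count comes from the device itself, not from anything assumed here.

```python
from pathlib import Path


def sriov_interfaces(net_root: str = "/sys/class/net") -> dict[str, tuple[int, int]]:
    """Map each network interface whose PCI function supports SR-IOV to
    (VFs currently enabled, maximum VFs the device reports)."""
    found: dict[str, tuple[int, int]] = {}
    root = Path(net_root)
    if not root.is_dir():  # not Linux, or sysfs not mounted
        return found
    for iface in sorted(root.iterdir()):
        numvfs = iface / "device" / "sriov_numvfs"
        totalvfs = iface / "device" / "sriov_totalvfs"
        if not (numvfs.is_file() and totalvfs.is_file()):
            continue  # virtual interfaces (lo, bridges) and non-SR-IOV NICs
        try:
            found[iface.name] = (int(numvfs.read_text()), int(totalvfs.read_text()))
        except (OSError, ValueError):
            continue
    return found


if __name__ == "__main__":
    for name, (enabled, maximum) in sriov_interfaces().items():
        print(f"{name}: {enabled}/{maximum} virtual functions enabled")
```

Enabling VFs is then a matter of writing a count to that port’s `sriov_numvfs` as root. The kernel refuses to change a non-zero count directly, so write 0 first when resizing.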

