YATPRO : Yet Another Threadripper Pro Build!

Hello guys. I’m officially embarking on the build of my new workstation, which will be based on a 5955WX. I’ve ordered two motherboards (long story) and as the first one has just shipped, I think it’s the perfect time to start this thread. And maybe ask some questions I should have pondered beforehand.

Right this second, the most important question on my mind is : how few DIMM’s can I use with this processor and be able to boot ?

My initial plan was to buy all at once one motherboard, a processor, and four 64 GB RDIMM’s, to be followed a month later by four more DIMM’s. Let’s just say I don’t lack cash, but my VISA is a bit of a bottleneck right now.

That plan flew out the window as I was faced with the inability to find my choice motherboard (Asrock WRX80 Creator) in stock anywhere. In despair, I ordered one from Amazon with a long lead time. Then I found the Asus SAGE in stock in a German store and ordered it today.

As “luck” would have it, soon after I was done ordering the Asus, Amazon shipped the Asrock. Because of course :expressionless:. Normally, I’d cancel the Asus but… given how hard it was to get even one WRX80 board, I’ve decided to keep both. I might refund or sell the one I don’t end up using. Or I might go crazy and build two machines.

Anyway… the old VISA might be a little too tight this month to buy two high-end motherboards AND four 64 GB DIMM’s. And the Asrock’s manual says I can run that board with just one or two DIMM’s.

However (and I don’t recall where) I think I’ve heard it said that TR Pro’s won’t run with less than 4 DIMM’s. On the face of it I don’t see why that would be a problem (other than on the performance front).

Has anyone tried to boot one of those machines with just one or two DIMM’s ?

If I recall correctly, I’ve seen videos where they boot with 1 DIMM while troubleshooting, so it should be possible.

Well, I’ve ripped out a pair of 16 GB DDR4 sticks from a spare PC. It’s not ECC and it’s definitely not much capacity but it should at least get me to the Windows desktop :grin:

On a side-note : I won’t be getting two motherboards. The saboteurs at GLS managed to lose the parcel that allegedly contained the Asrock. I say “allegedly” because it turned out the Amazon Marketplace shop I ordered from is rather dodgy. Luckily, Amazon got my back and managed to make them refund me… whereas I couldn’t even get them on the phone. Say what you will about Jeff Bezos (and I usually do) but at least he seems to understand that keeping customers happy is good business practice.

Now I just have to see if UPS will lose my Asus motherboard. I’ve got a bad feeling about it : they added fine print to the tracking page saying that “LTL, Less-Than-Truckload” transportation was handled by a different company for which UPS assumes no liability.

So this is where we are in 2022, guys… none of the formerly-reputable delivery companies are worth squat anymore. I guess the next step is to hire Jason Statham : if Hollywood has taught me anything it’s that he’s a very good transporter.

At least I got my processor and a 4TB SN850X SSD. They arrived just fine because they were sold and delivered by, you guessed it, Amazon.

Good luck with the build!
From personal experience DHL is really good and fast, but expensive.
I don’t have any experience with Amazon, but I know they have good support should you need them.

Well, UPS or whoever they subcontracted to almost did screw up. I had to go pick up my parcel myself at one of their drop points because they are apparently unfamiliar with the concept of a “concierge”. But I did get my precious. And here it is, a couple hours later :

This is just a “bring-up” setup to verify that no component was dead on arrival. A few parts were borrowed or salvaged from older machines, hence the peasant-level two sticks of 16 GB DDR4-2666, the old trusty GTX 970 and the Noctua heatsink that was initially meant for a much cooler 120 W TDP EPYC. Also, this is my excuse for barely trying to cable-manage.

I did a CPU stress-test with CPU-Z, and this heatsink is not capable of handling 280 W. Temperature rose from 32 to 70 °C within seconds, and then kept slowly creeping up. Clearly it was saturated. An AIO will replace it as soon as I find the right one.

ECC RAM sticks are on the way, I’ll be running with 512 GB. The GPU will also be replaced with a 3090. And all those lovely PCI slots will receive all sorts of goodies. I managed to find a 6-port Intel gigabit Ethernet NIC, that’s gonna be a lot of fun.

Most importantly (for me at least) this motherboard supports ACPI S4 and so it’s possible to suspend to disk (hibernate).

One thing I failed to realize, however, is that this motherboard doesn’t have a dedicated management NIC. Oh well, can’t have everything.

The power supply cables are rather bothering me. I’m seriously considering making my own.

Looking good!

I also have a GTX 970 mounted at the same spot, which I use for GPU passthrough. I have an RTX A5000 in Slot 7 and routed the riser cable for the GTX 970 around the RTX A5000 (since it’s a blower-style GPU, it won’t affect cooling), so I don’t lose PCIe slots to 2-slot cards.

I solved the power connection with power extension cables for now. I also got a crimping tool so that I could make my own cables, but so far I’ve only made some cables for fans.

Does anyone know a good cable type/brand that isn’t so stiff?

First, let me say I’ll give you a long answer about cables and crimping separately. I’m an electrical engineer, it’s part of my work, and it’s quite the rabbit hole when you want to make cables that won’t burn down your PC. (Yes, the recent fiasco with RTX4090 ATX 3.0 connectors had me laughing and face-palming quite a lot)

I’m very interested in how you configured your machine. You see, in my case the GPU only works half the time. When I first power the machine, Windows sees it, no problem. But if I reboot, Windows gives me an “error 43”, telling me it had to stop the PCI device. Power-cycle the machine and the same thing happens : GPU works, then reboot kills it. I’m in the BIOS right now trying to understand what’s happening.

I’ve got the same behavior on the first two PCI slots (counting from the edge). Because it was easy to do, I plugged the 970 straight into a slot instead of using the riser, and I still get that behavior, so it’s not the riser.

Google led me to a script that’s supposed to fix the problem in Windows. It does seem like it could be an OS bug but I’ve never seen it before. If you have any clue, please do share :sob:

Also, I can’t get into the BMC web interface. I’ve got the right IP address but it does nothing and 3 minutes later my browser returns a “site can’t be reached” error. Why oh why couldn’t they add a dedicated NIC for the BMC ? How do I log into this thing ?

EDIT : turns out it’s complicated, due to the BMC sharing the X550’s Ethernet ports with the host, and also the BMC being slower than a Z80 trying to run Windows 3.11. But if you wait long enough, eventually you’ll see two IP’s pop up on your router’s DHCP clients list. One is the host, one is the BMC. And to access the BMC you can then use https://<BMC IP>:443 either on the host itself or a remote computer. It’s a sluggish experience, I hate it, and it offers no worthwhile functionality to a desktop user. I’d have preferred a better BIOS instead.
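
Since the BMC takes so long to come up on the shared port, a small script can do the waiting for you. What follows is only a rough sketch in Python: it retries a plain TCP connection to port 443 until something answers or a timeout expires. The function name `wait_for_port` and the default timings are made up for illustration, and a successful connection only shows that the BMC's web server is listening, not that the login page works.

```python
import socket
import time


def wait_for_port(host: str, port: int = 443, timeout_s: float = 600.0,
                  interval_s: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the defaults are guesses for a slow BMC, not values
    taken from any vendor documentation.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A completed TCP handshake only proves that something is listening;
            # no TLS negotiation or HTTP request is attempted here.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused, host unreachable, timed out: retry later.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_port("192.168.1.50")` would poll a BMC at that (hypothetical) address for up to ten minutes; use whichever address shows up in the router's DHCP client list.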

Those issues are the reason why I wanted the Asrock in the first place. Every high-end board I ever got from Asus has had arcane issues and mystical behavior documented nowhere. I’m sure I’ll sort it all out eventually but this wastes so much time I could spend working or chatting with you guys :grin:

Did you try to put https:// in front of the BMC IP? I think I had the same problem connecting to the BMC.

For the GTX 970 I set the PCIe version to 3.0.
I’ll have to check tomorrow which BIOS settings I’ve set. I’m currently recording some movies.

I have the following PCIe slots populated (from CPU to edge of board).
Slot 1) Mellanox ConnectX-4 LX
Slot 2) Asus HyperX m.2 card
Slot 3) Adaptec/Microchip HBA Ultra 1200-32i

Slot 6) Riser to GTX 970
Slot 7) RTX A5000

I run the latest BIOS/BMC firmware.
My main OS is Debian 11.x bullseye with backports enabled. I never had the problems with USB and SATA ports not being visible under Linux that other users reported.
I use the RTX A5000 for Linux.
The GTX 970 is passed through via qemu/virt-manager to a Windows 10 VM with a looking-glass setup.

GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset delayacct fbcon=map:000001 console=ttyS0,115200n8 console=tty1 video=card1-VGA-1:e video=card0-DP-1:e video=card0-DP-2:D"
GRUB_CMDLINE_LINUX="iommu=pt rd.driver.pre=vfio_pci vfio-pci.ids=10de:13c2,10de:0fbb pcie_aspm=off pci=noaer"

Kernel Version 5.16.12
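
For anyone adapting the GRUB_CMDLINE_LINUX line above, it helps to see exactly which PCI devices `vfio-pci.ids` claims. The Python sketch below is an illustration only (the helper names `parse_cmdline` and `vfio_ids` are invented here): it splits a kernel command line into parameters and extracts the vendor:device pairs. `10de` is NVIDIA's PCI vendor ID; the two device IDs presumably belong to the GTX 970 and its HDMI audio function, which is an assumption and not something stated in the post.

```python
import shlex


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {parameter: [values, ...]}.

    Values are kept in a list because a parameter such as video= or
    console= can legitimately appear more than once.
    """
    params = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params.setdefault(key, []).append(value if sep else "")
    return params


def vfio_ids(cmdline: str) -> list:
    """Return the (vendor, device) ID pairs listed in vfio-pci.ids."""
    pairs = []
    for value in parse_cmdline(cmdline).get("vfio-pci.ids", []):
        for item in value.split(","):
            vendor, _, device = item.partition(":")
            pairs.append((vendor, device))
    return pairs


# The GRUB_CMDLINE_LINUX value quoted in the post.
cmdline = ("iommu=pt rd.driver.pre=vfio_pci "
           "vfio-pci.ids=10de:13c2,10de:0fbb pcie_aspm=off pci=noaer")
print(vfio_ids(cmdline))  # [('10de', '13c2'), ('10de', '0fbb')]
```

On a running Linux system the same functions could be fed the contents of `/proc/cmdline` to check what the kernel actually booted with.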

I’ve accessed the BMC, that didn’t help me much. The alarms it was complaining about were all related to fan speeds because Asus doesn’t have the brain power to imagine that desktop machines do not use server fans.

I’m really starting to hate this board. It’s bringing back flashbacks of a similar out-of-box experience with my Asus X99-Deluxe. I would really have enjoyed the Asrock more.

The issues with the GPU are getting less and less deterministic. Sometimes it’ll work past a reboot, sometimes the machine won’t boot at all, sometimes I only have the BMC’s graphics, and in that case, either it’ll be at the resolution I set previously, or it’ll be in SVGA.

I don’t understand how anyone with any amount of self-respect puts a product like this out in the world.

I’ve actually put some server fans into my case to cool the mellanox and adaptec HBA card, but I only run them at 3% PWM :slight_smile:

I think @wendell once mentioned in a video that the BMC’s onboard VGA can sometimes cause issues with other GPU cards. You could try setting the VGA switch on the MB to off.
But I guess in your case it’s maybe best to return the board, since I don’t think you two will be friends with each other any time soon.

Got yet another new failure mode : disabled the BMC but kept the VGA since this board hates the 970, and now my display just has an unblinking cursor on it. Numlock / Capslock show that the processor clearly isn’t stuck or anything. The 7-segment display on the board is stuck on “02” which tells me sweet f***-all (“AP initialization before microcode loading”)

Starting to think this board is defective.

Or try to reseat the CPU

Indeed, I got familiar with that problem while using an EPYC motherboard in a workstation. I’ve tried disabling it, but that doesn’t help.

Returning this board is going to be a pain. Due to the chip shortage I had to get it from Germany (I’m in France). And I do need a working motherboard. And it’s not like anyone has the Asrock Creator in stock…

This is not gonna be a pleasant week-end over here, I can tell already :cry:

The blinking cursor with VGA disabled is somewhat normal. Understand that the onboard VGA isn’t really onboard the main PC, but instead is the console of the management controller, which may or may not pass through to the host.

Try putting the GPU in the slot closest to the CPU.

What’s the memory kit? I’d try just one DIMM with onboard VGA enabled and no GPU, then work up from there.

All 8-pin power connected?

I would further hard-set the PCIe slot mode to Gen3 or Gen2 in the BIOS with onboard VGA, then add the add-in GPU.

Hi Wendell, glad you could drop in ! If only I had a few Pentium Pros to sacrifice so that I can summon you through my server rack :grin:

I’ve connected all power supply inputs to the motherboard. The PSU is a Corsair HX1500i; it has enough PCIe outputs that I could route individual cables from PSU to motherboard and GPU.

The RAM I’m using is a pair of Crucial DDR4-2666 sticks. They are known-good (they come from an old NAS of mine). 2 x 16 GB, non-ECC. I had to cannibalize. This new machine will use Samsung 64 GB DDR4-3200 ECC sticks that are on the board’s QVL; unfortunately I won’t receive them for at least a week. But I’m confident the RAM isn’t the problem : I did manage to install Windows and run some stress tests already.

I tried the second closest slot to the CPU. This motherboard is a bit cramped around the CPU socket, I was afraid I’d have a really hard time removing the GPU from the first slot. The system behaved exactly the same. I was afraid the PCIe redrivers were to blame but this doesn’t appear to be the cause.

On your suggestion I forced the GPU’s slot to PCIe 3.0, that didn’t change anything either.

For sh*ts and giggles I even tried a Radeon X1900XTX from my personal museum. Didn’t work either. I’m trying to locate a Sunix PCIe x1 GPU I have somewhere, it’s a generic graphics card for servers, but it’s so tiny it got lost.

Regarding the cursor thing : it doesn’t blink. Also, it’s with VGA enabled and BMC disabled. The Asus board has separate switches to disable the VGA and BMC. I’ve tried all four combinations :

  • BMC and VGA disabled : no joy, no signal ever comes out of the GeForce.
  • BMC disabled / VGA enabled : several different outcomes in the range “nothing on any display” to “everything works”. Doesn’t appear deterministic.
  • BMC enabled / VGA disabled : nothing out of the GeForce.
  • BMC and VGA enabled : seems to always work on the VGA, but GeForce drops out after a reboot, with Windows showing me this :

[Screenshot: Err43-1]

Googling “error 43” is yielding a ton of different things. It’s clearly something that can happen to PCIe cards. I just wish programmers everywhere were taught that it’s perfectly OK to use “printf” to display plain-English error messages that other people can understand, instead of cryptic numbers. As they say, “there should be a law”.
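
As a small illustration of the "plain-English message" idea, here is a hedged Python sketch that maps a handful of Device Manager problem codes to the short descriptions Windows shows for them. The table is deliberately tiny and the names `PROBLEM_CODES` and `describe` are made up for this example; it says nothing about why a given card ends up with Code 43.

```python
# A few Device Manager problem codes and the short text Windows displays for them.
# This is a partial, illustrative table, not a complete list.
PROBLEM_CODES = {
    10: "This device cannot start.",
    22: "This device is disabled.",
    28: "The drivers for this device are not installed.",
    43: "Windows has stopped this device because it has reported problems.",
}


def describe(code: int) -> str:
    """Return a readable line for a problem code, or a fallback if unknown."""
    text = PROBLEM_CODES.get(code)
    if text is None:
        return f"Code {code}: not in this table"
    return f"Code {code}: {text}"


print(describe(43))
# Code 43: Windows has stopped this device because it has reported problems.
```

Even the full text for Code 43 is generic: it only says the driver reported a failure, which is why searching for it turns up so many unrelated causes.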

EDIT : after yet another GPU drop-out Windows gave me a little bit more cryptic data to Google :

Error 43 is typically down to having the wrong driver installed.
download the latest gtx 970 driver from nvidia (just install the driver and physx).

for windows 10/11 64bit.

download and use ddu from guru3d

after cleaning with ddu (only uninstall nvidia drivers) reboot and install the freshly downloaded nvidia display driver.

if you get error 43 after that then the card has an issue.

lastly make sure you’re in peg 1 mode in bios/uefi if you have the gpu in pci-e slot 1…
or peg 2 for slot 2 and so on.

I bet that 970 is missing a firmware update. Google “DisplayPort 1.4 firmware” and see?
Maybe also force PCIe 2.

When you change the BMC VGA mode you must kill power to the PSU for 1 min, then plug it in and wait 1 min before powering on the system.

CSM may also be needed, as idk if that card is real modern EFI.

I did download the latest drivers for the 970, since I knew that’s what I was going to use initially. And this 970 has been with me for years, in fact it was working just fine in another PC the day before, no reason to doubt it.

I’m still experimenting, and I’ve finally noticed a pattern. It’s a bit strange but it is deterministic. At this moment I’m using two monitors :

  • A VGA monitor plugged into the on-board graphics
  • An HDMI monitor plugged into the GTX 970 through a KVM switch.

If I start the PC with the KVM switched to another computer, then the PC boots normally and the 970 works. I can switch the KVM to this PC and everything’s fine.

If I start this PC with the KVM switched to this PC, then only the VGA display works and when I get to Windows, I can see the Code 43 error in the Device Manager.

It’s as if I can’t have a monitor connected to the 970 prior to booting. No idea what mechanism is involved, but it’s what I’m observing. Real weird.

I could also test this tomorrow on my GTX 970, to see if I get the same behaviour.
I don’t have any monitor plugged in, because I only use it via looking-glass.
BMC VGA is plugged into an old Eizo Flexscan S2100 and the RTX A5000 is plugged into a Samsung G7
