4-month long nightmare

Hello all. I started a build about 4 months ago and, despite spending a ton of $, have yet to get a system that will even POST. It’s not the first computer I’ve built (I’m a computer scientist and my wife is a senior semi engineer). I can’t for the life of me figure out what’s going on. Here’s a quick rundown of what I’ve tried:

  1. Intel w9/ASUS ws790e-sage-se: I ended up buying 2 of these CPUs, 4 of the motherboards, a new be quiet! Dark Power Pro 1600W PSU (I had a Corsair hx1500i), and a bunch of DDR5 ECC RAM from the MB QVL. I was unable to get any combination of any of these components to successfully POST, getting a q-code of 0x92 across the board.

  2. Threadripper PRO 7975WX/ASUS wrx90: I ended up buying two of these MBs and one CPU. One of the MBs wouldn’t even light up (there was no indication that it was even receiving power), and the other failed to POST.

  3. AMD EPYC 9554/ASRock Rack GENOAD8X-2T/BCM: Again, I bought two of these motherboards and one CPU. One of them wouldn’t power on at all (even though the BMC heartbeat light was on). With the other one, when you hit the power button, the fans would start spinning and the debug LED would immediately show 0x00 and never change.

Like I said, I’m at a loss. We work on an ESD mat, I read the manuals, abide by QVLs, we check each other’s work… If anybody here has any ideas or recommendations, we’d really appreciate it. At this point I’m seriously considering giving up.

I should add that, for each of these builds, we did the standard troubleshooting stuff (1 DIMM, reseating everything, no GPU, no M.2 NVMe, different PSUs, disconnect & reconnect all cabling, …). We also worked with ASUS support (which was not helpful in the least) and ASRock support (which was helpful but unable to resolve the issues we were having).

Also, the reason we buy multiple CPUs/MBs is to have extras on-hand in case we have issues. I’d rather return unopened components than risk missing my 30-day return window and having no choice but to rely on ASUS support (what a nightmare they are). Anyway, thanks in advance to anyone that might have any ideas or suggestions!

Did you follow the directions in the manual as to what ATX connectors to use?
What do you have in the PCI slots?

Also, can you access the BMC web UI? That should work even with the motherboard powered off… Did you try a BIOS update?
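
If it helps, here is a rough way to check from another machine whether the BMC is answering at all before you even touch the power button. This is only a sketch: the function name `bmc_port_open` is made up for illustration, and it assumes the BMC serves its web UI over HTTPS on port 443 and that you already know its IP address (from your router's DHCP list, for example).

```python
import socket


def bmc_port_open(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Assumption: the BMC web UI listens on HTTPS (port 443). A successful
    connection only shows that something is listening there, not that the
    board itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

You would call it as `bmc_port_open("192.168.1.50")`, where the address is a placeholder for whatever your network hands the BMC. If it returns True with the board in standby, the standby power rail and the BMC are at least alive.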

This may sound stupid, but I have two crazy theories:

  1. Is there anything I/O-related connected to the mobo? About a week ago I got my Odroid single-board PC and plugged an old, dusty Dell keyboard into it (because I was too lazy to unplug my main one). The initial mobo logo stayed up for quite a few minutes before the screen turned blue, and an eternity later the BIOS menu started to draw… line by line (0.00001 fps).
  2. Power. A few weeks ago I sent my new PC to the service center (it failed to get past the Win10 loading screen, specifically on cold boots), and I’ve already received a message that, judging from the initial tests, my be quiet! Dark Power 13 Pro 1600W (yep, the same as yours, it would seem) is the main culprit.

2.1. But there’s also the possibility that your house (or wherever you’re trying to make it work) somehow has junky power. I heard that one from a really old IT guy who worked at a PC-building company. They build and test the PC - works well. Deliver it to the customer - next day the customer calls and says that it doesn’t work. They bring it back - hm, it works. After a few tries, they found that the environment was at fault - I think it was the fridge.

Hello MadMatt. Thanks for the reply! I did follow the manual (precisely, I believe). Plus, I assembled and reassembled each of these boards several times (since I had 2 CPUs & 2 PSUs) - and I had my wife (& ChatGPT) checking my work. Also, I was able to flash the updated BIOS to each of these boards using the quickflash (or whatever it’s called). I did not, however, try accessing the BMC web UI (towards the end there I was rushed to get everything packaged up and returned before my 30-day return window closed).

Ok, so, what hardware are you trying to make work now?

Howdy Draaksward! Not stupid, or crazy at all! At this point I’ve had like dozens of high-end server/WS MB/CPU combos completely fail to even POST (several of them showed no signs of life at all), so any insights, no matter how unlikely, are probably less unlikely than getting so many defective high-end components in a row.

Anyway, in response to your theories:

  1. For the first several builds I assembled & connected everything before powering on (monitor, input devices, GPU, M.2s, …). After the initial couple of failures I stripped it down to a single DIMM, CPU and heatsink, just to get it to POST. Since then, I don’t believe I’ve actually connected any IO (since nothing since has POSTed).
  2. Oof. That sucks… I haven’t actually been able to confirm that my be quiet! works (since I haven’t had a working system with which to test it), but my previous build seemed to work just fine with my Corsair HX1500i. I purchased the be quiet! during the initial round of failures, thinking that it was the common denominator. I wish I’d been more active on the forums at that point… I probably would have bought a Seasonic instead.
    2.1. I was actually thinking of buying a dual conversion online UPS to see if this might be a contributing factor. My only hesitation is that my last computer (which, granted, was older and perhaps not so sensitive), seemed to work just fine. Maybe I’ll just go ahead and remove this potential culprit from the equation.

Thanks for taking the time to help me troubleshoot!

Three things I can think of right off the bat are:
#1 Power setting on the power supply possibly incorrect.
#2 Possible short to the case.
#3 Incompatible memory.

If the power setting is wrong, for example set to 220 instead of 110, the power-good signal will not be sufficient to keep the power supply running and the system will not boot.

A possible short will cause a rapid shutdown to prevent damage.
And incompatible memory will throw an error beep and the system will not boot.

I’m sure all of these have been checked, so it may be hardware compatibility or a BIOS setting.

If it’s a short or hardware issue, it may not get past the POST stage.
If it’s getting past the POST stage but not into the BIOS, that narrows the field.

I recommend getting your components from the same vendor; they often configure the system and run a burn-in test to make sure everything works.

I forgot to mention that I initially was installing an RTX4090, but have since been testing a bare-bones build (1 DIMM, CPU & cooler).

I just returned an EPYC 9554 and two ASRock Rack GENOAD8X-2Ts. This was particularly heartbreaking, as it’s a really, really nice MB (at least on paper), and a great CPU. I’m thinking about trying again with these components. When I reached out to ASRock, I was immediately connected w/ remarkably knowledgeable, helpful tech support. With ASUS it’s literally exactly the opposite - you wait for days for a callback, then wait for hours on hold, only to have to explain to someone who sounds like they’re going through airport security at LAX and are clearly having a very hard time following the conversation that yes, in fact you did consult the QVL before purchasing RAM… That was literally the gist of every single conversation I had w/ ASUS tech support: I’d explain the problem in excruciating detail, emphasizing that every component was from the QVL, only to have them suggest that I check the QVL to make sure that my [RAM, M.2, GPU,…] are compatible.

Anyway, yeah… I’m thinking ASRock Rack GENOAD8X-2T/BCM and an EPYC 9004… Unless anybody has any better suggestions.

Hi Gnuuser!

I was actually thinking about purchasing a MB/QS CPU combo from one of the reputable vendors on eBay, after contacting them to confirm that they’re doing a burn-in test. If anybody has any recommendations re: reliable eBay vendors, I’d be happy to look into them!

The PSUs I have don’t have 220/110 settings (that I know of…), and the Corsair HX1500i was working fine on my 4 year old Threadripper that this WS is replacing.

I’m very careful about MB offset placement, and lately I’ve been building outside the case (Fractal Torrent) altogether, on an ESD mat, so I don’t think it’s a short… Certainly not like dozens of shorts…

W/R/T memory, it’s entirely possible that I have bad RAM, but the RAM I do have was purchased specifically from the QVLs of the MBs I’ve been trying… I have both Kingston and Micron/Crucial DDR5-4800 ECC server RAM, several DIMMs of each.

IIRC, people were having trouble like 6 months ago with these 64GB DIMMs on some of these WS builds… Maybe I should buy a few DIMMs of 32GB non-ECC RAM to see if that makes a difference…

Anyway, thanks for helping me brainstorm!

How certain are you that you are torquing the CPU cooler down properly? Unless you regularly install CPUs and have a good feel for what 5 and 12 inch-pounds feel like, for Intel and AMD respectively, it is difficult to do without a decent torque wrench.

With how big these CPUs have gotten there is a very large amount of flex in them and they need to have a very specific force on them in order to seat into the socket and make contact with all the pins.
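
Since a lot of torque drivers are marked in newton-metres rather than inch-pounds, here is a quick conversion of the two figures quoted above. It is just arithmetic using the standard conversion factor; the 5 and 12 inch-pound values are the ones from this post, so check your own socket and cooler documentation for the real spec.

```python
# Convert torque from inch-pounds (in-lbf) to newton-metres (N·m).
IN_LBF_TO_NM = 0.112984829  # 1 in-lbf expressed in N·m


def in_lbf_to_nm(in_lbf: float) -> float:
    """Return the torque in N·m for a value given in inch-pounds."""
    return in_lbf * IN_LBF_TO_NM


# The figures quoted in the post above, not official specifications.
for label, value in (("Intel", 5), ("AMD", 12)):
    print(f"{label}: {value} in-lbf is about {in_lbf_to_nm(value):.2f} N·m")
```

That works out to roughly 0.56 N·m and 1.36 N·m, which is a big enough gap that a driver set for one platform should not be reused on the other without resetting it.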

I can confirm that a HX1500i will work with the ASUS W790 Sage motherboard as long as you connect the one 24 and three 8 pin power cables to the motherboard.

There is no 12 V non-ECC memory that can be used in any of these WS/server platforms. That being said, I was able to get 64GB RDIMMs working in W790 easily and was even able to overclock them a decent amount.

Can’t speak for the Corsair, but with be quiet! you have a certain pattern for both normal and shorted behavior:

  • a good power on is accompanied by a hardware click from the psu
  • a shorted one will first try to run → short. It turns off, and I think for the next few minutes it will not turn on at all

One thing I would try to find on aliexpress or something is service boards. Maybe that would shed some light.

The video is in Russian (I didn’t find an English one), but the fellow is showing how to test whether the socket was soldered correctly.

And there are such service boards for ram, m2, pci.
Maybe it would provide some info.

This was my thought as well, too much / too little pressure can cause these boards to no longer make proper pin contact. If the same cooler/mounting was used across most of these systems then that would be one potential source of problems.

Finding anything again on youtube is a royal pain, but I believe it was Wendell that had a video I watched showing the precise amount of torque pressure to set the screwdriver at when installing these server CPUs.

Does your PSU have that single-to-multi rail switch? I think it needs to be in single rail mode to work with high-power components.

I honestly hate this fact. It’s been over 14 days since I sent my newly bought 13900K system (without GPU and drives) to the service center. When the problem first started, I stressed about “maybe I didn’t properly screw down the custom CPU frame”, and spent almost 3 weeks on similar “something small and stupid” checks just to cross out every possible thing (fun fact: the stock frame made only a 1-2C difference… and that was with a different thermal paste - Thermal Grizzly → Noctua)…

Usefully, Threadripper CPUs come with a torque wrench that’s explicitly designed for torquing the CPU down: you do the screws softly in the order noted on the socket, then a second round to seat them basically all the way, and a third round until the wrench clicks at you. As an extra suggestion, torque one of the screws down without the CPU installed just to verify that the supplied wrench does click; assuming you get one that’s not faulty, it’ll work perfectly for installing the CPU, no extra wrench required.

But yes, if you don’t do this then you’ve not installed the CPU correctly, and insufficient contact will result in a non-booting TRX/EPYC platform at best.

I’ve always used the method described for torquing metal screws/nuts on sockets, but it didn’t work for the plastic nuts that the big Intel sockets use. It felt like I was going to break them when initially trying to tighten them; the torque wrench was the only thing that gave me enough confidence to continue tightening them.

I didn’t think that the TR7000 CPUs or their coolers came with torque wrenches anymore? I thought it was only the very earliest of the old Threadrippers that did.

So this guy unboxes a 7000-series TRX and it has the orange wrench (typical of TRX; EPYC is a grey/black one, IIRC).

My 3960X came with one too, and I’ve found images for the 5000-series coming with them too, so I’m going to wager they still ship with them, mostly because of the torque required!

That is good to know! I’m not sure why I thought they no longer came with one.
I can confirm the boxed Intel SPR-WS CPUs don’t come with a torque wrench (even though they are the ones that need them even more due to them being more delicate).
