ASRock EPYCD8-2T not powering on

Hello all,

I am in the process of rebuilding my NAS with some sweet new EPYC hardware since the lower end stuff is getting relatively affordable and I yearn for PCI-E lanes. Of course, I ran into a massive issue pretty much instantly. I am fairly sure I know what the issue is, a dead motherboard that was sold as new but was actually used and fried by the previous owner, but I want to repeat my tale to people more experienced with EPYC hardware than I just in case I am being an idiot and overlooking something obvious.

Here are my specs:

CPU: EPYC 7252 8-core
CPU heatsink: Noctua NH-U14S TR4-SP3
mobo: ASRock EPYCD8-2T
RAM: 4x Micron 32GB 3200MHz DDR4 ECC RDIMM
PSU: Seasonic Prime TX-850
Case: An oldie that I love dearly, a Lian Li PC-A70B

So today I finally had all of the hardware in hand. My plan was to just assemble the bare minimum to get to the IPMI interface, update the firmware and the BIOS, and boot into memtest so I could get that out of the way before the real fun of setting everything back up and implementing the changes and improvements could occur over this upcoming weekend.

I installed the EPYCD8-2T onto the motherboard tray and was careful to only leave in the standoffs that match with the screw holes on the mobo. I installed the CPU and the heatsink, put the mobo tray back into the case, plugged in the mobo power connectors, plugged in the case fans and front panel connections, and dragged my case back over to my desk area to get started with updating the firmware/BIOS and do a test boot.

The IPMI pulled down an address from my DHCP server just fine. I logged in and went to start the updates, except I found that both the firmware and the BIOS were already the latest versions instead of the original defaults from the factory. Weird, but I assumed it wasn’t impossible for ASRock to still be making new units later in the product’s life span.

I went to the KVM and pressed the power button on the case to boot it up, and nothing happened at all. Fans didn’t spin up, no other LEDs turned on, no video output, just nothing whatsoever. I figured I must have wired something up wrong so I went back and checked all the cables with the manual and everything was definitely correct. I unplugged all of the front panel connections and fans except for the heatsink fan and tried booting again from the IPMI, again nothing happened at all.

I proceeded to completely disassemble and reassemble this computer 4 times and every single time the results were exactly the same: IPMI works but the system will not power on under any circumstances. I read all of the sections of the manual pertaining to setting up the hardware, watched a few YT videos about installing EPYC processors, reseated the processor a few times, reseated the RAM and eventually dropped it down to just 1 stick, reseated all the cables, tested my power supply on my previous mobo/CPU/RAM just in case I somehow fried it during the rebuild, all to no obvious results.

At this point I feel like I’m losing my mind and the only explanation is the mobo was fried before I got it or I am overlooking something, but hours of going over everything made me feel confident I did everything right. I started digging around in the IPMI and under the system information, it shows the current processor and memory as completely different hardware than what I have. AFAIK that information can only be determined by the IPMI after it has been booted at least once, and I can’t get it to boot at all. Between that and the already up to date BIOS/firmware, this makes me think this hardware was definitely used before me. The mobo’s box was a bit frayed at the edges, but it did contain all the original accessories.

The only flaw in my logic that I can think of is that I did not have a torque wrench that goes down to the 1.6 N·m (14 in-lb) that is specified. Hilariously enough I do own a torque wrench, but it only goes down to 2 N·m. I am fairly good at eyeballing these things though, and I was very careful not to wrench down too hard or leave the contacts too loose. In the multiple times I reseated the processor, I tried varying pressures but every time it wouldn’t power up at all. I also figured that if I had either an improperly seated processor or RAM that the system would still power up but just fail to post and just get stuck showing an error code on the mobo.
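For anyone checking the numbers, here is a quick unit conversion as a minimal Python sketch. It only uses the two figures from the post (the 14 in-lb spec and the 2 N·m wrench minimum); the function names are made up for illustration.

```python
# Exact by definition: 1 in = 0.0254 m, 1 lbf = 4.4482216152605 N
NM_PER_INLB = 0.0254 * 4.4482216152605  # about 0.112985 N·m per in-lb


def inlb_to_nm(inlb: float) -> float:
    """Convert inch-pounds of torque to newton-metres."""
    return inlb * NM_PER_INLB


def nm_to_inlb(nm: float) -> float:
    """Convert newton-metres of torque to inch-pounds."""
    return nm / NM_PER_INLB


spec_nm = inlb_to_nm(14)           # socket spec quoted in the post
wrench_min_inlb = nm_to_inlb(2)    # lowest setting on the 2 N·m wrench
over_spec_pct = 100 * (2 / spec_nm - 1)

print(f"14 in-lb = {spec_nm:.2f} N·m")
print(f"2 N·m = {wrench_min_inlb:.1f} in-lb ({over_spec_pct:.0f}% over spec)")
```

So 14 in-lb works out to about 1.58 N·m, and a wrench that bottoms out at 2 N·m would land roughly a quarter over the specified torque at its lowest setting.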

Also when I bought the processor, it came from eBay and the original owner didn’t perfectly clean all the thermal paste off. During shipping a tiny little bit of thermal paste got onto the contact pads on the bottom of the processor. I cleaned it up very gently with a little bit of isopropyl alcohol on a piece of coffee filter paper to prevent lint. Again, I think if this were an issue the computer would have powered on but failed to boot.

Here’s another juicy detail, so I bought the mobo on Amazon through some reseller and it was labelled as new. When the package was delivered, it came in a Newegg box. I figured that they just reused a box that they had on hand. But after I started a return for the mobo through Amazon and it gave me a return address to send it to, the return address is none other than Newegg’s primary warehouse. Newegg is pretty notorious for reselling open box items as new, so either this Amazon seller is Newegg in disguise or they are some kind of elaborate dropshipper.

Does my suspicion that the board was already fried seem as reasonable to you as well? Am I being an idiot and overlooking something? Did I ruin the hardware somehow like an even bigger idiot? I haven’t sent out the mobo return yet, but it feels like there’s nothing left to test. Thanks for reading all this if you made it this far, happy to take all the help that I can get.

All correct, it’s been used before. Just this alone would be enough for me to return something sold as new.

For startup power, the main ATX connector, 8-pin ATX12V, and 4-pin ATX12V all need to be connected. The 6-pin PCI “GFX_12V1” connector near the expansion slots does not. And it’s pretty obvious the main ATX is connected just fine in your case, since IPMI runs from the 5VSB rail.

It sounds like you’ve been pretty careful and thorough, and you’re right that most common faults beyond this will at least visibly do something, even if it’s just to blink for a split second before the PSU kills all power.

I’d have no problem going with the board return in this scenario.

Thanks for confirming my suspicions, I really appreciate it! I had a strong feeling that my inclination was correct, but I wasn’t sure if EPYC hardware just works differently in a way I was unaware of. I have had great experiences with ASRock Rack gear in the past so this was unexpected, but what can ya do.

The return is in the mail and so is the replacement. Between this and the faulty RAM I had to return for my concurrent virtualization server rebuilds, I now have almost the same amount of money I originally intended to spend tied up in returns. :neutral_face:

So some really bad news. The replacement motherboard just came in and I am having the exact same problem. Neither the power button nor the power cycle feature in the IPMI will turn the system on.
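For anyone who wants to repeat the power-on attempt from a shell instead of the web UI, here is a minimal sketch that wraps `ipmitool` from Python. It assumes `ipmitool` is installed and that the BMC accepts IPMI-over-LAN; the address `192.0.2.10` and user `admin` in the usage note are placeholders, not values from this build, and the helper names are made up for illustration.

```python
import subprocess


def ipmitool_argv(host: str, user: str, *command: str) -> list:
    """Build an ipmitool command line for a remote BMC."""
    # -I lanplus selects the IPMI v2.0 LAN interface.
    # -E makes ipmitool read the password from the IPMI_PASSWORD
    # environment variable instead of taking it on the command line.
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *command]


def run(host: str, user: str, *command: str) -> str:
    """Run one ipmitool command and return whatever it printed."""
    result = subprocess.run(
        ipmitool_argv(host, user, *command),
        capture_output=True,
        text=True,
        timeout=30,
    )
    return (result.stdout + result.stderr).strip()


def try_power_on(host: str, user: str) -> dict:
    """Check power state, request power-on, then dump the event log."""
    return {
        "status_before": run(host, user, "chassis", "power", "status"),
        "power_on": run(host, user, "chassis", "power", "on"),
        "event_log": run(host, user, "sel", "elist"),
    }
```

With `IPMI_PASSWORD` exported, something like `print(try_power_on("192.0.2.10", "admin"))` would show the power state before the attempt, the BMC's answer to the power-on request, and the system event log, which might hold a clue that the web UI does not surface.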

If the RAM or CPU were somehow faulty or not seated properly, I would assume that the system would power on but fail to post.

If there was something wrong with the power supply, I would assume that the BMC would not work either. This power supply is fairly new and works just fine with the old motherboard/cpu/ram that was in this case.

Anyone have any ideas? I haven’t been this stumped by a computer in many years, totally lost right now.

If you do manage to get power: when I upgraded from my old 7742 to a 7713P and switched RAM at the same time, the first power on of my ROMED6U-2L2T took at least ten minutes. Re-populating the IPMI system inventory also took this long. Even subsequent (no changes) power-ons take around a minute. Not to boot, just to post. These are not desktop or workstation boards. So don’t jump the gun and pull power if it looks like it’s not working.
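To make the "don't jump the gun" advice concrete, here is a minimal Python sketch of a patient wait loop. The probe is whatever check you trust (the host answering on the network, the IPMI inventory filling in, and so on) and is supplied by the caller; the function name and the ten-minute default are assumptions based on the timing described above, not anything from ASRock Rack.

```python
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,   # ten minutes, per the first-boot time above
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe() until it returns True or timeout_s has elapsed.

    Returns True as soon as the probe succeeds, False on timeout.
    sleep and clock are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The point is simply to give the board its full ten minutes before pulling power, rather than judging by desktop boot times.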

For the screws around the CPU mount, I have always just tightened them “all the way” on socket SP3. Remember, you’re looking for enough force to mash the processor down onto the pins, not just enough to contact the pins :laughing:.

Edit: edits after reading the entire OP.

Edit: edits after reading the entire OP.

Sorry for the excessively verbose wall of text haha, I just wanted to be clear about what I have already tried and investigated. I appreciate all the help I can get, this is an utterly puzzling situation.

When I upgraded from my old 7742 to a 7713P and switched RAM at the same time, the first power on of my ROMED6U-2L2T took at least ten minutes. Re-populating the IPMI system inventory also took this long.
Even subsequent (no changes) power-ons take around a minute. Not to boot, just to post. These are not desktop or workstation boards. Is it possible you just haven’t waited long enough?

It’s not even getting as far as a power cycle beginning, nothing happens at all when I attempt to turn it on. When I press the power button on the case that is wired up to the front panel headers, nothing happens. No fan spin up, no LEDs turn on, just nothing whatsoever.

After trying the power button, I shut off the power on the PSU and disconnected everything except the 3 required power connectors. No fans, no front panel connectors, no storage devices, I even removed the IO shield for good measure. I plugged in the PSU and the ethernet cable for the IPMI, turned on the PSU, and once the IPMI was accessible I tried to turn it on from there. This is what happens:

For the screws around the CPU mount, I have always just tightened them “all the way” on socket SP3. Remember, you’re looking for enough force to mash the processor down onto the pins, not just enough to contact the pins :laughing:.

After the absolute nightmare scenario that was last week (and am now reliving again), I actually ponied up and went and bought a torque wrench from Harbor Freight that goes down to 14 inch-pounds and was going to return it after I am done with it. I normally am not one to abuse return policies but I made an exception for this scenario since I am dealing with high end hardware and don’t want to “screw” this up. This time I am actually confident that it is screwed down correctly.

The part that really throws me for a loop with all of this is that if there was a legit issue with the processor or the memory, an experience I have had before, I would assume that the system would still power up but fail to boot successfully. I am not seeing that at all though.

Looking closer at the manual, your board’s “Dr. Debug” :rofl: should show you even pre-initialization failures.

Is this panel (bottom edge, center) showing anything? Several STH forum posts recommend re-flashing the BMC firmware (which you can do - from the BMC!).

You’ve already abused one on the motherboard! What’s one more!

Is this panel (bottom edge, center) showing anything?

Oh how I wish it was. Nothing displays at all, it never even turns on. I used to have that debug screen feature on a DFI LANParty motherboard what feels like a lifetime ago and missed having that for troubleshooting issues. Now I have a motherboard that has one again, and I am having problems before the damn thing even gets power. Figures.

You’ve already abused one on the motherboard! What’s one more!

Don’t remind me! :cry: The first one was definitely used before me though and sold as new, so I felt justified in returning it. My situation is not unprecedented though, there is a Newegg review from a “Willie V” that describes literally the exact same situation as me.

Several STH forum posts recommend re-flashing the BMC firmware (which you can do - from the BMC!).

It felt pointless to do this since the version of the BMC firmware that came from the factory is the latest version, but I am attempting that now.

If you’re getting IPMI power (5v) but no other power (12v), you will want to try a different power supply.

“Willie” also wonders aloud in his review: “Is QA bad enough to have gotten two bad boards in a row from different vendors? At my wits end at this point.” My answer to that is — no. And in your case the board’s IPMI inventory came pre-populated with someone else’s CPU model, so you know someone else had it working!

No dice on the firmware flashing.

If you’re getting IPMI power (5v) but no other power (12v), you will want to try a different power supply.

Yeah it seems like the power supply is practically the only thing left. Very odd since it was working 100% fine 4 hours ago with the old hardware before I started this endeavor again, and it turns on just fine with the PSU tester when it is unplugged from this motherboard, but I need to rule it out. I am going to open up my primary desktop and try the PSU from there.

Any idea how to tell if the CPU had previously been locked to a vendor?
or would it not even run ipmi?

might make the cpu vendor show different?

I completely disassembled my primary desktop computer and put in the EPYCD8-2T using the PSU from my primary desktop and got the exact same results. It will not power on whatsoever. And I know that there is nothing wrong with my case and PSU from my primary desktop either because I am now typing this message on my rebuilt primary desktop.

“Willie” also wonders aloud in his review: “Is QA bad enough to have gotten two bad boards in a row from different vendors? At my wits end at this point.” My answer to that is — no.

Believe me when I say that I agree with you completely, the chances of that happening are essentially impossible, but as I am currently living through the exact same nightmare it REALLY has me wondering.

Any idea how to tell if the CPU had previously been locked to a vendor?
or would it not even run ipmi?
might make the cpu vendor show different?

This is exactly where I am stuck. At this point I have effectively ruled out issues with literally every component except the CPU and RAM. The problem is that I have neither a spare EPYC processor or DDR4 RDIMMs to test with, nor a known working EPYC system in which I can test the processor/RAM I currently have. Also the return window on the processor is rapidly closing, so I need to figure this out ASAP.

Can anyone correlate my findings with theirs? Is it even possible that there could be something so wrong with the CPU/RAM in an EPYC system that it completely refuses to power on? The motherboard’s debug code screen has error codes that correspond with CPU/RAM issues according to the manual, so I would assume that if there was a problem with the CPU/RAM (no matter whether it’s fried, vendor-locked, improperly seated, whatever) that the system should at the very least power on and report an error code. The idea that a faulty processor or RAM could entirely prevent the system from powering on is mind boggling to me, how could the motherboard even know that something is wrong with them if it never turns on?

I am so desperate that I am about to attempt powering on the system with no RAM or CPU installed at all to see if it will at least show an error code then. If not, I don’t know how my situation could be anything but a 2nd faulty motherboard in a row.

It should power up without RAM, just stall on a DrDebug code.

The CPU vendor lock is cryptographic, so it has to power up and communicate to get that far. Reports are that it will stall on a particular set of Dr Debug codes, and possibly display a message on the screen (depending on the BIOS).

There are configuration pads on the CPU that identify socket variation (e.g. EPYC or Threadripper), and the informal description I’ve seen is that the “wrong” CPU should not be powered up, but I have no idea what that looks like in practice. I’ve also never tried an SP3 board without a CPU present at all. So no idea what “typical” behavior to expect here. I don’t see any harm in trying though.

Not powering up (despite working IPMI) is a failure mode I’ve seen reported with the EPYCD8 specifically in the past, but two in a row does seem unlikely.

This is my understanding of things as well. I would think that if the motherboard is actually functioning correctly, the system should power up but immediately fail to post and probably hang on the first error code because the first test would probably be “is there a CPU present?”. If that happens, then it will be further than I have gotten with both motherboards so far.

I have scoured the internet trying to look for someone talking about firing up an EPYC system without a processor installed but came up empty handed. I am prepared to be the internet’s guinea pig and do it for science. Besides what’s the worst that could happen, I wind up with a non-functional computer? I have one of those right now lol.

Not powering up (despite working IPMI) is a failure mode I’ve seen reported with the EPYCD8 specifically in the past, but two in a row does seem unlikely.

NEVER underestimate my bad luck.

Alright here goes nothing, for science!

And of course, I had the exact same result. Powering on from the IPMI results in a “Performing power action failed” error and the physical power button does nothing. I have whittled this down to literally just a motherboard plugged into a known working PSU and that doesn’t even work, so I guess I can safely assume I got another dud. Two DOA motherboards in a row, I believe this is a new personal record.
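Here is the process of elimination so far, written out as a small Python sketch so the reasoning is easy to check. The component names and test descriptions are paraphrased from the posts above. One big assumption is baked in and flagged in the comments: that a healthy board would at least power on with no CPU or RAM installed, which nobody in this thread has confirmed for SP3.

```python
# Everything that could plausibly stop the system from powering on.
suspects = {"motherboard", "PSU", "case wiring", "BMC firmware", "CPU", "RAM"}

# (test described in the thread, components it appears to clear)
tests = [
    ("Original PSU runs the old build; a second PSU gives the same result",
     {"PSU"}),
    ("Front panel and fans unplugged, board moved to another case; "
     "IPMI power-on still fails",
     {"case wiring"}),
    ("BMC firmware re-flashed; no change",
     {"BMC firmware"}),
    # ASSUMPTION: a working board would still power on (and stall on a
    # debug code) with no CPU or RAM. This is unverified for SP3 boards,
    # so treat this step as weaker evidence than the others.
    ("CPU and RAM removed; still no power-on",
     {"CPU", "RAM"}),
]

remaining = set(suspects)
for description, cleared in tests:
    remaining -= cleared
    print(f"{description}\n  -> still suspect: {sorted(remaining)}")
```

Under that assumption only the motherboard is left standing. If the assumption is wrong and the board refuses to power on without a CPU by design, the last test clears nothing and the CPU and RAM go back on the list.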

I guess I am rolling the dice on a Supermicro board next because lord knows I refuse to be a sucker a 3rd time. The part that really sucks is that by the time I get it, I will have almost certainly missed the return window on the processor.

sigh

Exciting stuff, I’ve gone through my ROMED6U-2L2T and one for a friend’s build with absolutely zero issues. So ASRock Rack doesn’t have bad QC with every board. Maybe the reject parts off those went onto the ROMED8 :face_with_open_eyes_and_hand_over_mouth:. Or you said this was ordered through some strange Amazon back channel?

I also just had 2 great experiences with ASRock Rack gear, I just rebuilt my primary and secondary virtualization servers with 2 of the X470D4U motherboards and they are running perfectly after some initial RAM issues which were unrelated to the motherboard. The EPYCD8-2T was supposed to make for a trifecta, but alas here I am getting drunk late at night after the replacement was just as much of a dud as the first one was.

The first one was purchased via Amazon, but the Amazon reseller was either Newegg in disguise or a dropshipper that buys stuff through Newegg because it came in a Newegg box and the return address was Newegg’s main warehouse. This one I have right now was an eBay purchase, but purchased from Newegg’s eBay store lol. Everything sucks in this situation so I don’t even care enough to point a finger at anyone in particular.

Note this is about the first generation EPYCD8, not the second generation ROMEDn line. Both have generally been good for people from what I’ve seen anecdotally, but of course that doesn’t say much for the problem here.

It’s unfortunate that the CPU isn’t known-good here, but I don’t have enough direct experience to be able to definitively point to one of the two in this situation. The troubleshooting so far is certainly well reasoned.

It’s unfortunate that the CPU isn’t known-good here, but I don’t have enough direct experience to be able to definitively point to one of the two in this situation.

This part is driving me nuts. I am assuming that if a CPU is defective somehow, the motherboard should still power up but immediately fail to post because it can’t locate a working CPU and either display an error code or a beep code. But that is only an assumption. I think I am going to make a new thread to ask the community this specific question just in case someone may have more insight.

Also big thanks to @Quension, @rrubberr, and @Trooper_ish, I have felt like I am losing my mind dealing with this mess and y’all helped immensely.
