ASRock EPYCD8-2T not powering on

So some really bad news. The replacement motherboard just came in and I am having the exact same problem. Neither the power button nor the power cycle feature in the IPMI will turn the system on.

If the RAM or CPU were somehow faulty or not seated properly, I would assume that the system would power on but fail to post.

If there was something wrong with the power supply, I would assume that the BMC would not work either. This power supply is fairly new and works just fine with the old motherboard/cpu/ram that was in this case.

Anyone have any ideas? I haven’t been this stumped by a computer in many years, totally lost right now.

If you do manage to get power: when I upgraded from my old 7742 to a 7713P and switched RAM at the same time, the first power on of my ROMED6U-2L2T took at least ten minutes. Re-populating the IPMI system inventory also took this long. Even subsequent (no changes) power-ons take around a minute. Not to boot, just to post. These are not desktop or workstation boards. So don’t jump the gun and pull power if it looks like it’s not working.
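
The "don't jump the gun" advice amounts to polling patiently instead of pulling power. Here is a minimal Python sketch of that idea, assuming you have some way to tell that POST has finished. `probe` is a hypothetical callable you would supply yourself (for example, something that checks for video output), not a real API, and the 15 minute default is only a cushion over the ten minutes mentioned above.

```python
import time
from typing import Callable


def wait_for_post(
    probe: Callable[[], bool],          # hypothetical: returns True once POST is done
    timeout_s: float = 15 * 60,         # assumption: cushion over the ~10 min first boot
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it reports success or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False  # only now is it reasonable to call it a failure
        sleep(interval_s)
```

`sleep` and `clock` are injectable only so the loop can be exercised without actually waiting.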

For the screws around the CPU mount, I have always just tightened them “all the way” on socket SP3. Remember, you’re looking for enough force to mash the processor down onto the pins, not just enough to contact the pins :laughing:.

Edit: edits after reading the entire OP.

Edit: edits after reading the entire OP.

Sorry for the excessively verbose wall of text haha, I just wanted to be clear about what I have already tried and investigated. I appreciate all the help I can get, this is an utterly puzzling situation.

When I upgraded from my old 7742 to a 7713P and switched RAM at the same time, the first power on of my ROMED6U-2L2T took at least ten minutes. Re-populating the IPMI system inventory also took this long.
Even subsequent (no changes) power-ons take around a minute. Not to boot, just to post. These are not desktop or workstation boards. Is it possible you just haven’t waited long enough?

It’s not even getting as far as a power cycle beginning, nothing happens at all when I attempt to turn it on. When I press the power button on the case that is wired up to the front panel headers, nothing happens. No fan spin up, no LEDs turn on, just nothing whatsoever.

After trying the power button, I shut off the power on the PSU and disconnected everything except the 3 required power connectors. No fans, no front panel connectors, no storage devices, I even removed the IO shield for good measure. I plugged in the PSU and the Ethernet cable for the IPMI, turned on the PSU, and once the IPMI was accessible I tried to turn it on from there. This is what happens: the IPMI reports a “Performing power action failed” error and nothing powers on.

For the screws around the CPU mount, I have always just tightened them “all the way” on socket SP3. Remember, you’re looking for enough force to mash the processor down onto the pins, not just enough to contact the pins :laughing:.

After the absolute nightmare scenario that was last week (and am now reliving again), I actually ponied up and went and bought a torque wrench from Harbor Freight that goes down to 14 inch-pounds and was going to return it after I am done with it. I normally am not one to abuse return policies but I made an exception for this scenario since I am dealing with high end hardware and don’t want to “screw” this up. This time I am actually confident that it is screwed down correctly.

The part that really throws me for a loop with all of this is that if there was a legit issue with the processor or the memory, an experience I have had before, I would assume that the system would still power up but fail to boot successfully. I am not seeing that at all though.

Looking closer at the manual, your board’s “Dr. Debug” should show you even pre-initialization failures.

Is this panel (bottom edge, center) showing anything? Several STH forum posts recommend re-flashing the BMC firmware (which you can do - from the BMC!).

You’ve already abused one on the motherboard! What’s one more!

Is this panel (bottom edge, center) showing anything?

Oh how I wish it was. Nothing displays at all, it never even turns on. I used to have that debug screen feature on a DFI LANParty motherboard what feels like a lifetime ago and missed having that for troubleshooting issues. Now I have a motherboard that has one again, and I am having problems before the damn thing even gets power. Figures.

You’ve already abused one on the motherboard! What’s one more!

Don’t remind me! :cry: The first one was definitely used before me though and sold as new, so I felt justified in returning it. My situation is not unprecedented though, there is a Newegg review from a “Willie V” that describes literally the exact same situation as mine.

Several STH forum posts recommend re-flashing the BMC firmware (which you can do - from the BMC!).

It felt pointless to do this since the version of the BMC firmware that came from the factory is the latest version, but I am attempting that now.

If you’re getting IPMI power (5v) but no other power (12v), you will want to try a different power supply.

“Willie” also wonders aloud in his review: “Is QA bad enough to have gotten two bad boards in a row from different vendors? At my wits end at this point.” My answer to that is — no. And in your case the board’s IPMI inventory came pre-populated with someone else’s CPU model, so you know someone else had it working!

No dice on the firmware flashing.

If you’re getting IPMI power (5v) but no other power (12v), you will want to try a different power supply.

Yeah it seems like the power supply is practically the only thing left. Very odd since it was working 100% fine 4 hours ago with the old hardware before I started this endeavor again, and it turns on just fine with the PSU tester when it is unplugged from this motherboard, but I need to rule it out. I am going to open up my primary desktop and try the PSU from there.

Any idea how to tell if the CPU had previously been locked to a vendor?
or would it not even run ipmi?

might make the cpu vendor show different?

I completely disassembled my primary desktop computer and put in the EPYCD8-2T using the PSU from my primary desktop and got the exact same results. It will not power on whatsoever. And I know that there is nothing wrong with my case and PSU from my primary desktop either because I am now typing this message on my rebuilt primary desktop.

“Willie” also wonders aloud in his review: “Is QA bad enough to have gotten two bad boards in a row from different vendors? At my wits end at this point.” My answer to that is: no.

Believe me when I say that I agree with you completely, the chances of that happening are essentially impossible, but as I am currently living through the exact same nightmare it REALLY has me wondering.

Any idea how to tell if the CPU had previously been locked to a vendor?
or would it not even run ipmi?
might make the cpu vendor show different?

This is exactly where I am stuck. At this point I have effectively ruled out issues with literally every component except the CPU and RAM. The problem is that I have neither extra EPYC processors/DDR4 RDIMMs that I can test with, nor a known working EPYC system with which I can test the processor/RAM I currently have. Also the return window on the processor is rapidly closing, so I need to figure this out ASAP.

Can anyone correlate my findings with theirs? Is it even possible that there could be something so wrong with the CPU/RAM in an EPYC system that it completely refuses to power on? The motherboard’s debug code screen has error codes that correspond with CPU/RAM issues according to the manual, so I would assume that if there was a problem with the CPU/RAM (no matter whether it’s fried, vendor-locked, improperly seated, whatever) the system should at the very least power on and report an error code. The idea that a faulty processor or RAM could entirely prevent the system from powering on is mind-boggling to me. How could the motherboard even know that something is wrong with them if it never turns on?

I am so desperate that I am about to attempt powering on the system with no RAM or CPU installed at all to see if it will at least show an error code then. If not, I don’t know how my situation could be anything but a 2nd faulty motherboard in a row.

It should power up without RAM, just stall on a DrDebug code.

The CPU vendor lock is cryptographic, so it has to power up and communicate to get that far. Reports are that it will stall on a particular set of Dr Debug codes, and possibly display a message on the screen (depending on the BIOS).

There are configuration pads on the CPU that identify socket variation (e.g. EPYC or Threadripper), and the informal description I’ve seen is that the “wrong” CPU should not be powered up, but I have no idea what that looks like in practice. I’ve also never tried an SP3 board without a CPU present at all. So no idea what “typical” behavior to expect here. I don’t see any harm in trying though.

Not powering up (despite working IPMI) is a failure mode I’ve seen reported with the EPYCD8 specifically in the past, but two in a row does seem unlikely.

This is my understanding of things as well. I would think that if the motherboard is actually functioning correctly, the system should power up but immediately fail to post and probably hang on the first error code because the first test would probably be “is there a CPU present?”. If that happens, then it will be further than I have gotten with both motherboards so far.

I have scoured the internet trying to look for someone talking about firing up an EPYC system without a processor installed but came up empty handed. I am prepared to be the internet’s guinea pig and do it for science. Besides what’s the worst that could happen, I wind up with a non-functional computer? I have one of those right now lol.

Not powering up (despite working IPMI) is a failure mode I’ve seen reported with the EPYCD8 specifically in the past, but two in a row does seem unlikely.

NEVER underestimate my bad luck.

Alright here goes nothing, for science!

And of course, I had the exact same result. Powering on from the IPMI results in a “Performing power action failed” error and the physical power button does nothing. I have whittled this down to literally just a motherboard plugged into a known working PSU and that doesn’t even work, so I guess I can safely assume I got another dud. Two DOA motherboards in a row, I believe this is a new personal record.
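
For anyone hitting the same “Performing power action failed” message in the web UI, it may be worth asking the BMC the same questions from another machine with `ipmitool`. Below is a minimal Python sketch that only builds and runs the commands. The host address and user name are placeholders, the password is assumed to be in the `IPMI_PASSWORD` environment variable (which is what `ipmitool -E` reads), and there is no guarantee the event log contains anything useful for this failure.

```python
import os
import shutil
import subprocess
from typing import List


def ipmitool_argv(host: str, user: str, *subcommand: str) -> List[str]:
    """Build an ipmitool command line for a remote BMC.

    -I lanplus : IPMI v2.0 over LAN
    -E         : take the password from the IPMI_PASSWORD environment variable
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *subcommand]


# Read-only queries: current chassis power state and the system event log.
CHECKS = [
    ("chassis", "power", "status"),
    ("sel", "list"),
]


def run_checks(host: str, user: str) -> None:
    """Run the read-only checks and print whatever the BMC answers."""
    if shutil.which("ipmitool") is None:
        raise RuntimeError("ipmitool not found on PATH")
    if "IPMI_PASSWORD" not in os.environ:
        raise RuntimeError("set IPMI_PASSWORD before running")
    for sub in CHECKS:
        result = subprocess.run(
            ipmitool_argv(host, user, *sub), capture_output=True, text=True
        )
        print("$ ipmitool", " ".join(sub))
        print(result.stdout or result.stderr)
```

Usage would look like `run_checks("192.0.2.10", "admin")`, with both values replaced by your own BMC address and account. Swapping in `("chassis", "power", "on")` issues the same power-on request as the web interface.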

I guess I am rolling the dice on a Supermicro board next because lord knows I refuse to be a sucker a 3rd time. The part that really sucks is that by the time I get it, I will have almost certainly missed the return window on the processor.

sigh

I also just had 2 great experiences with ASRock Rack gear, I just rebuilt my primary and secondary virtualization servers with 2 of the X470D4U motherboards and they are running perfectly after some initial RAM issues which were unrelated to the motherboard. The EPYCD8-2T was supposed to make for a trifecta, but alas here I am getting drunk late at night after the replacement was just as much of a dud as the first one was.

The first one was purchased via Amazon, but the Amazon reseller was either Newegg in disguise or a dropshipper that buys stuff through Newegg because it came in a Newegg box and the return address was Newegg’s main warehouse. This one I have right now was an eBay purchase, but purchased from Newegg’s eBay store lol. Everything sucks in this situation so I don’t even care enough to point a finger at anyone in particular.

Note this is about the first generation EPYCD8, not the second generation ROMEDn line. Both have generally been good for people from what I’ve seen anecdotally, but of course that doesn’t say much for the problem here.

It’s unfortunate that the CPU isn’t known-good here, but I don’t have enough direct experience to be able to definitively point to one of the two in this situation. The troubleshooting so far is certainly well reasoned.

It’s unfortunate that the CPU isn’t known-good here, but I don’t have enough direct experience to be able to definitively point to one of the two in this situation.

This part is driving me nuts. I am assuming that if a CPU is defective somehow, the motherboard should still power up but immediately fail to post because it can’t locate a working CPU and either display an error code or a beep code. But that is only an assumption. I think I am going to make a new thread to ask the community this specific question just in case someone may have more insight.

Also big thanks to @Quension, @rrubberr, and @Trooper_ish, I have felt like I am losing my mind dealing with this mess and y’all helped immensely.

An update to this absolute mess: the processor has been faulty the whole time.

I received the replacement motherboard, a Supermicro H12SSL-i, and put the system together with all the parts I have. After turning on the power supply, none of the LEDs came on including the power OK or the BMC heartbeat LED. After running through all the troubleshooting steps I could think of, I pulled out the processor completely and tested again with no processor in the socket. This caused the power LEDs to come on and the BMC heartbeat LED to start.

As far as I can tell after spending weeks researching this, I believe that this is the first documented case on the internet of an EPYC processor being so spectacularly faulty that the motherboard has power issues when it is installed. If this issue is documented anywhere else, then my google-fu failed me and not for lack of trying. Sucks I had to be the one to spend going on 3 weeks figuring this out, but I hope that this is helpful to some other poor fool like myself frantically googling their issue.
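
The test that finally cracked it is a simple with/without comparison, sketched below in Python. This is only an illustration of the reasoning in this thread, not a general diagnostic. The names `Observation` and `cpu_swap_verdict` are made up for the example, and "score" is a crude count of how many signs of life the board shows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    standby_alive: bool    # power LEDs / BMC heartbeat come up with the PSU switched on
    host_powers_on: bool   # board responds to a power-on request (fans, debug display)

    def score(self) -> int:
        # Crude "signs of life" count, used only to compare two observations.
        return int(self.standby_alive) + int(self.host_powers_on)


def cpu_swap_verdict(with_cpu: Observation, without_cpu: Observation) -> str:
    """Compare board behaviour with the CPU installed vs. removed."""
    if without_cpu.score() > with_cpu.score():
        # The board is healthier with the socket empty: the CPU is dragging it down.
        return "suspect CPU"
    if without_cpu.score() < with_cpu.score():
        return "CPU not implicated by this test"
    # No change either way: the test cannot separate board from CPU.
    return "inconclusive"


# Supermicro H12SSL-i: no LEDs with the CPU, power LEDs and BMC heartbeat without it.
print(cpu_swap_verdict(Observation(False, False), Observation(True, False)))
# EPYCD8-2T: BMC alive but no power-on, with or without the CPU.
print(cpu_swap_verdict(Observation(True, False), Observation(True, False)))
```

The second case is why the no-CPU test on the EPYCD8-2T pointed at the board: an unchanged result does not clear the CPU, it just fails to separate the two.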

Thanks for the followup, it’s helpful to know this behavior exists.

Was this discovery in time to make a claim on the defective processor?

Of course, it is the duty of every responsible nerd to make sure they are not the next DenverCoder9.

[Image: xkcd 979, “Wisdom of the Ancients”]

Short version, yes.

Long version, when the seller created the eBay listing they specified a 30 day return window. But in the description they wrote that they only accept returns within 14 days. I learned it was definitely faulty on day 19 after it was delivered. Because the listing had 30 days specified in eBay’s system, I was able to get an automated return and it is already in the mail. The seller could refuse to accept the return, but between their error when creating the listing, the fact that they ignored my attempt to contact them when I was still within the 14 day period letting them know that it hasn’t been working and that I am waiting on a 3rd mobo to test it with, a mountain of evidence I compiled including pictures, and that I bought it with a credit card, they are NOT getting away with screwing me out of my money.

The real pain in the ass is that I am now going onto week 6 of a 3 week long project and my NAS still isn’t upgraded, but what can ya do. Once I finally have a working system, I am planning on redoing my NAS on vanilla FreeBSD and will be writing a little how-to guide that I will post here for fellow homelabbers who, like I have been for a few years now, are interested in full ZFS support without corporate strings attached.
