Goodbye (and Hello) Threadripper

What's really wild is seeing similar issues from a WRX80 system too, with little to no component overlap. I would love to try that machine at someone else's house. Just cuz

Ops cia lab 2 stories under ground in that neighborhood

6 Likes

I am still running many tests and just generally organizing my system / data, preparing it for work. …interesting things afoot. Will update when I have more information.

1 Like

Just wondering, have you ever tried hooking that system to a good UPS instead of the mains directly?

It’s on a UPS now. APC Back-UPS 1500 …might be a bit weak

i feel for you Kitty, i can’t begin to imagine just how disappointed and miserable you must be feeling… there is nothing of value i could add to the comments here, not just because i have never been this totally unlucky with anything system related but also because even the smart peeps in this thread are astounded too…

only thing that comes to mind is (you probably have already, but if not) like infinitevalence suggested about setting up at a friend’s house or another place at least 20 miles away or something because power lines, cell towers, lordy knows what else…

one other thing which might be completely silly but, since you’ve already troubleshooted to hell and back… where you live now, have you ever heard “the hum”? for some places and people who live near towers, gas lines and pipelines and stuff, the vibration can be pretty loud, like recordable, and vibration is bad for things?

:crossed_fingers: your next system posts perfectly on second boot… don’t let this awful experience bring you down, you’ve done everything humanly possible :cookie: :cookie: :cookie:

2 Likes

Just in case you hadn’t seen it elsewhere, Wendell posted a case vid today but included a mini rant about the ASUS WRX90E Sage motherboard. He pins his issue on DIMM slot E even though the memory was on the QVL, and he exchanged his board. He said the replacement works, so maybe there’s a hardware revision change?

I’m no expert but frankly it sounds like a board design or UEFI problem to me, vendors still are having issues with Samsung & Micron DDR5 on plain old Ryzen even despite modern UEFI releases. If they can’t even get plain old dual-channel to work with Micron chips then I wouldn’t be surprised if a whopping 8-channel setup has some issues, QVL or not.

Shot in the dark, did you try without the UPS in the loop? The APC model I googled is rated for a maximum output of only 865 W… so if you exceed that you’re overrunning the unit and potentially overdrawing the batteries depending on its topology. I have a CP1500PFCLCD and even that is only rated for 1,000 W total output off the UPS side.
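For anyone who wants to sanity-check a UPS against a build, here is a rough Python sketch of that comparison. The function names, the example load figures, and the 80% margin are assumptions made up for illustration; only the 865 W and 1,000 W ratings come from the post above. Measure your real draw at the wall before trusting any of this.

```python
def ups_headroom_w(rated_output_w: float, load_w: float) -> float:
    """Remaining output capacity in watts; a negative value means the load exceeds the rating."""
    return rated_output_w - load_w


def is_overloaded(rated_output_w: float, load_w: float, margin: float = 0.8) -> bool:
    """True if the load exceeds the chosen fraction of the rated output.

    margin=0.8 is an assumed rule-of-thumb safety margin, not a vendor figure.
    """
    return load_w > rated_output_w * margin


# Hypothetical 700 W load (workstation plus monitors), not a measured value.
example_load_w = 700.0
for name, rated_w in [("865 W unit", 865.0), ("1,000 W unit", 1000.0)]:
    print(name, ups_headroom_w(rated_w, example_load_w), is_overloaded(rated_w, example_load_w))
```

With the assumed 700 W load, the 865 W unit still has headroom on paper but is past the 80% margin, while the 1,000 W unit is not.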

Trying it at another building is always worth a shot, but I see ASRock also makes a WRX90 board and maybe that’s worth trying if it meets your needs. There’s been such an unending stream of issues with ASUS in the last three years that I’d never touch them again. I’d suggest ASRock at this point (which is who I chose for my B650E build and did not regret it).

Wish I could be more help, build issues are frustrating and bad enough with a regular system, but when spending several grand on a premium Threadripper system it’s all the more frustrating (and inexcusable) for it to just not work.

3 Likes

No. In the same video I also said that, going by the box stickers, the replacement and the OG were likely manufactured hours or minutes apart. It’s just variability.

What I don’t get is why we don’t see the same level of issues with server motherboards like Supermicro. How is it that you can buy a dozen SM boards and have no issues, but here we see nearly a dozen boards that have had all sorts of issues?

What are the workstation boards doing differently?

it’s been a bit frustrating

i’ll say lmao

surprised you haven’t punched it through the floor yet

2 Likes

Check out my previous posts – one of them here:

The Achilles heel of the Threadripper is its memory controller. With high I/O loads, about half of the overhead needed to serialize/deserialize all of the data to move it across the Infinity Fabric is concentrated in the central I/O die… This does not have the same thermal protections that the chiplets do. I’ve fried a number of expensive 3990X CPUs – high I/O Postgres processes running full tilt for hours at a time. In all of them, the memory controller died.

Consumer boards will run fast RAM speeds – but silicon just doesn’t like running over 3 GHz. With server gear, it’s almost always limited to 3 GHz or under. If you run Threadripper RAM at this or JEDEC speeds – no problems. If you run it at DOCP or EXPO speeds – this also ramps the speed/voltages up on the motherboard’s SoC and the CPU’s central I/O die (the memory controller). This will run for short benchmarks – but put server loads on it and BZZZZT…

Just my 2 cents… Server gear is like a pickup truck – built for carrying loads and reliability. Consumer gear is like a Lamborghini – great benchmark numbers. But don’t try and haul your boat with it…

5 Likes

@Kougar @wendell

I just saw the Fractal video, it was great! Loved the ending :rofl:

Happy to see all of the comments offering suggestions and help, it means a lot.

I am continuing to work on the build and will give updates when I accomplish more. Lots of testing and setup underway.

4 Likes

Trust me, I have wanted to. Got close a few times.

1 Like

My apologies, I should’ve realized that’s what you meant when you were saying the serials were so close.

But I had wanted to ask you… if this comes down to variability then wouldn’t that imply there’s an underlying design issue in the ASUS Sage motherboard or UEFI itself? Granted I’ve not seen anything about the ASRock WRX90 board’s reliability yet, so theoretically it could be a platform level issue I suppose.

I still have an older HUB podcast episode from February ringing in my ears about how Steve had no end of issues with a last-gen Threadripper build and eventually just replaced the system outright with a regular Ryzen build after less than a year… but Tim hadn’t experienced those issues himself with his Threadripper platform. Such variability is not something that inspires trust in the platform, given it wasn’t the first time I’ve heard similar reports.

1 Like

You could look at a Lenovo P8. The P8 is 79xx Threadripper Pro, and comes with a warranty. DIY on Threadripper Pro seems sketchy. I have multiple Lenovo P620s with 59xx and they are rock solid. Dell makes one too if I recall.

When I bought my recent Intel SPR workstation, the price difference between the Lenovo and DIY was around $500. My time is worth more than that.

You need to read the power and storage guides to make sure you order the right machine to meet your needs.

The Lenovo will not be as fast as DIY since they tune the BIOS to be conservative, but it will just work, and you can add your own RAM that’s not on the QVL, and they won’t refuse to give you service. But you will get CPU lock, i.e. if you put in a faster CPU, it will only work in Lenovo machines from then on. Makes Lenovo CPUs on eBay super cheap.

There are discount codes available to reduce the price 40-50% off list when ordering direct from Lenovo. I always order direct with discount codes, better than resellers. Resellers will rip you off… here’s looking at you CDW and Insight…

This might actually be the problem. Most 1500 VA UPSes cannot deliver 1500 watts, but much less. I would step up to a 2200 VA UPS such as this one that can provide up to 1800 watts.

Another reason I picked this one is that an online double-conversion UPS puts out a pure sine wave and isolates your PC from your electrical system by powering your system from the battery 100% of the time, versus cheaper ones that only switch to battery after sensing a power outage. With those, the switching time can also cause problems, the output may not be a perfect sine wave, etc…

5 Likes

I will look into this.

This is because the release of the WRX90 platform was rushed, the production volume is very low, and ASUS will not profit from it, so they don’t care. I think there were only about ten thousand of these boards made because the DIY 10+ grand workstation market is tiny. Still no BIOS update…

1 Like

Right, VA != W, it’s VA * PF = W, where PF (the power factor) can vary from 0.5(!) to ~0.9.
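To put numbers on that, a minimal Python sketch of the conversion. The function names are made up for illustration; the ratings plugged in are the ones quoted earlier in the thread (865 W, 1,000 W and 1,800 W against 1500 VA, 1500 VA and 2200 VA), and the "implied" ratio is just rated watts over rated VA, which is an approximation and not a vendor-stated power factor.

```python
def va_to_watts(va: float, power_factor: float) -> float:
    """Real power in watts from apparent power (VA) and power factor."""
    if not 0 < power_factor <= 1:
        raise ValueError("power factor must be in (0, 1]")
    return va * power_factor


def implied_power_factor(rated_w: float, rated_va: float) -> float:
    """Ratio of rated watts to rated VA, as printed on a UPS spec sheet."""
    return rated_w / rated_va


# A 1500 VA unit across the power factor range mentioned above.
print(va_to_watts(1500, 0.5))  # low end
print(va_to_watts(1500, 0.9))  # high end

# Ratios implied by the ratings quoted in this thread.
for rated_w, rated_va in [(865, 1500), (1000, 1500), (1800, 2200)]:
    print(rated_va, "VA ->", rated_w, "W, ratio", round(implied_power_factor(rated_w, rated_va), 2))
```

So a "1500" on the box can mean anything from roughly 750 W to 1350 W depending on that ratio, which is why the watt rating is the number to compare against your load.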

The output of a UPS is often far worse, and causes more problems, than just directly using the wall socket, unless you go for a very expensive double-conversion unit - and even then I’d go for a Vertiv or Eaton if you want a sine without steps. Remember to actually test the output on a scope!

After skimming this thread: were there also lots of issues with the Threadrippers?

I have my 5975WX TR PRO running on a 1500 APC Back-UPS PRO. That unit also runs a pair of Predator monitors plus all my peripherals including VR and there is zero problem. I don’t OC or run anything particularly funky beyond AI models, but my system is perfectly steady. I have 13 minutes of runtime on battery according to the screen and that’s with a battery in dire need of replacement.