THREADRIPPER 7000 wrx90 - TRX50 SYSTEM BUILD AND STABILITY ISSUES "USER INPUT SURVEY"

EDIT: I just wanted to add a preface of sorts. I respect that for everyone building or using these systems, time is money. I would think most people with these systems are professionals or have projects they are working on, and therefore don’t have a lot of time to sit and write up posts here that they already have other threads on. So this can be seen as more of a long-term post for if/when anyone has time to post their problems/solutions here.

Thank you, everyone, for your time reading all this.

There are 2 general intentions behind this post:

  1. To gather the issues people have had with their systems, and possibly what solved them, in a more approachable thread where people can search in one place for resolutions to the issues they are having.
  2. Threadripper systems are a niche product with their own niche problems. Someone with little to no experience with above-consumer/gaming-oriented systems may be quite intimidated if they run into 50+ response threads of people with much more advanced technical experience or professional education trying to solve the problems they are having. While that isn’t directly Threadripper owners’ problem to solve for them, AMD has said in interviews that the support for these systems depends on consumer adoption of the platform. Whether that applies only to TRX50 and not WRX90, I’m not certain. So the more people who can easily find solutions to issues, the more people get brought onto these systems, the better.

Cooler mounting pressure, for example, is not as much of a concern on consumer-grade hardware, where there are a ton of easily searchable coolers made specifically for that hardware. While AMD does list supported coolers on their website, there are still very few coolers available that were designed for Threadripper 7000 specifically, and there are reports that even those coolers aren’t performing very well for some.

The consumer adoption = platform support point was made in one of these interviews with David McAfee, as far as I remember:

A deep dive into AMD Threadripper 7000X - Interview with David McAfee (youtube.com)

Ryzen Threadripper 7000 Series Q&A With AMD’s David McAfee (youtube.com)

I wanted to ask everyone to post some system details so we can have one thread with everyone’s system specs and stability issues in one place, in one post per person, for ease of comparison from system to system.

Please note I have also cross-posted this on the Threadripper subreddit. If you are a user on that forum and responded to the post there, please post a link to your post there so as to avoid double posts. Thank you.

Link to the post on Reddit:

THREADRIPPER 7000 pro wrx90 SYSTEM BUILD AND STABILITY ISSUES “SURVEY” : threadripper (reddit.com)

If your system has been stable, please post as well, with a note that your system is stable.

Info I can think of that may be helpful:

  1. motherboard

  2. processor

  3. ram used, was ram on QVL at time of first use?

  4. cooler

  5. psu

  6. custom psu cables?

  7. gpu

  8. add in cards

  9. motherboard Q codes/leds

  10. any system logs that you may have of these issues

  11. list of issues you have experienced, with personal explanation

  12. photos of build

  13. anything not listed that may be helpful.


build not finished but my system specs will be,

  1. ASUS wrx90
  2. threadripper pro 7975wx
  3. ram I expect to buy 256gb to 512gb vcolor threadripper optimized ram when released
  4. cooler, custom hard line water cooling with WATERCOOL HEATKILLER IV PRO nickel for Threadripper cpu block
  5. xpg fusion 1600w titanium, will likely be adding a second psu as gpu’s get added later
  6. no custom cables
  7. RTX 4090 with watercool heatkiller water block, adding later, RTX 6000 ada later and RTX 5090 when released
  8. Apex Storage X16 nvme card
  9. other system specs, lian li v3000 plus, 3 hardware labs gtx 480’s, dual watercool heatkiller 200 tube reservoir, dual watercool heatkiller d5 pumps, 8 noctua nf-a12x24 and 8 phanteks d30’s.
  10. will add further info when system is running, can’t give specific eta yet.

  1. ASUS WRX90 Sage
  2. AMD Threadripper Pro 7975WX
  3. 128 GB (4x32) G.Skill Zeta R5 (Running EXPO I)
  4. Noctua NH-U14S TR5-SP6 Update
  5. ASUS ROG 1600W Titanium PSU
  6. No custom cables
  7. ASUS ROG Strix 4090 OC & Intel Arc A770 (this was the cause of my biggest stability issue and has since been disconnected)
  8. Creative Soundblaster AE-7
  9. N/A
  10. N/A
  11. N/A
  12. (Will add some later)

Since removing the Intel Arc A770 which I meant to use as a side AV1 encoder, my system stability issues have disappeared so far. I’ve been up and running leaving the machine on for the last few days doing work with no crashes or performance hiccups when I sit back down and start working again.


My build has been up and running for about a week.

  1. Motherboard: ASUS WRX90e-SAGE SE
  2. Processor: 7985WX
  3. Memory: Kingston KF560R32RBK8-256 (chosen before it was added to the QVL)
  4. Cooler: Arctic Freezer 4U-M
  5. PSU: Corsair AX-1600i
  6. Custom cables: No
  7. GPU: MSI GeForce GTX 1080
  8. Add-in cards: Fenvi AX210 M.2 wireless adapter (for the bluetooth)
  9. Q-codes: None
  10. Issues/logs: See below…
  11. Anything else: No overclocking, no XMP profiles. I’m not touching any of those settings until I’m 1000% certain I won’t need an RMA.

Issue #1 - Memory temperature (resolved)

One of my more memory-intensive workloads was causing DIMM temperatures (especially slots B+C and F+G) to rapidly approach 85C+. I added two 40x10mm fans to each bank of DIMMs to get some airflow between the modules. Memory temperatures are now stable at about 70C under prolonged load.

Issue #2 - DisplayPort output doesn’t always work (intermittent)

I initially thought my build was DOA, because I had no signal on my display. Absent any integrated graphics, the remote KVM in the IPMI was a lifesaver. After some troubleshooting, I found that as long as I booted using HDMI, the display would work (even if I switched to the DisplayPort cable after the OS loaded.) Applying this firmware update for my graphics card helped. DisplayPort now works on boot most of the time, but it’s still hit-or-miss.

Issue #3 - Two unexplained freezes in Linux

This didn’t happen until today. In both instances, the system was idle when it just stopped responding. The screensaver had the display in powersave mode and it didn’t respond to keypresses. The network stack was dead; no response to my pings. No Q-codes (just the usual “AA”). I could log into the IPMI web interface over the dedicated management network interface, but everything looked normal there (no errors logged, sensors were nominal.) The KVM was “No Signal”, but that’s expected behavior when the OS is displaying via the discrete graphics card.

There was nothing useful in the Linux syslog for the first occurrence, other than a block of NUL characters.

2024-02-24T10:42:49.848854-05:00 mighty /usr/libexec/gdm-x-session[5205]: (II) UnloadModule: "libinput"
2024-02-24T10:42:49.848881-05:00 mighty /usr/libexec/gdm-x-session[5205]: (II) systemd-logind: releasing fd for 13:83
2024-02-24T10:52:10.180297-05:00 mighty ubuntu-report[5069]: level=error msg="data were not delivered successfully to metrics server, retrying in 1800s"
^@^@^@^@^@^@^@^@...a bunch more NUL characters, followed by logs from after I rebooted...^@^@^@^@^@^@^@^@
2024-02-24T12:02:13.292799-05:00 mighty systemd-modules-load[2546]: Inserted module 'lp'
2024-02-24T12:02:13.292867-05:00 mighty systemd-modules-load[2546]: Inserted module 'ppdev'

The second time it happened, there were more interesting messages in the logs, but I’m not sure what they mean:

2024-02-24T12:46:07.692777-05:00 mighty systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
2024-02-24T12:46:07.696316-05:00 mighty systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
2024-02-24T13:02:06.545553-05:00 mighty rtkit-daemon[4098]: The canary thread is apparently starving. Taking action.
2024-02-24T13:02:06.548921-05:00 mighty rtkit-daemon[4098]: Demoting known real-time threads.
2024-02-24T13:02:06.548958-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4101 of process 4062.
2024-02-24T13:02:06.548985-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4062 of process 4062.
2024-02-24T13:02:06.549017-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4099 of process 4066.
2024-02-24T13:02:06.549041-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4066 of process 4066.
2024-02-24T13:02:06.549065-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4102 of process 4067.
2024-02-24T13:02:06.549089-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4067 of process 4067.
2024-02-24T13:02:06.549116-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4097 of process 4063.
2024-02-24T13:02:06.549140-05:00 mighty rtkit-daemon[4098]: Demoted 7 threads.
2024-02-24T13:05:04.031212-05:00 mighty kernel: [ 2047.558967] clocksource: timekeeping watchdog on CPU104: Marking clocksource 'tsc' as unstable because the skew is too large:
2024-02-24T13:05:04.031501-05:00 mighty kernel: [ 2047.559118] clocksource:                       'hpet' wd_nsec: 0 wd_now: d32ccbb0 wd_last: 3bbc23fa mask: ffffffff
2024-02-24T13:05:04.031559-05:00 mighty kernel: [ 2047.559195] clocksource:                       'tsc' cs_nsec: 177448917847 cs_now: 63eef73df80 cs_last: 5baec91fda0 mask: ffffffffffffffff
2024-02-24T13:05:04.031584-05:00 mighty kernel: [ 2047.559275] clocksource:                       Clocksource 'tsc' skewed 177448917847 ns (177448 ms) over watchdog 'hpet' interval of 0 ns (0 ms)
2024-02-24T13:05:04.031599-05:00 mighty kernel: [ 2047.559351] clocksource:                       'tsc' is current clocksource.
2024-02-24T13:05:04.031615-05:00 mighty kernel: [ 2047.559505] tsc: Marking TSC unstable due to clocksource watchdog
2024-02-24T13:05:04.031635-05:00 mighty kernel: [ 2047.560191] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
2024-02-24T13:05:04.031646-05:00 mighty kernel: [ 2047.560246] sched_clock: Marking unstable (2047570944386, -9984456)<-(2047606829910, -46893244)
2024-02-24T13:05:04.054823-05:00 mighty kernel: [ 2047.566605] clocksource: Checking clocksource tsc synchronization from CPU 23 to CPUs 0,3,36,43-44,60,67,118.
2024-02-24T13:05:04.055208-05:00 mighty kernel: [ 2047.587010] clocksource: Switched to clocksource hpet
2024-02-24T13:05:04.073177-05:00 mighty systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
2024-02-24T13:05:04.082146-05:00 mighty rtkit-daemon[4098]: The canary thread is apparently starving. Taking action.
2024-02-24T13:05:04.083693-05:00 mighty systemd[1]: systemd-logind.service: Killing process 3657 (systemd-logind) with signal SIGABRT.
2024-02-24T13:05:04.084653-05:00 mighty rtkit-daemon[4098]: Demoting known real-time threads.
2024-02-24T13:05:04.085463-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4101 of process 4062.
2024-02-24T13:05:04.086488-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4062 of process 4062.
2024-02-24T13:05:04.087332-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4099 of process 4066. 
2024-02-24T13:05:04.088061-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4066 of process 4066.
2024-02-24T13:05:04.088843-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4102 of process 4067.
2024-02-24T13:05:04.089553-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4067 of process 4067.
2024-02-24T13:05:04.090417-05:00 mighty rtkit-daemon[4098]: Successfully demoted thread 4097 of process 4063.
2024-02-24T13:05:04.091255-05:00 mighty rtkit-daemon[4098]: Demoted 7 threads.
(No more logs until I noticed it was unresponsive and rebooted the system...)
2024-02-24T15:00:53.061061-05:00 mighty systemd-modules-load[2361]: Inserted module 'lp'

I spent some time today setting up serial console through the IPMI, so if this happens again, more troubleshooting options might be available.
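For anyone comparing their own logs against the two excerpts above, here is a minimal Python sketch (not something from the original poster) that scans syslog lines for the three things visible there: NUL blocks, the "Clocksource 'tsc' skewed" message, and long silences between timestamps. The function name `scan_syslog` and the 120-second gap threshold are my own assumptions. Worth noting from the excerpt itself: the reported skew of 177448917847 ns is about 177.4 s, and the silence between 13:02:06.549140 and 13:05:04.031212 is about 177.5 s, which might point to the whole machine stalling rather than the TSC actually drifting, but that is only a guess.

```python
import re
from datetime import datetime

# Patterns copied from the log excerpts in this thread.
SKEW_RE = re.compile(r"Clocksource 'tsc' skewed (\d+) ns")
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}) ")


def scan_syslog(lines, gap_seconds=120.0):
    """Return (nul_line_count, tsc_skews_in_seconds, gaps) for syslog lines.

    gap_seconds is an arbitrary threshold (assumption) for what counts as
    a suspicious silence between two consecutive timestamped lines.
    """
    nul_lines = 0
    skews = []
    gaps = []
    prev = None
    for line in lines:
        # A crash can leave a block of NUL bytes in front of the next entry.
        if "\x00" in line:
            nul_lines += 1
            line = line.replace("\x00", "")
        m = SKEW_RE.search(line)
        if m:
            skews.append(int(m.group(1)) / 1e9)  # ns -> s
        t = TS_RE.match(line)
        if t:
            now = datetime.fromisoformat(t.group(1))
            if prev is not None and (now - prev).total_seconds() >= gap_seconds:
                gaps.append((prev, now))
            prev = now
    return nul_lines, skews, gaps
```

To try it on a real file, something like `scan_syslog(open("/var/log/syslog", errors="replace"))` should work; the path is an assumption and depends on the distro.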

  1. Motherboard: ASRock TRX50 WS
  2. Processor: 7960X
  3. Memory: 256GiB Hynix 4800 MT/s (Not on QVL)
  4. Cooler: Noctua U14S TR5
  5. PSU: EVGA SuperNOVA 1600
  6. No custom PSU cables
  7. GPU: Radeon Pro W5500
  8. Addin cards: U.2 breakout and an extra Intel NIC
  9. ??? Unsure
  10. Anything else: No overclocking yet, Optane!

I’ve been running this configuration for just over 2 months at this point

Issues

Early boot: The BIOS isn’t fully baked.

  1. I can’t seem to make it stay off after an external power fault (power on at power restore seems to be hard coded, despite setting “power off” in the bios).
  2. No UEFI shell built in (it does pick up shells in a UEFI partition, thankfully)
  3. Some settings aren’t cleared with a CMOS clear
  4. U.2 Optane drives in PCIe riser not recognized in UEFI/boot menu. Have to use M.2, MCIO, SlimSAS, or SATA drive for UEFI boot drive.
  5. Changing any setting will re-train the memory for ~5 minutes, even if all you did was change secure boot or enabled the serial port. I can’t find a way to disable this. Thankfully normal reboots don’t do this.
  6. Couldn’t get it to boot with graphics cards older than the W5500. I didn’t look into this too much, could be user error

Booting & OS:

  1. As forum members will attest, Xen is tricky at the best of times, and there seem to be multiple incompatibilities with TRX50 in the current versions of Xen. I gave up on Xen and OPNsense for now.
  2. Enabling VA-API on my GPU causes weird crashes. I disabled VA-API hardware decoding
  3. My 990 Pro SSDs are very slow on synthetics for some reason (unsolved forum thread)
  4. Under full load, the Noctua barely keeps it under 95C. Going to try undervolting and reseating the cooler at some point
  5. Sometimes Linux tells me the cpus are running at 7GHz. Linux is lying, I didn’t enable PBO nor did I overclock
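On the bogus 7GHz readings in item 5: a small Python sketch that reads what the kernel reports through cpufreq sysfs and flags values above a plausible ceiling. This is only an illustration; the helper names are made up, and the 5300 MHz default limit is my assumption for a 7960X at stock, so adjust it for your CPU.

```python
from pathlib import Path


def read_cpu_freqs_mhz(root="/sys/devices/system/cpu"):
    """Read scaling_cur_freq (reported in kHz) for each CPU, returned in MHz."""
    freqs = {}
    for f in Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        try:
            freqs[f.parent.parent.name] = int(f.read_text().strip()) / 1000
        except (OSError, ValueError):
            continue  # CPU offline or file unreadable; skip it
    return freqs


def implausible(freqs_mhz, limit_mhz=5300.0):
    """Return only the CPUs reporting more than limit_mhz (assumed ceiling)."""
    return {cpu: mhz for cpu, mhz in freqs_mhz.items() if mhz > limit_mhz}
```

Running `implausible(read_cpu_freqs_mhz())` in a loop while the system is under load would show which cores report the impossible values and how often.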

Successes

  1. Only the GPU has caused crashes, no CPU crashes
  2. Memory is rock solid

The build has been running for 2 days. Today (Monday) I did client work all day (rendering/lookdev). No issues, except temps once briefly reached 91°C. They averaged 85°C.
No crashes, nothing. Brutally solid.
If you guys have network and wifi or Bluetooth running please let me know!

  1. ASUS TRX50 Sage WiFi

  2. 7970X

  3. Kingston Fury 6400, no EXPO active yet

  4. Noctua U14S for sTR5

  5. Corsair HX1500i

  6. No custom cables

  7. Nvidia 1070 for testing purposes

  8. No add ins

  9. Q codes no errors but the 30-40

  10. any system logs that you may have of these issues

  11. list of issues
    - Both ASUS network drivers DON’T execute!
    - Bluetooth not working
    - WiFi does not even have drivers for Win10
    - EXPO profile on the RAM not activated until I know how warranties are handled on these boards regarding EXPO!

So I will upgrade the case cooling even more to lower temps.
Now I run
2 front
1 rear
2 top
Adding 2 more soon.

  12. photos of build

  13. anything not listed that may be helpful.
    Same as Voltara posted. Won’t touch EXPO until I am certain things are proven to run.

Client work over EXPO, so to speak.

EDIT: My bad, both network solutions work. It’s the WiFi that is still a mess, plus the Bluetooth!!

My system has been relatively stable after I was able to get it to boot up. But there are some exceptions, and still some issues that I’ve just discovered.

First issue was getting the computer to turn on with 6 GPUs. It appears that the main thing that unblocked this for me was disabling USB4.

The second issue is that the computer does not like having new monitors plugged in, which often makes the system hang. Even just enabling a new monitor, or opening any application that changes the graphics state on the screen (like opening photo editing software and video editing software at the same time), results in a hang. No BSOD, no restart. Just a freeze.

And the last issue that I just noticed is that my NVMe write speed is horrible. I have 8 Gen5 M.2 T700s in here, and the read speed is awesome: 11-12GB/s. But the write speed is horrible; I can’t get it above 1.3GB/s. This happened on my last Threadripper, and it ended up just being a RAM cache issue: I turned off the RAM cache and my speeds were great. SSD temps are great, so it’s not that. It seems like this is a common problem on AMD platforms, and it’s usually related to chipset drivers. But I haven’t been able to solve it.
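If anyone wants a quick repeatable number for sequential writes without a benchmarking app in the loop, a rough Python sketch along these lines might help. It is not what the poster used; the function name and default sizes are assumptions. It writes random data to a temporary file in the given directory and forces it to disk with `os.fsync` before stopping the clock, so the OS page cache should not inflate the result (a separate RAM cache layer on the array still could).

```python
import os
import tempfile
import time


def measure_write_mib_s(directory, size_mib=1024, block_mib=4):
    """Write about size_mib MiB to a temp file in directory; return MiB/s."""
    block = os.urandom(block_mib * 1024 * 1024)  # random data, reused per block
    blocks = max(1, size_mib // block_mib)
    # The temp file is deleted automatically when the with-block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reached the drive
        elapsed = time.perf_counter() - start
    return blocks * block_mib / elapsed
```

Point `directory` at a folder on the array being tested and pick a `size_mib` large enough to get past any on-drive cache; a single Python thread will not saturate 8 Gen5 drives, so treat the number as a consistency check between configurations rather than a peak figure.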

Successes: I have been able to run AI models on the GPUs, benchmark the system in multiple benchmarking apps, etc. And the system seems stable as long as nothing messes with the displays.

System Specs:
CPU: 7985wx
RAM: 512 GB, 8 x 64GB Samsung M321r8ga0bb0-cqkmg
GPUs: 4 x 6000 ada, 2 x 4090
PSUs: 3 x 1600W Seasonic Titanium ATX 3.0
SSD: 8 x T700 (4 in slot 7 via a Hyper M.2 card)
8 monitors of various sizes.


I have a question:
Which BIOS setting would stop my USB mouse from charging/lighting up when the PC is shut down?
Wasn’t this something like a function called ErP? I’m not sure with this board’s overwhelming BIOS.

Edit:
Also, don’t we have a Windows power plan feature for these CPUs?
Like here

@Devinkb I’m curious what has been your experience since Feb? I got this board with the hopes of hooking up 12x 4090’s, and your comment about disabling USB4 finally got me into Windows (so thank you for that), but once in there, the GPUs were constantly reinstalling drivers as if Windows forgot them. I turned off all the possible auto-update settings but it kept doing it. After it would “update” drivers, I tested it and it was benching about 15% slower than my WRX80 board. I thought it could have been the bifurcation cards so I decided to pull the WRX90 and hook up my old WRX80 board to see if it would work.

For context, I have two WRX80’s - one that is running and one that was supposed to be replaced with this WRX90. When I hooked up the WRX80 that was supposed to be replaced the 12 GPUs started working almost instantly with the old WRX80 board. All I did was swap the boards, I changed no other setting or piece of hardware.

I’m not sure if there was another setting, but I feel like I went through every setting in the BIOS and researched the hell out of what each setting does without any luck, so I’m sending it back.

But what a frustrating week. I wish I had switched them sooner, or not even gotten it. I thought I was future-proofing myself…

Oh well. There are not many people running this board, so I’m curious how it has been living with it for a few months.


Stability of my system has been pretty great EXCEPT FOR display resets. Sometimes when I plug in or remove monitors, or certain mode switches make a monitor turn off and back on, the whole system hangs.

The biggest problem I’ve had with the platform so far has been bus activity - every time I increase the amount of traffic on the PCIe lanes, the system starts going super slow. From my debugging it seems to boil down to interrupts and kernel issues, both windows and Nvidia kernels. I would like to assume that some chipset drivers, graphics drivers, and a bios update could fix all of these things. I have another thread about SSD transfer speeds as this has been the easiest way to test for issues, but basically, every time I increase the amount of traffic on the bus, my SSD write speeds start dropping all the way down to 200-300MB/s, and all my cores start lighting up to 30% usage due to interrupt issues and inter-process interrupts. At one point I disabled all the sound devices on my monitors and that cleared up the issue, but then I put an NVMe switch in slot 7 and hooked up 8 SSDs to it, and the problem partially came back. I had to manually start managing hardware interrupt core affinities to get my transfer speeds back.

All painful and has been quite a hassle.

It would be interesting to know whether something similar occurs on the ASRock WRX90 board. This board has been available for a while but I haven’t seen any complaints about it, so it is either rock stable or super unpopular/low volume.

I just got the retail ASRock WRX90, and if there’s a repeatable test I can run, let me know.


The three main questions are stability, accidental 500 MHz throttling, and I/O performance (NVMe, USB4 and MCIO). Are you planning to make a video about your experience with this board and compare it with the ASUS?

  • ASRock TRX50 WS
  • Threadripper 7960X, water cooled Heatkiller block
  • Kingston Fury Renegade Pro 6400mhz (KF564R32RBE2K4-128) 128GB ECC Registered.
  • Corsair HX1200i (old version)
  • PNY RTX 3090 XL8, water cooled Heatkiller block
  • Mellanox ConnectX-6 LX 25gb dual port NIC
  • ASUS PCIE 4.0 Hyper M2 NVME card with 4x 2TB Solidigm P44 Pro SSDs
  • Optane P905 1.5TB SSDs connected to both U.2 ports
  • Corsair MP700 pcie 5 M.2 NVME boot drive, water cooled.

No issues so far. It booted up on the first attempt, though it took around 3-4 minutes the very first time, which I knew going in would likely be the case while it trained the DDR5 memory. This seems to be a common thing on many/most DDR5 motherboards, so it wasn’t any big deal to me. I updated the BIOS to the latest AGESA and it made memory training a bit faster after that when it was required, maybe 1-2 minutes instead of 3-4. I haven’t overclocked the CPU yet. I just got it running a couple of days ago and ran all the normal stress tests: Prime95 short, long and blend for 4 hours each, y-cruncher a few times, Super Pi a few times, Unigine Heaven and Superposition, and 3DMark Time Spy and Fire Strike Extreme. Everything is stable, so it is time to get into the OC now.





[Image: StripedDrive]

I honestly didn’t know there were so many people with issues till I saw this thread. Glad I didn’t have any, but it seems most issues are people on the WRX90 platform instead of TRX50 and many also have a lot of GPUs while I only have 1.


How did you get those speeds? Especially that Q1T1 1400-1200, WTF, I need that. I am running the same ASUS card with 4 x 4TB WD Black SN850X and I am nowhere close.


I’m cheating a little bit, sadly :P I think I am around 20GB/s write speed on the drives themselves, and I threw a 24GB read and write RAM cache on the array, which gives those extra-high numbers. The PC has a UPS on it, and the circuit the PC is on is on the home battery backup panel as well, so a write cache is safe enough. I will run it in that configuration full time, so those are the numbers it will have, not just something taken for benchmark or show-off purposes, lol. I was planning on making a thread yesterday on testing various array setups in Windows, showing different performance depending on what software RAID you use, but I didn’t have time when a client had me working late. It was interesting to see as much as a 10GB/s performance difference just from the chosen software RAID, even with optimal tuning on each.

I’m not done building this rig, but I have all the parts and have had it running more or less on and off for the past couple of weeks.

Motherboard: WRX90 Sage
CPU: 7955WX
RAM: 4x64 Kingston Enterprise 4800MT/s
Cooler: Noctua SP6 w/ standard dual 140mm
PSU: Seagate 1600W Titanium
GPU: Dual RTX4090 supreme (2 slot water cooled ones)

  • I’ve connected them to two 90cm riser cables
  • I also have an Intel Arc 750 as main head (slot 7)

Add in cards:

  • 2x Gen5 Asus Hyper NVME 16x PCIE (slots 1 and 2) (8x Firecuda 540 2TB sticks)
  • RTX4090 (slot 3)
  • Broadcom MegaRAID SAS 9361-24i (slot 4)
    • 12x HGST 12TB drives
    • 12x Samsung 990 Evo 8TB drives
  • RTX4090 (slot 5)
  • Intel x520 converged (slot 6)
  • Intel Arc 750 (slot 7)
  • 4x Corsair MP700 1TB (in motherboard slots)
  • Intel WiFi 6 + Bluetooth 5.2 Desktop kit

Other stuff:

  • 1x CDROM/DVD rom and 1x BluRay (SATA_1 and _2)
  • ToughArmor MB092VK-B with 2x Kioxia CD6-R (will add two more later, just need to get a Tri-mode HBA card first)
  • 2x Greaseweazle cards and floppy drives (Sony and Mitsumi). I’m actually looking for a 5.25" floppy drive if anyone has any working ones. And I’m working on fixing a Shugart drive (8")
  • LTO 7 and LTO 8 tape drives

I’m installing this inside a Thermaltake WP200 chassis (the pedestal + case combo) with a dedicated firewall motherboard on the flip side.

While I haven’t installed everything yet, I have had it running with one RTX and both ASUS Hyper card fully loaded. I’ve tried Debian 12.5 with some issues and Ubuntu 24.04 which seems to run fine. The CPU is a powerhouse.

Also, the RTX card didn’t like being in slot 7 with the riser. It spews out AER errors complaining about hardware connectivity. Reseating didn’t work, but moving it closer to the CPU did the trick (slots 3 and 5 work). Somehow Debian didn’t like having the NVMe cards in slots 1 and 2, but they are the only two slots that support Gen5 bifurcation, so I opted for Ubuntu 24.04 in the end.
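For comparing slots or risers, it can help to count AER reports per PCI device before and after a change. Below is a minimal Python sketch for that. The regular expression reflects the "PCIe Bus Error: severity=..., type=..." line shape I would expect in a Linux kernel log; treat that format as an assumption and adjust it to whatever your kernel actually prints. The function name `tally_aer` is made up for this example.

```python
import re
from collections import Counter

# Assumed shape of a kernel AER report line (not taken from this thread).
AER_RE = re.compile(
    r"(?P<driver>\S+) (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+)"
)


def tally_aer(lines):
    """Count AER reports keyed by (PCI address, severity, error type)."""
    counts = Counter()
    for line in lines:
        m = AER_RE.search(line)
        if m:
            key = (m.group("bdf"), m.group("severity").strip(), m.group("type").strip())
            counts[key] += 1
    return counts
```

Feed it the lines of a saved kernel log (for example, `dmesg` output written to a file) and compare the counts for the card’s PCI address in each slot; matching the address to a physical slot is left to `lspci`.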
