ASRock WRX80 Creator & Threadripper Pro 5995WX issues with resources?

@wendell So I have Samsung 990 Pro, Seagate FireCuda 520, and Kingston KC3000 drives.

These are all drives meant for fast transfers, even at reduced speeds, last I checked. The other very important detail is that I am struggling with varied loads across the board, not just when transferring data. Anything that demands a lot of CPU horsepower generally seems to be slow. Aside from the transfers being slow, the whole system stutters and lags really badly. It turns into a flip book.

Can you paste some screenshots of HWiNFO64 while the stutters are happening?

Have you updated the BMC and BIOS to the latest versions? Mine was acting fairly flaky at first, with random lock-ups and slowdowns. The BMC update seems to be what fixed it.

@LinuxHEDT The BIOS is on the latest version (8.01). I have tried multiple times from different computers on the network to update the BMC, and it WILL NOT let me do it. I don't get it… it'll start the process, say "uploading", and then eventually the session drops. Not really sure how to resolve this.

@wendell I will get some video and footage uploaded tomorrow.

Thanks for all the help so far guys.

@wendell

Here is a video of me trying to sum up some of the issues I have been having. You will see.

On the bit about me talking about Linux: I would only ever use Linux as a test environment. My experience with Linux is limited, and unfortunately Linux would not be able to do everything I need my computer to do.

I found an old article that might give a hint. Though the article is about the non-Pro Threadripper and Windows 10, some of it might still be an issue under Windows 11 (?). I don't know.

Unfortunately I have neither Windows nor the same hardware to try to reproduce the problem.

Could you try the following:

Enable NPS4 (NUMA nodes per socket) in the BIOS. It should be under AMD CBS → DF Common Options → Memory Addressing.

and/or disable SMT if there's an option for it, and see if this brings any improvement.

At what time index does HWiNFO64 show up in the video?

That looks like some interrupt/driver issue, but it's probably a bit hard to narrow down with so much stuff running in the background. I think LatencyMon can give you an idea of what might be causing it, though: Resplendence Software - LatencyMon: suitability checker for real-time audio and other tasks

@wendell Here is the HWiNFO video.

Nice, thanks. Lots of good info here.

Can you download Samsung Magician and see if the 990 firmware is up to date? If not, can you update it?
Also post some OBS footage of it.

I am going to try to find a 990 Pro not in use, set up my system here similar to yours, and see if I can reproduce some things.

The power deviation thing isn't anything to worry about. HOWEVER, I never saw your CPU use more than 170 watts, and the PPT was set to 170. Did you do anything with EDC/PPT or install Ryzen Master? That CPU should cruise at 200+ watts all day long under heavy load.

(Maybe run Cinebench real quick with HWiNFO64 open, and scroll all the way up and down.)

I was also looking for the WHEA error rate in HWiNFO64.
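If the Cinebench run is saved as an HWiNFO64 CSV sensor log, a short script can pull out the peak of any column (package power, WHEA error count, and so on) instead of scrolling by hand. This is a minimal sketch, assuming a comma-separated log with one header row and possibly a repeated header row at the end; the helper name `peak_reading` and the column name used below are assumptions, so match them to whatever your log actually contains.

```python
import csv
import io


def peak_reading(csv_text: str, column: str) -> float:
    """Return the highest numeric value logged in `column` of an HWiNFO-style CSV log.

    Rows whose cell is not a number (a repeated header row, blanks, text)
    are skipped rather than treated as errors.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    peak = float("-inf")
    for row in reader:
        raw = (row.get(column) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # repeated header, empty cell, or non-numeric text
        peak = max(peak, value)
    if peak == float("-inf"):
        raise ValueError(f"no numeric readings found for column {column!r}")
    return peak
```

For example, reading the saved log into a string and calling `peak_reading(text, "CPU Package Power [W]")` (hypothetical column name) would show at a glance whether package power ever climbed past the 170 W PPT during the run.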

Samsung Magician

I experienced a similar issue with my 32-core Threadripper Pro. My CPU could not go higher than 170 W under load, and all my file transfers were similar to yours (80-120 MB/sec).

OS: Windows 11 Pro 22H2 (22621.2428)
CPU: 5975wx Threadripper Pro (32-core)
MB: Asus Sage WRX80 v1
RAM: 512 GB (8x 64GB Samsung R-DIMM 3200)
GPU: 3 x 4090
SSD: Optane P5800X 1.6TB, FireCuda 530 4TB x2, Hypercard with 4x SN850X 4TB
KVM: Level1Techs

I did the following, which seems to have improved things considerably, although it's still not perfect.

  1. Disable the BMC in the BIOS.

  2. Make sure the IPMI onboard VGA setting is turned off.

  3. The ASUS board also has a BIOS setting called "BMC DRAM SMBUS switch", which I changed from BMC to BIOS. This seems to fix the problem with HWiNFO being so sluggish, and suddenly I was able to get temp readings for all 8 DIMMs instead of the 4 I had before.

  4. Under load, I would get tremendous slowdowns. However, now that HWiNFO is detecting all my RAM sticks, I noticed the bottom DIMMs were literally getting "cooked" (65C-90C) by my 4090 Suprim X installed in slot 1. To remedy the situation, I put a large Noctua fan on top of the GPU, angled toward the RAM, and now my temps are back in the 45-55C range.

  5. The Asus Hypercard that came with the motherboard spams WHEA errors when its PCIe slot is set to 4.0 in the BIOS. Setting it to 3.0 seems to have fixed the problem.

Now my file copies are back to 3-4 GB/sec.
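The 3.0 vs 4.0 slot setting caps the link rate, and the arithmetic lines up with the copy speeds reported here. A rough sketch, assuming each drive on the Hypercard gets its own x4 link (the usual arrangement for a bifurcated four-drive card): it computes theoretical one-direction bandwidth from the per-lane transfer rate and the 128b/130b line encoding that PCIe 3.0 and 4.0 use, ignoring packet and protocol overhead, so real transfers land somewhat lower.

```python
# Per-lane transfer rates in GT/s; PCIe 3.0 and 4.0 both use 128b/130b encoding.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}


def link_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s for a PCIe 3.0/4.0 link."""
    rate = TRANSFER_RATE_GT_S[gen]        # gigatransfers per second, per lane
    per_lane_gbit = rate * 128 / 130      # remove 128b/130b encoding overhead
    return per_lane_gbit * lanes / 8      # total bits per second -> bytes


if __name__ == "__main__":
    for gen in (3, 4):
        print(f"PCIe {gen}.0 x4: {link_bandwidth_gb_s(gen, 4):.2f} GB/s")
```

At 3.0, an x4 drive tops out near 3.94 GB/s in theory, so 3-4 GB/s copies are about what a Gen3 x4 link allows; at 4.0 the ceiling roughly doubles to about 7.88 GB/s.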

@wendell

This is a long, rambly video, but you will at least see more of what keeps happening. When running Cinebench I had no WHEA errors, but interestingly, after recording this video I tried the file transfer I had been doing to see what HWiNFO would say, and it kept crashing and erroring out. I will post a video of that as well.

@Blackbox514 I feel like I have seen your post on Reddit. I read it in depth and it gave me a lot of hope. I tried a lot of the same steps, to no avail. I recognize the three 4090 GPUs.

Now that I have finally updated the BIOS and BMC on this board, I am going to try disabling the BMC and the VGA controller, if this ASRock board has one. I am using a Gigabyte add-in card. I was getting these issues without that card connected, but I will limit it to 3.0 speeds just to see if that has any meaningful benefit.

@gysi I will also give some of the things you suggested a try, as far as NUMA nodes are concerned.

OK, tomorrow I build 😃 This system will rock once we root-cause this.

@wendell I look forward to your findings! I will say, if you really want to replicate it, make sure you have another drive in the secondary NVMe slot on the ASRock motherboard. That way you have two drives to transfer files between.

Here is that video showing HWiNFO not responding during file transfers.

Is your 4090 installed in slot 1 (topmost, near the RAM)?

I've got a Suprim X 4090 in slot 1, so I removed the extra fan I had added to blow air on the RAM DIMMs to see what would happen. I started a 10-minute Cinebench R24 test, repeatedly started some large file transfers from my Optane P5800X to itself, and watched my RAM temps creep all the way up to 81C, with the lowest temp at around 70C. My file transfers literally went from 3-4 GB/sec to less than 100 MB/sec.

Note that HWiNFO didn't report any WHEA errors, and the RAM temps were not highlighted in red.

Interestingly, I then re-added the fan and waited for the memory temps to get back to around 50C. I initiated the same large file copies, but they were still locked at around 100 MB/sec.

It’s almost as if some type of “thermal protection” kicked in.

I had to reboot, with the extra fan kept running, to "unstick" whatever was throttling the transfers, and now everything is good again.

I also have the Suprim X 4090. It is not in the topmost slot; it's actually in the third. Not sure how much you have read into this, @Blackbox514. Just so we're clear: while I had the Asus WRX80 Sage board for testing, I currently have the ASRock Creator 2.0 board. My original choice was the original ASRock Creator board with the Intel NICs (same board as the 2.0, just with Intel NICs).

The Asus Sage motherboard lacks Thunderbolt and therefore is not a suitable candidate for me. It is unfortunate, because it is a beautifully built motherboard. That said, the ASRock is feature-rich and ultimately lets me take my system much further than the Asus can.

As far as transfer speeds are concerned, I do not seem to be having any issues with heat. If you look at my temps, everything seems to be fine. I can look into my RAM temps a bit more thoroughly, but I see this issue even with only 2 sticks in my system. I was able to replicate it with only 2 NVMe drives and a GT 1030 connected to the machine, with 1, 2, or 4 sticks of RAM.

Aside from the slow transfer rates you were having, were other parts of your system slow when these file transfers started, as they have been for me?

Yeah, I had all kinds of issues, most of which I was able to tweak using tools like Process Lasso.

However, things took a turn for the worse after I upgraded from 8x32GB (256GB) to 8x64GB (512GB) Samsung R-DIMMs and added the two extra GPUs last month. The timing happened to coincide with the new Cinebench R24 release, so I decided to give it a spin. Lo and behold, the first time I ran it, my multi-core CPU score was around 1600, lower than an M1 Mac. Since I rarely reboot my computer, it then got as low as 220. That's around the time I noticed my file transfers had gotten very slow, so I knew something was wrong.

Now I get a much more reasonable 2800+ multi-core and over 90K GPU.

Depending on your computer case and airflow, I wouldn't rule out that your RAM sticks could still be getting overly hot, especially if you are using 128GB @ 3200 R-DIMMs. From what I've read on this forum, overly hot for Threadripper Pro 5000 is 60-70C.
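To screen for "cooked" DIMMs without eyeballing every sensor row, the per-DIMM temperatures from a sensor log can be checked against the 60-70C range quoted here. A minimal sketch: the 60C default is the forum figure, not a vendor or JEDEC limit, and `hot_dimms` plus the DIMM labels are hypothetical, so substitute whatever per-DIMM readings your HWiNFO64 log exposes.

```python
from typing import Dict, List

# Lower end of the forum-reported "overly hot" range for TR Pro 5000 RAM.
# This is an assumption taken from the thread, not a vendor specification.
HOT_THRESHOLD_C = 60.0


def hot_dimms(readings: Dict[str, List[float]], threshold: float = HOT_THRESHOLD_C) -> List[str]:
    """Return labels of DIMMs whose peak temperature reached `threshold`, hottest first.

    DIMMs with no readings are ignored.
    """
    peaks = {dimm: max(temps) for dimm, temps in readings.items() if temps}
    flagged = [dimm for dimm, peak in peaks.items() if peak >= threshold]
    return sorted(flagged, key=lambda dimm: peaks[dimm], reverse=True)
```

Comparing the flagged list with and without extra airflow over the RAM is one way to tell whether slowdowns track DIMM heat or happen regardless of it.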

OK, so the setup:

ASRock Creator w/ X710 NIC
drivers installed
BMC enabled

BIOS 6.06 at bone-stock defaults

a measly 64GB of UDIMMs (for now; I have 256GB and 512GB RDIMMs lying here)

I used OBS to screen-record copying 200GB back and forth, over and over, between a Solidigm P44 Pro and a Samsung 980 Pro. The 980 will not slow down as quickly as the 990 Pro… I am still shuffling data around so I can use a 990 Pro for testing.
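The back-and-forth copy can also be scripted so each pass logs its own throughput, which makes a drop from GB/s to around 100 MB/s easy to spot and compare between machines. This is a hedged sketch, not the setup used in the recording: `shuttle_copy` is a hypothetical helper that times `shutil.copyfile` by wall clock, and the OS file cache can inflate results unless the test file is much larger than RAM (as a 200GB file on a 64GB machine would be).

```python
import shutil
import time
from pathlib import Path
from typing import List


def shuttle_copy(src: Path, dst_dir: Path, back_dir: Path, passes: int) -> List[float]:
    """Copy `src` back and forth between two directories; return MB/s for each pass.

    Even passes write into dst_dir, odd passes into back_dir, and each pass
    reads the file the previous pass wrote, so both drives are read and written.
    The original `src` is never deleted; intermediate copies are.
    """
    size_mb = src.stat().st_size / 1_000_000
    current = src
    rates: List[float] = []
    for i in range(passes):
        target_dir = dst_dir if i % 2 == 0 else back_dir
        target = target_dir / f"{src.stem}_pass{i}{src.suffix}"
        start = time.perf_counter()
        shutil.copyfile(current, target)
        elapsed = time.perf_counter() - start
        rates.append(size_mb / elapsed if elapsed > 0 else float("inf"))
        if current != src:
            current.unlink()  # keep only one working copy on disk
        current = target
    return rates
```

Pointing `dst_dir` and `back_dir` at the two NVMe drives and printing each rate gives a per-pass log that can be lined up against HWiNFO64 timestamps.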

Both M.2 slots at the front edge of the motherboard are populated.

GPU in the 3rd slot. No capture cards or anything else installed, though. It's a 7900 XTX.

What other hardware is where, so I can make my setup more like yours?

TODO: screenshot of the Memory tab in CPU-Z from your machine

I kind of want to change your BIOS settings for the CPU from NPS1 to NPS4 first to see if that does anything. This might be a Windows thing with 64-core CPUs. I don't actually have a 64-core TR 5000.

Mine is NPS1, like yours is right now. In Task Manager, where you changed the graph to individual cores, "NUMA nodes" is grayed out, right?
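Since the NPS setting determines how many NUMA nodes the OS sees, a quick check after changing it confirms whether the BIOS change took effect: with NPS1 we'd expect one node (presumably why that Task Manager view is grayed out), and with NPS4 we'd expect four. A minimal sketch assuming CPython: on Windows it calls the Win32 `GetNumaHighestNodeNumber` function through `ctypes`; on Linux it counts `/sys/devices/system/node/nodeN` directories; anywhere else it falls back to 1.

```python
import ctypes
import glob
import sys


def numa_node_count() -> int:
    """Return the number of NUMA nodes the OS reports (1 if it cannot tell)."""
    if sys.platform == "win32":
        highest = ctypes.c_ulong(0)
        # GetNumaHighestNodeNumber writes the highest 0-based node index.
        if ctypes.windll.kernel32.GetNumaHighestNodeNumber(ctypes.byref(highest)):
            return highest.value + 1
        return 1
    # Linux has one /sys/devices/system/node/nodeN directory per node;
    # on other platforms the glob finds nothing and we fall back to 1.
    nodes = glob.glob("/sys/devices/system/node/node[0-9]*")
    return max(len(nodes), 1)


if __name__ == "__main__":
    print(f"NUMA nodes visible to the OS: {numa_node_count()}")
```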

Yes, "NUMA nodes" is grayed out for me.
Thank you so much for doing all of this, @wendell. Alright, my slot setup is as follows:
Slot 1 (topmost): Nvidia GT 1030
Slot 2: a 4-drive NVMe add-in card set to 4x4x4x4 in the BIOS. No RAID, just standalone single drives.
Slot 3: my Nvidia RTX 4090 (takes 2 slots)
Slot 5: a PCIe extension cable for an Nvidia 3060 Ti GPU
Slot 6: a Cam Link Pro 4K
Slot 7: a Sonnet 8-port USB 3.0 card
I have 256GB of Trident Z Neo RAM in the computer as well.

I changed to NPS4 and it didn't improve much of anything, but more experimentation can certainly be done.