
2990WX Threadripper Performance Regression FIXED on Windows* #threadripper | Level One Techs


#42

Can you try indigo in the 1/2 numa node mode? I can test as well


#43

I will put it on GitHub later today, source and binaries… the old one is still on there atm.

Will try Indigo and see what it does on the emulated NUMA.


#44

I have a 2990WX running 64GB RAM, and am happy to test if you let me know what you’d like to look at.


#45

No worries glad you’re getting exposure.

Just watched the video. Brilliant work thank you so much for the explanation!


#46

@wendell This is pretty cool.

https://www.anandtech.com/show/13853/amd-comments-on-threadripper-2-performance-and-windows-scheduler

You’re geek famous!

It is interesting they won’t say exactly what the issue is even though you aren’t 100% correct. I’m guessing the actual deets are deep in the bowels of copyrighted code.


#47

Likely windows internals above my pay grade.


#48

What? You should have millions of theoretical dollars from these awesome videos you do. You just have to go to the Department of Internet Money to collect.


#49

In this case I think it is like horseshoes and hand grenades: being close counts.


#50

I really wonder if this is why intel stopped at 28 cores on their 8180… insider knowledge of the windows scheduler that AMD may not have perhaps?


#51

Probably not. It is more likely down to things like manufacturing nodes, heat density, etc.


#52

Windows Server 2019 has this regression. Video soon.


#53

Feature creep.


#54

I have uploaded where it is at currently.

Precompiled binaries are available in x64/Release. I have no idea how well the NUMA detection will actually work; the process configuration window should give an idea of what was detected, so I would be interested to see.

The basic premise here is that it tries to spread load evenly across the available cores.

I tested a previous build managing 2,500 threads per second at <0.5% CPU usage. If benchmarking, one can minimize it to the tray, as the ListView has some overhead.

With it running on my 8700K, I can run the test 10 times in a row and get an identical score EVERY time, so it is for sure having some impact on reducing collisions. Normally my score would bounce around by 15-30 points per run.
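The "spread load evenly across the available cores" premise can be illustrated with a small sketch. This is not ThreadWrangler's code; the function name `spread_threads` and the least-loaded-core policy are assumptions made purely for illustration of one way such balancing could work.

```python
import heapq


def spread_threads(thread_ids, core_count):
    """Assign each thread to whichever core currently has the fewest threads.

    Hypothetical helper for illustration only; not taken from ThreadWrangler.
    Returns a dict mapping thread id -> core index.
    """
    if core_count < 1:
        raise ValueError("core_count must be at least 1")

    # Min-heap of (threads_assigned, core_index); ties go to the lower index.
    heap = [(0, core) for core in range(core_count)]
    heapq.heapify(heap)

    assignment = {}
    for thread_id in thread_ids:
        load, core = heapq.heappop(heap)          # least-loaded core so far
        assignment[thread_id] = core
        heapq.heappush(heap, (load + 1, core))    # record the extra thread
    return assignment
```

Calling `spread_threads(range(6), 4)` places the first four threads on cores 0 to 3 and the remaining two on cores 0 and 1, so no core ever holds more than one thread above any other.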


#55

Starting “ThreadWrangler.exe” on Win7 64bit results in this:
image

Rough translation: Missing DLL


Managed to get a different error:
image


#56

You must be using Windows 7? The application hooks several DPI scaling functions which are only supported in Windows 10. I really did not expect anyone would still be using Windows 7… I will see if there’s a workaround.

It will also require the x64 Visual Studio 2017 redistributable.


#57

Yup, still on Win7.

Got that. Some program for uni required it.


#58

I have modified the code to late bind the function I suspect is not supported on Windows 7 (GetDpiForMonitor) and to use an alternative when it is not detected. It might be worth trying again, though it may look completely messed up if the DPI scaling is incorrectly calculated.
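For anyone unfamiliar with late binding, here is a rough sketch of the detection step only, written in Python with ctypes rather than the application's own code. The helper names `late_bind` and `choose_dpi_source` are made up for this example; the only fact relied on is that GetDpiForMonitor is exported by Shcore.dll on systems that support it. The actual DPI call and the alternative path are deliberately left out.

```python
import ctypes


def late_bind(library_name, function_name):
    """Look a function up at run time instead of linking against it.

    Returns the function object, or None when either the library or the
    export is missing. Hypothetical helper, for illustration only.
    """
    try:
        library = ctypes.CDLL(library_name)     # OSError if it cannot be loaded
        return getattr(library, function_name)  # AttributeError if not exported
    except (OSError, AttributeError):
        return None


def choose_dpi_source():
    """Report which DPI path would be taken on this machine."""
    if late_bind("Shcore.dll", "GetDpiForMonitor") is not None:
        return "per-monitor"
    return "fallback"
```

Because the lookup happens at run time, a missing export becomes an ordinary `None` check rather than a "missing DLL" error at startup.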

Download the binary in ThreadWrangler/x64/Release/ again.

Also added the ability to modify the number of processors assigned per thread, via a new combobox.

For reference, in the bitmap text: - = not assigned, * = assigned, X = ideal processor assigned.

Limit to 2 Cores:

Allow all cores:
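As a text-only illustration of the bitmap legend (- not assigned, * assigned, X ideal processor), something like the following would produce such a string. The function name `render_assignment`, the use of an integer bit mask, and core 0 being the leftmost character are all assumptions for this sketch, not details taken from ThreadWrangler.

```python
def render_assignment(core_count, affinity_mask, ideal_core=None):
    """Render one thread's core assignment as a string of -, * and X.

    Hypothetical helper: bit N of affinity_mask set means core N is assigned;
    core 0 is drawn leftmost (an assumption about the layout).
    """
    cells = []
    for core in range(core_count):
        if core == ideal_core:
            cells.append("X")        # ideal processor
        elif (affinity_mask >> core) & 1:
            cells.append("*")        # assigned
        else:
            cells.append("-")        # not assigned
    return "".join(cells)
```

With 8 cores, a thread limited to cores 0 and 1 with core 0 as ideal renders as `X*------`, while one allowed on all cores with core 2 as ideal renders as `**X*****`.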


#59

Support for CPU / core count is dynamic, so if it incorrectly detects the CPU it will be fairly obvious.

Testing 4x16 cores (i.e. emulated by overriding)


#60

Works. Thanks.

Curious thing: Cinebench now hits 1499 every run (tested 5 runs). I am really curious about the performance change in games.
Might send Blizzard an email asking if it is okay to run this while testing in Heroes of the Storm (bad game on Ryzen).


#61

Coming up with an “algorithm” that works correctly when you’re an outsider (from the scheduling/kernel perspective) is really difficult. The approach is hacky and less than ideal, but the premise is to give one some knobs to twist/turn and experiment.

The point here is that something like Cinebench or Indigo is much easier, as it will always load all the cores. With something like a game the load is dynamic, and it’s possible this may cause ThreadWrangler to make different decisions periodically. How that will manifest I’m not sure… but of course I factored this in and experimented a lot with various games/benchmarks to check that the assignment decisions stayed sane under most scenarios.

As for whether or not it’s safe to run with games: I make the note that one should be cautious just for the sake of being responsible… ThreadWrangler does not inject any code, create any hooks or modify anything other than calling the system APIs SetThreadIdealProcessorEx and SetThreadGroupAffinity… I would think it safe but I am not going to make any assurances, for obvious reasons.
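To give a feel for what those two calls consume, here is a hedged sketch in Python/ctypes of the two documented Win32 structures involved: PROCESSOR_NUMBER (taken by SetThreadIdealProcessorEx) and GROUP_AFFINITY (taken by SetThreadGroupAffinity). The helper `build_placement` is hypothetical and only fills the structures in; the actual API calls, which need a thread handle on Windows, are not shown, and none of this is ThreadWrangler's own code.

```python
import ctypes


class PROCESSOR_NUMBER(ctypes.Structure):
    """Identifies one logical processor within a processor group."""
    _fields_ = [
        ("Group", ctypes.c_ushort),
        ("Number", ctypes.c_ubyte),
        ("Reserved", ctypes.c_ubyte),
    ]


class GROUP_AFFINITY(ctypes.Structure):
    """Bit mask of allowed processors within one processor group."""
    _fields_ = [
        ("Mask", ctypes.c_size_t),            # KAFFINITY is pointer-sized
        ("Group", ctypes.c_ushort),
        ("Reserved", ctypes.c_ushort * 3),
    ]


def build_placement(group, allowed_cores, ideal_core):
    """Build the affinity mask and ideal-processor structs for one thread.

    Hypothetical helper for illustration; core numbers are relative to
    the given processor group (0..63).
    """
    if ideal_core not in allowed_cores:
        raise ValueError("ideal core must be one of the allowed cores")
    mask = 0
    for core in allowed_cores:
        if not 0 <= core < 64:
            raise ValueError("a processor group holds at most 64 cores")
        mask |= 1 << core                     # set the bit for this core
    affinity = GROUP_AFFINITY(Mask=mask, Group=group)
    ideal = PROCESSOR_NUMBER(Group=group, Number=ideal_core)
    return affinity, ideal
```

Both calls only change scheduling preferences for a thread: the affinity mask limits where it may run, and the ideal processor is the scheduler's preferred core within that set.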