2990WX Threadripper Performance Regression FIXED on Windows* #threadripper | Level One Techs

Michael Larabel had already shared Windows Server 2019 results, and it does appear to have the same issue. Once the thread count goes over 32 you can clearly see performance degrade in Windows compared to Linux. I would have thought Microsoft would have addressed this already in the server space. This may be a really big find if it forces Microsoft to update its kernel.

https://www.phoronix.com/scan.php?page=article&item=2990wx-linwin-scale&num=4

I wonder if this can be replicated on a 4-node Xeon server. The memory timings might be too slow to actually notice.

https://www.indigorenderer.com/benchmark-results?filter=cpu

There’s a 2990 in the top 5 for IndigoBench CPU running Windows 10 all of a sudden.

3 Likes

I think the starting idea is sound - letting a higher level set an ideal core for each thread - especially if cores aren’t equal in memory latency and the scheduler is aware of the CPU topology and the relative requirements of the thread. Where it breaks is when only a portion of the cores are ever assigned as ideal cores.

My guess: one part of the scheduler sees some cores underutilized and others loaded at 100% with multiple threads, so it moves a thread. Another part of the scheduler sees a thread running on a core that’s not its ideal one and moves it back.

In a truly pathological case the scheduler moves the thread back very quickly, leaving no time for caches to update and provide any data, so the thread doesn’t manage to have any work done during its detour.
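To make the guess above concrete, here is a toy simulation of two policies fighting each other. It is only a sketch of the hypothesis in this post, not of how the Windows scheduler actually works: `ToyThread`, `ToyResult` and `simulate` are made-up names, a "tick" is an arbitrary scheduling step, and the balancer moves at most one thread per tick.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only: all names here are hypothetical and nothing in this
// block reflects real Windows scheduler internals.
struct ToyThread {
    int ideal_core;    // core the "ideal core" policy wants the thread on
    int current_core;  // core the thread is sitting on right now
};

struct ToyResult {
    int migrations;     // total thread moves over the whole run
    int max_core_load;  // number of threads on the busiest core at the end
};

// Each tick runs two policies in order:
//   1. (optional) pull-back: any thread off its ideal core is moved back.
//   2. balance: one thread moves from the busiest core to the idlest core
//      if their loads differ by at least 2.
ToyResult simulate(int num_cores, std::vector<ToyThread> threads,
                   int ticks, bool pull_back) {
    assert(num_cores > 0);
    int migrations = 0;
    std::vector<int> load(num_cores, 0);

    auto recount = [&] {
        std::fill(load.begin(), load.end(), 0);
        for (const ToyThread& t : threads) ++load[t.current_core];
    };

    for (int tick = 0; tick < ticks; ++tick) {
        if (pull_back) {
            for (ToyThread& t : threads) {
                if (t.current_core != t.ideal_core) {
                    t.current_core = t.ideal_core;
                    ++migrations;
                }
            }
        }
        recount();
        auto busiest = std::max_element(load.begin(), load.end());
        auto idlest = std::min_element(load.begin(), load.end());
        if (*busiest - *idlest >= 2) {
            const int from = static_cast<int>(busiest - load.begin());
            const int to = static_cast<int>(idlest - load.begin());
            for (ToyThread& t : threads) {
                if (t.current_core == from) {
                    t.current_core = to;
                    ++migrations;
                    break;  // the balancer moves only one thread per tick
                }
            }
        }
    }
    recount();
    return {migrations, *std::max_element(load.begin(), load.end())};
}
```

With four threads that all share core 0 as their ideal core on a four-core toy machine, the balancer alone spreads them out in three moves and then goes quiet. With pull-back enabled, the same thread is bounced off core 0 and back again every tick, and the other three threads never leave core 0.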

I’m curious and probably missing the mark here, but while this was specific to the Threadripper, would this tool be of any benefit for those on Ryzen?

2 Likes

The system next to the whiteboard at 20:12 looks familiar.

Those heatsink+fan modules and the Supermicro SC732, could that be @wendell’s Talos II?

1 Like

With virtual machines, how many processes are actually running on a 32-thread box, in the enterprise, on Windows?

Also, this is the first time in quite a while that Intel has had any real competition in the enterprise, so it’s quite possible no one really noticed the degradation.

2 Likes

I have refactored it; it now supports Ideal CPU allocation and is fully NUMA node aware.

I was able to use bcdedit to emulate NUMA nodes - it appears MS supports this (mostly) for driver development purposes … the 8700K ends up becoming 2 NUMA nodes, with 4 cores per node (2x HT, 2x Real)

It keeps counters and allocates “Ideal” and “threads” on a round-robin basis against groups, then cores. The algorithm is easily changeable … on an allocation request, all the cores for a group get thrown into a std::set with a custom comparator.
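For anyone wondering what that might look like, here is a minimal sketch of the idea as I read it from the description above. It is not the tool's actual source: `CoreSlot`, `LeastLoadedFirst` and `Allocator` are hypothetical names, and the tie-break order in the comparator (ideal count, then thread count, then core index) is an assumption.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical bookkeeping for one logical core inside a group / NUMA node.
struct CoreSlot {
    int group;         // group (or NUMA node) index
    int core;          // core index within that group
    int ideal_count;   // how many threads have this core as their ideal core
    int thread_count;  // how many threads have been allocated to this core
};

// Custom comparator: the least used core sorts first.
// The core index breaks ties so every element of the set is unique.
struct LeastLoadedFirst {
    bool operator()(const CoreSlot* a, const CoreSlot* b) const {
        if (a->ideal_count != b->ideal_count) return a->ideal_count < b->ideal_count;
        if (a->thread_count != b->thread_count) return a->thread_count < b->thread_count;
        return a->core < b->core;
    }
};

class Allocator {
public:
    Allocator(int groups, int cores_per_group) : groups_(groups) {
        assert(groups > 0 && cores_per_group > 0);
        for (int g = 0; g < groups; ++g)
            for (int c = 0; c < cores_per_group; ++c)
                slots_.push_back(CoreSlot{g, c, 0, 0});
    }

    // Round robin over groups first, then pick the least used core in that group.
    CoreSlot allocate_ideal() {
        const int group = next_group_;
        next_group_ = (next_group_ + 1) % groups_;

        CoreSlot* chosen = nullptr;
        {
            // Built fresh on every request, so it always reflects the
            // current counters; discarded before the counters change.
            std::set<CoreSlot*, LeastLoadedFirst> candidates;
            for (CoreSlot& s : slots_)
                if (s.group == group) candidates.insert(&s);
            chosen = *candidates.begin();
        }
        ++chosen->ideal_count;
        ++chosen->thread_count;
        return *chosen;  // copy of the updated slot
    }

private:
    int groups_;
    int next_group_ = 0;
    std::vector<CoreSlot> slots_;
};
```

Swapping the policy then comes down to swapping the comparator, which would fit the "easily changeable" remark. Rebuilding the set per request is cheap at these sizes, since a group only has a handful of cores.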

The difficult part is deciding which threads effectively get priority during the core allocation routine (currently run every second).

The final thing to finish is core/thread stickiness, so that it attempts to re-allocate the same core(s) to the same thread in the next allocation round.
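One possible way to get that kind of stickiness, sketched here purely as an illustration since the post says this part is not finished: remember each thread's core from the previous round and hand it back first, then fill in the rest. `StickyAllocator`, `allocate_round` and the plain integer thread ids are all hypothetical, and thread ids within a round are assumed to be unique.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: two-pass allocation that prefers last round's core.
class StickyAllocator {
public:
    explicit StickyAllocator(int num_cores) : num_cores_(num_cores) {
        assert(num_cores > 0);
    }

    // Runs one allocation round and returns thread id -> core index.
    std::unordered_map<int, int> allocate_round(const std::vector<int>& thread_ids) {
        std::vector<int> load(num_cores_, 0);
        std::unordered_map<int, int> assignment;

        // Pass 1: a thread seen last round keeps its old core,
        // as long as nobody has claimed that core yet this round.
        for (int id : thread_ids) {
            auto it = last_core_.find(id);
            if (it != last_core_.end() && load[it->second] == 0) {
                assignment[id] = it->second;
                ++load[it->second];
            }
        }

        // Pass 2: every remaining thread gets the least loaded core
        // (lowest index wins a tie).
        for (int id : thread_ids) {
            if (assignment.count(id) != 0) continue;
            int best = 0;
            for (int c = 1; c < num_cores_; ++c)
                if (load[c] < load[best]) best = c;
            assignment[id] = best;
            ++load[best];
        }

        // Only threads present in this round are remembered for the next one.
        last_core_ = assignment;
        return assignment;
    }

private:
    int num_cores_;
    std::unordered_map<int, int> last_core_;
};
```

Doing the sticky pass before the fill pass means the order of `thread_ids` cannot cause a new thread to steal a returning thread's core, which is where the priority question from the previous paragraph would otherwise show up.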

3 Likes

Can you try indigo in the 1/2 numa node mode? I can test as well

I’ll put it on GitHub later today, source and binaries… the old one is still on there atm.

Will try Indigo and see what it does on the emulated NUMA.

2 Likes

I have a 2990WX running 64GB RAM, and am happy to test if you let me know what you’d like to look at.

No worries, glad you’re getting exposure.

Just watched the video. Brilliant work, thank you so much for the explanation!

@wendell This is pretty cool.

https://www.anandtech.com/show/13853/amd-comments-on-threadripper-2-performance-and-windows-scheduler

You’re geek famous!

It is interesting they won’t say exactly what the issue is even though you aren’t 100% correct. I’m guessing the actual deets are deep in the bowels of copyrighted code.

2 Likes

Likely windows internals above my pay grade.

1 Like

What? You should have millions of theoretical dollars from these awesome videos you do. You just have to go to the Department of Internet Money to collect.

In this case I think it is like horseshoes and hand grenades. Being close counts.

I really wonder if this is why Intel stopped at 28 cores on their 8180… insider knowledge of the Windows scheduler that AMD may not have, perhaps?

Probably not. It’s more likely to be from stuff like manufacturing nodes, heat density, etc.

1 Like

Windows Server 2019 has this regression. Video soon.

Feature creep.

1 Like