I am going to make a collection of this kind of thing:
So Microsoft added processor groups as a way to deal with more than 64 logical processors on a single NUMA node, but no one bothers to use them versus NUMA-aware code. This came up in the falcon video today, along with Mark R’s thread image utility.
I found the above thread.
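For anyone who hasn't bumped into this: the 64 limit comes from the affinity mask being 64 bits wide, so a processor has to be addressed as a (group, mask) pair instead of one flat bit. Here is a rough Python sketch of that arithmetic. It is only an illustration: the names `GroupAffinity`, `to_group_affinity` and `groups_needed` are made up for this post, and it assumes groups are filled in order, 64 processors at a time, which is a simplification of how Windows actually lays groups out.

```python
from dataclasses import dataclass

# Assumption: 64-bit Windows, where an affinity mask holds 64 bits.
GROUP_SIZE = 64


@dataclass(frozen=True)
class GroupAffinity:
    """Hypothetical stand-in for a (group number, 64-bit mask) pair."""
    group: int
    mask: int


def to_group_affinity(logical_index: int) -> GroupAffinity:
    """Map a flat logical-processor index to a (group, mask) pair.

    Simplified model: groups are filled in order, 64 processors each.
    """
    if logical_index < 0:
        raise ValueError("logical_index must be non-negative")
    group, bit = divmod(logical_index, GROUP_SIZE)
    return GroupAffinity(group=group, mask=1 << bit)


def groups_needed(logical_processors: int) -> int:
    """Smallest number of 64-wide groups that covers every processor."""
    if logical_processors < 0:
        raise ValueError("logical_processors must be non-negative")
    return -(-logical_processors // GROUP_SIZE)  # ceiling division


if __name__ == "__main__":
    # 96 cores with SMT is 192 logical processors.
    print(groups_needed(192))
    print(to_group_affinity(64))
```

The point is that any code that only ever sets a single 64-bit mask can, at best, see one group's worth of processors, which is why group-unaware apps leave cores idle on big parts.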
I am also the OG person who found that the Windows scheduler behaves suboptimally if a NUMA node advertises that it technically has no local memory (Intel sub-NUMA clustering is an interesting tangent here).
The above thread is more evidence of bitrot. I can’t see that processor groups were a good solution to this problem, especially now with CXL. CXL devices seem to be implemented as NUMA nodes with memory but no local compute, and once again the Windows API seems unfun to deal with here.
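For comparison, on the Linux side you can spot these lopsided nodes straight from sysfs. A minimal Python sketch, assuming the usual `/sys/devices/system/node/nodeN/cpulist` and `meminfo` files are present; the helper names (`parse_cpulist`, `classify`, `scan_nodes`) are my own and not from any library, and this is not a Windows equivalent.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def classify(cpus, mem_kb):
    """Label a node by whether it has CPUs, memory, both, or neither."""
    if cpus and mem_kb > 0:
        return "compute+memory"
    if cpus:
        return "compute-only"   # the no-local-memory case
    if mem_kb > 0:
        return "memory-only"    # what a CXL memory expander tends to look like
    return "empty"


def read_mem_total_kb(meminfo_text):
    """Pull MemTotal (in kB) out of a per-node meminfo file."""
    for line in meminfo_text.splitlines():
        fields = line.split()
        if "MemTotal:" in fields:
            return int(fields[fields.index("MemTotal:") + 1])
    return 0


def scan_nodes(root="/sys/devices/system/node"):
    """Return {node_name: label}; empty dict if sysfs is not available."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for node in sorted(base.glob("node[0-9]*")):
        cpus = parse_cpulist((node / "cpulist").read_text())
        mem_kb = read_mem_total_kb((node / "meminfo").read_text())
        result[node.name] = classify(cpus, mem_kb)
    return result


if __name__ == "__main__":
    for name, label in scan_nodes().items():
        print(name, label)
```

Both odd cases show up here: a node with CPUs but zero MemTotal, and a node with memory but an empty cpulist.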
Anyone work on Big Compute or hear of problems with scheduling, task management😜, etc?
I know Linux has had some fun experimenting on their side too. Red Hat’s thread tuning guide is pretty enlightening, but I haven’t seen anything like that in a Windows context, since they don’t give you much in the way of tunables.
I’m aware of the core and root scheduler options on Windows, but those don’t improve the situation.
The 96c Threadripper seems to have a special option that fakes 3x32c NUMA nodes to placate Windows. Interesting.
What suboptimal stuff have you seen or experienced??
“Prior to 5/2022 we thought it’d be fine to just make up NUMA nodes for old apps. It Was Not Fine. Processor groups!”