Yes, I agree. I just think we might be waiting a long time for Microsoft to fix the scheduler.
Since this issue is to do with NUMA, is Windows also allocating all the memory from just the RAM directly connected to that NUMA node? If so, it would also be effectively limited to two channels. It might be interesting to run the test with just two DIMMs in the sockets connected to one die, to see if the performance is the same or lower.
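To put a rough number on the two-channel point, here is a back-of-the-envelope sketch of theoretical peak bandwidth. The DDR4-2666 speed and the four-channel total are my assumptions, not figures from this thread, and real-world throughput will be lower than these peaks.

```python
# Theoretical peak DRAM bandwidth: channels * transfers/s * bytes per transfer.
# ASSUMPTION: DDR4-2666 and a 64-bit (8-byte) data bus per channel; the
# actual memory speed of the test system is not stated in the thread.

BYTES_PER_TRANSFER = 8  # one 64-bit channel moves 8 bytes per transfer


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float) -> float:
    """Return theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    # Compare memory local to one die (2 channels) with all 4 channels.
    for channels in (2, 4):
        gbs = peak_bandwidth_gbs(channels, 2666)
        print(f"{channels} channels: {gbs:.1f} GB/s")
```

If the benchmark is bandwidth-bound and all allocations land on one node, the two-DIMM run should score about the same as the full configuration. A clearly lower score would suggest the other channels are being used after all.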
Under NUMA, the Windows scheduler is meant to schedule threads, where possible, on the NUMA node where their memory has been allocated.
I think I’m repeating Ext3h.