I have a SuperMicro X10SRH-CF board (UP, based on the C612 chipset) configured with a 12-core Xeon E5-2678 v3 processor and four DIMMs in the proper slots to activate all the channels. I enabled Cluster-on-Die in the BIOS, which to the best of my understanding should expose two NUMA nodes to the kernel, each with six cores and two memory channels. Unfortunately, I can only see a single NUMA node with all 12 cores and all the RAM.
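For reference, one way to confirm what the kernel actually sees is to read the per-node entries that Linux publishes in sysfs. The snippet below is only a minimal sketch: the helper name `numa_nodes` is made up for illustration, and it assumes the standard `/sys/devices/system/node/nodeN/cpulist` layout. With Cluster-on-Die working as described, it would be expected to list two nodes; on the system above it would list only `node0`.

```python
import re
from pathlib import Path


def numa_nodes(root="/sys/devices/system/node"):
    """Return {node_id: cpulist_string} for each NUMA node found under `root`.

    `numa_nodes` is a hypothetical helper name; `root` defaults to the
    usual Linux sysfs location and can be overridden for testing.
    """
    nodes = {}
    for entry in Path(root).glob("node*"):
        # Only accept directories named node<N>; skip anything else.
        match = re.fullmatch(r"node(\d+)", entry.name)
        cpulist = entry / "cpulist"
        if match and cpulist.is_file():
            nodes[int(match.group(1))] = cpulist.read_text().strip()
    # Sort by node id so the output order is stable.
    return dict(sorted(nodes.items()))


if __name__ == "__main__":
    for node_id, cpus in numa_nodes().items():
        print(f"node{node_id}: CPUs {cpus}")
```

If this prints a single line, the firmware has most likely described only one proximity domain to the OS, regardless of what the BIOS setup screen says.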
I spent a couple hours yesterday trying every conceivable permutation of BIOS settings that seem related to NUMA, interleaving, snoop, etc. No combination of options exposes more than one node to the kernel. The motherboard manual specifically calls out settings for NUMA and node interleaving… but I don’t actually see those in the latest BIOS where they’re supposed to be.
I’m trying to figure out where to go from here. I know my CPU is a strange OEM part and it’s definitely possible that it has some features fused off. It’s also possible that I am completely misunderstanding how this is supposed to work and appear from within Linux… maybe the kernel doesn’t support “little” NUMA? Or maybe I just need to reach out to SuperMicro and try to get some support.
Thank you in advance for any insight you can offer!
Well, you might want to try a different OS to see if it’s something specific to what you’re trying to run. My understanding is that Linux has had support for NUMA domains for a while now but you never know… I don’t think CoD would show up any differently than having two separate sockets on the system.
Clearly, based on the responses I’m getting to my probing! I’m starting to feel like Don Quixote.
But no… I don’t think so. It’s less of a use-case and more of an “I want my hardware to expose an accurate map of its internals so that my operating system and applications can make educated decisions about processor and memory allocation.” Architecturally, this CPU appears to have two somewhat-independent clusters of cores, cache, and memory controllers; the best way to utilize those resources would be to enable NUMA and let the kernel do its thing.
My actual, immediate use-case is virtualization, and the ability to fence a VM within its own NUMA node has huge performance implications. More practically speaking, I have a rack full of very similar hardware at work, and a half-dozen other use-cases that I’d like to flesh out… in my homelab! Not in production.
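To illustrate what "fencing" a workload to one node could look like at the process level, here is a rough sketch that restricts a process to the CPUs of a single NUMA node. The helper names (`parse_cpulist`, `node_cpus`, `pin_to_node`) are hypothetical, and it assumes a Linux host with the usual sysfs layout. It only sets CPU affinity; it does not bind memory allocations to the node, which would need something like `numactl` or the hypervisor's own NUMA tuning.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a sysfs cpulist string such as '0-5,12-17' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist yields an empty set
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def node_cpus(node, root="/sys/devices/system/node"):
    """Return the set of CPU ids belonging to NUMA node `node`."""
    return parse_cpulist((Path(root) / f"node{node}" / "cpulist").read_text())


def pin_to_node(pid, node, root="/sys/devices/system/node"):
    """Restrict `pid` (0 means the calling process) to the CPUs of `node`.

    Linux-only: relies on os.sched_setaffinity. CPU affinity only,
    memory placement is left to the kernel's default policy.
    """
    cpus = node_cpus(node, root)
    os.sched_setaffinity(pid, cpus)
    return cpus
```

For example, `pin_to_node(0, 1)` would confine the current process to node 1's cores, but only if the kernel exposes a `node1` in the first place, which is exactly what is missing here.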
SuperMicro is telling me that my C612-based motherboards with two E5 v3 processors will show me four NUMA nodes, but my C612-based motherboard with one E5 v3 processor will only show me one NUMA node. That just doesn’t make any sense… at least at first blush. If there’s an actual, technical reason why this feature is disabled or not working or whatever, I’d like to be educated. And if it’s something that can be worked around or outright solved, even better!
Anyway, this rant isn’t directed at you, so I’m sorry if it comes across that way. Like I said above, clearly I’m lost in the woods on this one and it’s time to move on. Oh well!
Yeah, I get it, but I guess the hardware itself is niche, and a single-socket configuration is even more of an edge case.
MOST people (I guess?) running Xeon EX-2YYYY parts would be running them in multi-socket systems for VM hosting, etc., where you’d want as many cores as you can afford in a box (the 2 in the part number indicates multi-socket support). That would definitely make the OS aware that there are multiple NUMA nodes (as it knows there is >1 socket populated on the board).
I’ve just been reading up on Hyper-V, which apparently does expose the NUMA layout to VMs running on it. But these processors seem to be some strange edge case of NUMA for two dies(?) on a single socket, and AMD had the same sort of issue with the early Threadripper and Ryzen parts on Windows (i.e., Windows didn’t know about the NUMA layout for those parts). So it wouldn’t surprise me if this hardware is both niche enough and old enough that nobody cares to fix it any more.
Good luck though. I just suspect that it’s too old now (it’s what… 7-8 years old?) for vendors to bother fixing this. Linux might be your best bet.