I was wondering about the following: say I have a CPU with 12 cores and 128GB of memory. Each core will work on its own 20GB in-memory dataset at the same time, so 6 cores would use ‘real memory’ and 6 cores would tap into ‘swap memory’ (about 240GB of memory in total).
Is it better to do everything with 6 cores in this case and not get into swap memory at all, because once swap is being used it will slow everything down? Or is there no ‘impact’ on the 6 cores using ‘real memory’? If there is none, I was thinking of getting 128GB of e.g. Optane storage and using it as a swap partition to speed up the 6 cores tapping into the swap memory.
You might know this already, but swap memory can’t be addressed by the CPU directly. Suppose you have some chunk of data that gets swapped out… If a core tries to access that data, the OS takes over, reads it back into memory from the disk, and then lets the program continue. Depending on the exact memory access pattern, that can be okay if it happens only occasionally, but it can result in absolutely horrible performance if the following happens:
Suppose core 1 accesses some data that has been swapped out, but the system RAM is full. In that case, the OS will swap out something else in order to bring the data back into RAM for core 1. If core 2 comes along a few milliseconds later and asks for the data that was just swapped out to make space for core 1’s data, it can get bad. If this happens in a loop, cores 1 and 2 will spend the next eternity swapping out each other’s data, and very little progress will be made. This is called thrashing, and I’ve had systems become totally unresponsive in this situation (you can’t even reach them over SSH; only IPMI or physical access lets you interact with them).
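To make the thrashing scenario concrete, here is a toy simulation in Python. It is only a sketch under simplifying assumptions: the function names, the page counts, and the strict least-recently-used eviction policy are all made up for illustration, and a real kernel's page replacement is more sophisticated. It counts how many accesses have to go to swap when two workers take turns sweeping over their own data.

```python
from collections import OrderedDict


def count_page_faults(accesses, ram_pages):
    """Count accesses that miss RAM, using LRU eviction with ram_pages frames."""
    resident = OrderedDict()  # pages currently in RAM, least recently used first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1  # page must be read back from swap
            if len(resident) >= ram_pages:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = True
    return faults


def interleaved_accesses(workers, pages_per_worker, rounds):
    """Each worker sweeps its own pages repeatedly; workers alternate accesses."""
    sequence = []
    for _ in range(rounds):
        for page in range(pages_per_worker):
            for worker in range(workers):
                sequence.append((worker, page))
    return sequence


# Hypothetical toy sizes: 2 workers with 4 pages each, so 8 pages in total.
accesses = interleaved_accesses(workers=2, pages_per_worker=4, rounds=10)

fits = count_page_faults(accesses, ram_pages=8)       # everything fits in RAM
too_small = count_page_faults(accesses, ram_pages=6)  # RAM is 2 pages short

print(len(accesses), fits, too_small)  # prints: 80 8 80
```

When everything fits, only the first touch of each page is a fault (8 out of 80 accesses). When RAM is just two pages short, every single access becomes a fault in this model, because each worker keeps evicting a page that is about to be needed again. Real workloads are rarely this pathological, but it shows why being slightly over the RAM limit can cost far more than a proportional slowdown.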
I don’t know what your workload is, but it seems likely to me that in the case you describe, just using 6 cores is going to be better.
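As a rough way to pick the number of workers, the arithmetic can be sketched like this in Python. The function name and the `reserve_gb` parameter are hypothetical; the reserve stands in for whatever the OS and other processes need, which you would have to measure on your own machine.

```python
def workers_that_fit(total_ram_gb, per_worker_gb, cores, reserve_gb=0):
    """Return how many workers can run at once without spilling into swap.

    reserve_gb is an assumed amount of RAM kept free for the OS and
    other processes; it is not a measured value.
    """
    usable_gb = total_ram_gb - reserve_gb
    by_memory = int(usable_gb // per_worker_gb)
    return max(0, min(cores, by_memory))


# Numbers from the question: 128GB of RAM, 20GB per worker, 12 cores.
print(workers_that_fit(128, 20, 12))                 # prints: 6
print(workers_that_fit(128, 20, 12, reserve_gb=10))  # prints: 5
```

Note that 6 workers at 20GB each already use 120GB, which leaves only 8GB for everything else, so depending on what else is running, 5 may be the safer choice.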
Swap is quite useful when there are background programs that hold on to a lot of memory but aren’t actively reading or writing it.