Hi,
I am running kernel 5.9.15, if that matters. I have a bit of an OCD issue with memory isolation in the kernel. My server has two NUMA nodes and I want to isolate a whole node for VMs. The cores are pretty easy to isolate, but the memory is not, or maybe I am missing some kernel parameters for the memory side. I have a dirty workaround for now.
Kernel Parameters:
isolcpus=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
nohz_full=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
rcu_nocbs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
default_hugepagesz=1G
hugepagesz=1G
hugepages=128
hugepagesz=2M
hugepages=1835
Boot time service:
nodes_path=/sys/devices/system/node

# Set the number of 2 MB and 1 GB huge pages on the given node ($1 = count, $2 = nodeN).
reserve_pages()
{
    echo "$1" > "$nodes_path/$2/hugepages/hugepages-2048kB/nr_hugepages"
    echo "$1" > "$nodes_path/$2/hugepages/hugepages-1048576kB/nr_hugepages"
}

# Free the boot-time reservation on node0 so the huge pages stay on node1.
reserve_pages 0 node0
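The per-node result can then be checked with the standard sysfs counters and numactl, roughly like this (numactl -H prints the per-node summary shown below):

cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages
cat /sys/devices/system/node/node*/hugepages/hugepages-2048kB/nr_hugepages
numactl -H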
Achievement:
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
node 0 size: 193365 MB
node 0 free: 191771 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31
node 1 size: 64505 MB
node 1 free: 8 MB
Why / Problems:
- If there is still free memory available on node1, there is a huge performance penalty when a process hits the node1 memory pool. I have now shrunk that pool down to only 8 MB, but that is not perfect. The huge page allocation actually seems to vary between boots; sometimes there are 61 or 62 1 GB huge pages. Mixing 1 GB and 2 MB pages fills up the remaining space.
- I have to request double the allocation with the kernel parameters, because the boot-time reservation is split between both nodes, and then run an early boot service to free the huge pages on node0.
- Why is there no default memory policy that isolates the node1 memory space from the kernel and applies to new processes by default? The biggest problem is processes started by the kernel itself, for example mdadm. For processes I start myself, numactl can at least bind the memory (see the sketch after this list), but that does not cover kernel-spawned processes.
- There is no way to define the NUMA node for ramfs. Ramfs is needed when you want to use a RAM disk as a normal block device. Tmpfs can be bound to a NUMA node, but only per mount point (see the mount example after this list).
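For processes started from user space, this is the kind of binding I mean; it is only a sketch, the VM command line is a placeholder, and it does nothing for helpers spawned by the kernel:

# Bind a user-started VM to node1 CPUs and memory (placeholder command line).
numactl --cpunodebind=1 --membind=1 -- qemu-system-x86_64 ...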
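And the tmpfs mount-point binding mentioned above, as a sketch; the size and mount point are only examples, the mpol option is the one from the kernel tmpfs documentation:

# Bind a tmpfs mount to node1; nothing equivalent exists for ramfs.
mount -t tmpfs -o size=32G,mpol=bind:1 tmpfs /mnt/node1-ram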