OP, if your LLM loader is anything like the loaders in the Stable Diffusion ecosystem, disk reads are choppy, bursty, small-block, and out-of-order.
We’ve been trying to get disk reads deshittified for 2+ years.
Edit: See this comment where strace was used to characterize disk I/O. You might consider a similar exercise to help determine if there’s a throat to choke somewhere on GitHub.
Edit2: Interesting ideas in this thread. biosnoop output is easier to read vs. strace:
sudo /usr/share/bcc/tools/biosnoop -d nvme0n1
TIME(s) COMM PID DISK T SECTOR BYTES LAT(ms)
6.756196 python3 51553 nvme0n1 R 1634512288 16384 0.24
6.756386 python3 51553 nvme0n1 R 1634513632 16384 0.15
6.756646 python3 51553 nvme0n1 R 2596332352 16384 0.21
6.756818 python3 51553 nvme0n1 R 1635159584 16384 0.12
6.756997 python3 51553 nvme0n1 R 1635159648 16384 0.15
6.757176 python3 51553 nvme0n1 R 2596333312 16384 0.15
6.784098 python3 51553 nvme0n1 R 2594759872 16384 0.16
6.784149 python3 51553 nvme0n1 R 2594759904 16384 0.03
6.784695 python3 51553 nvme0n1 R 1635161152 16384 0.12
6.784861 python3 51553 nvme0n1 R 1635161184 16384 0.12
6.785112 python3 51553 nvme0n1 R 1635161536 16384 0.22
6.785442 python3 51553 nvme0n1 R 3140679440 131072 0.30
6.785539 python3 51553 nvme0n1 R 3140679184 131072 0.40
6.785591 python3 51553 nvme0n1 R 3140680976 131072 0.40
6.785628 python3 51553 nvme0n1 R 3140679952 131072 0.48
6.785697 python3 51553 nvme0n1 R 3140678928 131072 0.56
6.785705 python3 51553 nvme0n1 R 3140679696 131072 0.56
6.785735 python3 51553 nvme0n1 R 3140680208 131072 0.58
6.785795 python3 51553 nvme0n1 R 3140681488 131072 0.59
6.785801 python3 51553 nvme0n1 R 3140680464 131072 0.64
6.785896 python3 51553 nvme0n1 R 3140682000 131072 0.69
6.785924 python3 51553 nvme0n1 R 3140683024 131072 0.67
6.785961 python3 51553 nvme0n1 R 3140682512 131072 0.74
6.786033 python3 51553 nvme0n1 R 3140684048 131072 0.76
6.786041 python3 51553 nvme0n1 R 3140687056 32768 0.71
6.786083 python3 51553 nvme0n1 R 3140680720 131072 0.89
6.786084 python3 51553 nvme0n1 R 3140685072 98304 0.77
6.786111 python3 51553 nvme0n1 R 3140686032 131072 0.79
6.786115 python3 51553 nvme0n1 R 3140683536 131072 0.85
6.786157 python3 51553 nvme0n1 R 3140685520 131072 0.84
6.786179 python3 51553 nvme0n1 R 3140681232 131072 0.98
6.786190 python3 51553 nvme0n1 R 3140684560 131072 0.91
6.786222 python3 51553 nvme0n1 R 3140686544 131072 0.89
6.786281 python3 51553 nvme0n1 R 3140682256 131072 1.07
6.786308 python3 51553 nvme0n1 R 3140681744 131072 1.10
6.786352 python3 51553 nvme0n1 R 3140683280 131072 1.09
6.786359 python3 51553 nvme0n1 R 3140682768 131072 1.11
6.786363 python3 51553 nvme0n1 R 3140684304 131072 1.09
Note the SECTOR column. ComfyUI is jumping around while reading this freshly defragmented .safetensors checkpoint. At its worst, I’ve seen I/Os capped at 16K, so this trace is actually somewhat improved. Regardless, it caps my reads at ~2GB/s on a device that benches ~3.1GB/s.
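If you'd rather not eyeball the SECTOR column, here is a rough sketch that tallies I/O sizes and counts how many reads start exactly where the previous row ended. The function name `summarize` is made up for illustration, and it assumes SECTOR is in 512-byte units and that COMM has no spaces in it. As far as I know biosnoop prints a row when each I/O completes, so requests that were in flight in parallel can show up out of order; treat the contiguous count as a lower bound, not a verdict.

```python
from collections import Counter

SECTOR_BYTES = 512  # assumption: biosnoop reports SECTOR in 512-byte units


def summarize(lines):
    """Summarize biosnoop read rows: size histogram plus contiguity count."""
    rows = []
    for line in lines:
        parts = line.split()
        # Expected columns: TIME COMM PID DISK T SECTOR BYTES LAT.
        # Skip the header, writes, and anything that does not parse.
        if len(parts) != 8 or parts[4] != "R" or not parts[5].isdigit():
            continue
        rows.append((int(parts[5]), int(parts[6])))

    sizes = Counter(nbytes for _, nbytes in rows)

    # A read counts as contiguous if it starts at the sector right after
    # the end of the previous row in the trace.
    contiguous = sum(
        1
        for (s0, b0), (s1, _) in zip(rows, rows[1:])
        if s1 == s0 + b0 // SECTOR_BYTES
    )
    return {"count": len(rows), "sizes": dict(sizes), "contiguous": contiguous}
```

Redirect the biosnoop output to a file and pass the open file (or a list of lines) to `summarize`. A loader doing clean sequential reads should show one dominant large size and a contiguous count close to the row count.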
The safetensors read technique, once deshittified, is a thing of beauty. Perfectly back-to-back and fully sequential 1M I/Os yielding fully saturated disk subsystems. We had this until the first version of the comfyui-faster-loading plug-in quit working after the Comfy folk re-engineered the model loader.
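For a baseline to compare your loader against, here is a minimal sketch of a plain front-to-back read in 1 MiB chunks. To be clear, this is not the safetensors loader and not what the comfyui-faster-loading plug-in does; `sequential_read` is a made-up helper. It uses ordinary buffered file I/O, so the kernel's readahead decides the block sizes that actually hit the disk, and a second run will mostly measure the page cache. Check what really reaches the device with biosnoop.

```python
import os
import time

CHUNK = 1024 * 1024  # 1 MiB per read call


def sequential_read(path, chunk=CHUNK):
    """Read a file front to back in fixed-size chunks.

    Returns (total_bytes, elapsed_seconds).
    """
    buf = bytearray(chunk)
    total = 0
    start = time.perf_counter()
    # buffering=0 gives a raw file object, so each readinto() is one read syscall.
    with open(path, "rb", buffering=0) as f:
        if hasattr(os, "posix_fadvise"):
            # Hint that access will be sequential (not available on every OS).
            os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            n = f.readinto(buf)
            if not n:
                break
            total += n
    return total, time.perf_counter() - start
```

Divide the byte count by the elapsed time on a cold cache and compare it with what the loader manages on the same checkpoint. If this simple loop gets much closer to the device's benchmark number, the loader's access pattern is the likely culprit.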
Anyway, see if your LLM model loader exhibits a similar pathology.