For local model hosting, has anyone experimented with different file systems wrt loading models? I just bought a new SSD for my local LLM system and used the standard ext4 file system, but was wondering if xfs or f2fs might give any performance gains for model storage/loading? My guess is that it wouldn’t make much of a difference until models were exceptionally large, but wondering if anyone has any test data?
Curious if anyone has any input/experience. I know a little about filesystems in general, so dumbing things down would be appreciated.
There’s nothing special about loading models… it’s just a lot of sequential reads, so it’s covered by existing benchmarks of sequential read performance.
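If you want to sanity-check that on your own drive, a tool like fio is the proper way, but a quick script gives a ballpark too. This is just a sketch: it writes a dummy "model" file and times a sequential read of it. Note that a freshly written file will mostly be served from the page cache, so for real disk numbers you'd drop caches first (or use fio with direct I/O).

```python
import os
import tempfile
import time

CHUNK = 1024 * 1024   # read in 1 MiB chunks
SIZE = 64 * CHUNK     # 64 MiB dummy file (real models are far larger)

# Write a throwaway file standing in for a model checkpoint.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(SIZE))
    path = f.name

# Time a plain sequential read, which is roughly what a loader does.
start = time.perf_counter()
read = 0
with open(path, "rb") as f:
    while chunk := f.read(CHUNK):
        read += len(chunk)
elapsed = time.perf_counter() - start

print(f"read {read / CHUNK:.0f} MiB in {elapsed:.3f}s "
      f"({read / CHUNK / elapsed:.0f} MiB/s)")
os.unlink(path)
```

On a cached read you'll see memory-bandwidth-level numbers, which is itself a hint that the filesystem isn't the bottleneck here.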
Fair enough. Figured I’d ask since filesystem mechanics are a bit of a black box to me. I remembered CXL being described as an “expansion” of VRAM and also found this paper about using SSDs, both of which partly led to my question. Was mostly curious whether anyone had run the experiment to see if there were any advantages.
The paper is focused on techniques for training models using systems with extremely limited memory capacity for the task.
If you’re just doing inferencing, the techniques discussed won’t apply. Ideally you’ll want the entire model in VRAM. Next best is some layers in VRAM and the rest in system memory, often with a considerable performance hit. If you ever have to fall back to SSD during inferencing due to lack of memory, you should probably either get more RAM or use a smaller model, because performance will likely be abysmal regardless of the choice of file system.
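The "entire model in VRAM" question is easy to estimate up front. A rough back-of-the-envelope sketch (numbers are illustrative, and this ignores KV cache and activation overhead, which add on top):

```python
# Rough check: do a model's weights fit in VRAM?
# Ignores KV cache and runtime overhead, so treat as a lower bound.

def model_bytes(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight size in bytes."""
    return n_params * bytes_per_param

GIB = 1024 ** 3

# A 7B-parameter model at fp16 (2 bytes per parameter):
size = model_bytes(7e9, 2)
print(f"fp16 7B ~ {size / GIB:.1f} GiB")    # ~13.0 GiB

# The same model 4-bit quantized (~0.5 bytes per parameter):
size_q4 = model_bytes(7e9, 0.5)
print(f"q4 7B ~ {size_q4 / GIB:.1f} GiB")   # ~3.3 GiB
```

So a 7B fp16 model already overflows a 12 GB card, while a 4-bit quant fits comfortably — which is exactly when you'd be choosing between "fits in VRAM" and "split across VRAM and system RAM."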
There’s bound to be considerable confusion with the AI-focused marketing craze still going strong, so it’s good to question what is and isn’t important for end-user AI (inferencing) performance. Generally it comes down to more memory and more memory bandwidth.
While this is mostly true, here’s a fun fact: many loaders actually do some random reads to try to speed up loading certain model parts. I ran into issues with that when using an S3 mountpoint + torch/safetensors/mmap. Many other folks report bad performance on networked filesystems, but S3 mountpoint is a worse offender since it flat-out errors out on random accesses. See some related issues:
Nonetheless, what you said stands true for local filesystems: performance is pretty much limited by sequential reads, and most filesystems manage to achieve that without issues.