Hi there, I am currently upgrading my workstation to make it better suited for AI work. Loading large LLMs takes a while because my current SSDs can only read at about 2GB/s. I was looking to improve this speed drastically, so I got four 1TB WD SN850X drives with PCIe 4.0.
My setup:
Processor: AMD Threadripper Pro 3995WX
SSDs: four 1TB Western Digital SN850X, PCIe 4.0 x4 each.
Kernel: 6.12
I put the four drives on a bifurcation adapter, all in one slot. Each disk is recognized by the host, and each one reads between 6GB/s and 6.5GB/s individually. I have tested this with gnome-disk-utility as well as with kdiskmark.
I have created a RAID-0 array over all four devices with mdadm (without any of the performance tuning it might offer) and tested the read speed of the array. Sequentially, I get about 12.0GB/s.
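For reference, the array was created more or less like this (the device names are just how the drives enumerate here; chunk size is left at the mdadm default):

```
# plain RAID-0 across the four NVMe drives, no tuning options
sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
```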
The problem is that this is only about half of the combined speed of the four drives, and I am wondering what kind of bottleneck I am running into here and whether it can be mitigated to get closer to the roughly 20GB/s I was hoping to achieve.
There is currently no important data on the array, so it can be destroyed and recreated at will. Any thoughts and ideas are much appreciated.
PCIe 4.0 is about 2GB/s per lane, so if your bifurcation adapter is only running at x8 then you can hit at most about 16GB/s (you can check the negotiated link width, see below).
If you are testing each disk individually, one after another, that won't saturate the adapter. You could benchmark all of them at the same time and see whether that matches the RAID benchmark.
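If you want to rule out a link-width problem, the negotiated speed and width are visible in sysfs or via lspci, something like this (the PCI address is just an example, substitute your drive's):

```
# negotiated PCIe link for one of the NVMe drives (repeat for nvme1..nvme3)
cat /sys/class/nvme/nvme0/device/current_link_speed
cat /sys/class/nvme/nvme0/device/current_link_width

# or via lspci, using the drive's PCI address
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'
```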
It is an x16 bifurcation adapter, and each SSD is connected at PCIe 4.0 x4. I confirmed using fio that all of the disks can do 7GB/s sequential reads in parallel.
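The parallel test was roughly this, one sequential-read job per drive, all running at once (device names as they enumerate on my machine):

```
# one 1M-block sequential-read job per drive, queue depth 32, all in parallel
sudo fio --direct=1 --rw=read --bs=1M --iodepth=32 --ioengine=libaio \
         --runtime=30 --time_based --group_reporting \
         --name=d0 --filename=/dev/nvme0n1 \
         --name=d1 --filename=/dev/nvme1n1 \
         --name=d2 --filename=/dev/nvme2n1 \
         --name=d3 --filename=/dev/nvme3n1
```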
Oh never mind, I think it is in fact working correctly and I am simply running into the single-thread limits of my processor. I started fio with the following command, which uses 4 jobs in parallel, and now I am seeing speeds of 27.4GB/s.
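Something along these lines (reconstructed from memory, adjust the md device path to your setup):

```
# four parallel sequential-read jobs on the array, each starting 250GiB apart
sudo fio --name=mdread --filename=/dev/md0 --direct=1 --rw=read --bs=1M \
         --iodepth=32 --ioengine=libaio --numjobs=4 --offset_increment=250G \
         --runtime=30 --time_based --group_reporting
```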
You might be getting close to RAM limits? I am unsure how to measure this, but DDR4 at its low end is sub-20GB/s for a single channel, as per this guide on ServeTheHome (a rough way to check is sketched below).
You might also be running into CPU limits, depending on how you're invoking kdiskmark.
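If you want a rough number for memory bandwidth, sysbench ships with a memory test (assuming the sysbench package is installed); comparing one thread with several shows how much headroom the channels have:

```
# single-threaded memory read bandwidth
sysbench memory --memory-block-size=1M --memory-total-size=32G --memory-oper=read run

# same test with 8 threads to approximate the multi-channel ceiling
sysbench memory --threads=8 --memory-block-size=1M --memory-total-size=32G --memory-oper=read run
```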
Yes, with a single job the maximum speed is about 14GB/s; with two jobs it is 27.4GB/s. I figure mystery solved: I either need a faster processor or I need to distribute the reads across multiple cores.