Intel P4510 "I/O timeout, completion polled"

hsnyder · November 14, 2023, 4:30am

I have two Intel DC P4510 drives (8 TB each) that I was formerly using in a Linux RAID 0 array. I noticed that when doing a lot of small-file I/O, I would sometimes see hangs where an I/O operation would take an extremely long time to complete, along with dmesg messages that read some variant of “I/O […] timeout, completion polled”. This is on a Zen2 Epyc system. Looking around the internet a bit, this seems to be a known issue, or at least one reported by others. For example: Fixing Slow NVMe Raid Performance on Epyc

I didn’t have time to deal with it, so I just moved on to using different disks for the task I had. A month later, I’m coming back around and trying to figure out what I can do to fix this. I saw Wendell’s thread linked above, and I noticed that both drives had out-of-date firmware and were formatted with 512-byte sectors. So I reformatted them to 4K sectors and updated the firmware to revision VDV10184.
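For anyone who wants to confirm the same two settings on their own drives, here is a minimal sketch that reads the firmware revision and logical block size from Linux sysfs. The controller and namespace names (nvme0, nvme0n1) are assumptions, as is the helper name read_nvme_info; adjust them to match your system.

```python
from pathlib import Path


def read_nvme_info(ctrl="nvme0", ns="nvme0n1", sysfs_root="/sys"):
    """Return model, firmware revision, and logical block size for one NVMe drive.

    ctrl and ns are assumed device names; sysfs_root is only a parameter
    so the function can be pointed at a test directory.
    """
    root = Path(sysfs_root)
    ctrl_dir = root / "class" / "nvme" / ctrl
    model = (ctrl_dir / "model").read_text().strip()
    firmware = (ctrl_dir / "firmware_rev").read_text().strip()
    # Logical block size in bytes: 512 before the reformat, 4096 after.
    lbs_path = root / "block" / ns / "queue" / "logical_block_size"
    logical_block_size = int(lbs_path.read_text().strip())
    return {
        "model": model,
        "firmware": firmware,
        "logical_block_size": logical_block_size,
    }

# Example call on a real system:
#   print(read_nvme_info("nvme0", "nvme0n1"))
```

After the changes described above, I would expect the firmware field to read VDV10184 and logical_block_size to be 4096 for each drive.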

I built a new RAID0 array with the drives to see if I could reproduce the issue. I could not.

Good news? The trouble is, the software and data that I was using when I first discovered the issue are no longer available on this computer. I could dig them up, but it would be a pain. So my attempts to reproduce the issue have been with FIO (various settings). But I never ran FIO before the block size change and firmware update, so it’s hard to know whether the issue is resolved or just not triggered by this workload.
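One way to make the reproduction attempts less ambiguous is to count the timeout messages in the kernel log after each run. Below is a rough sketch that does this per controller. The regular expression is deliberately loose because the exact wording between "I/O" and "timeout" varies by kernel version, and the function name count_polled_timeouts is just an illustrative choice.

```python
import re

# Matches lines such as "nvme nvme0: I/O 12 QID 3 timeout, completion polled".
# The text between "I/O" and "timeout" is left open on purpose.
POLLED_TIMEOUT = re.compile(r"(nvme\d+): I/O .*timeout, completion polled")


def count_polled_timeouts(log_text):
    """Count 'timeout, completion polled' messages per NVMe controller."""
    counts = {}
    for line in log_text.splitlines():
        match = POLLED_TIMEOUT.search(line)
        if match:
            ctrl = match.group(1)
            counts[ctrl] = counts.get(ctrl, 0) + 1
    return counts

# Example call on a real system (dmesg may require root):
#   import subprocess
#   out = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
#   print(count_polled_timeouts(out))
```

An empty result after a long run of the original workload would be stronger evidence than an empty result after FIO alone, since FIO may simply not trigger the problem.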

Does anyone know anything further about this specific issue? Was it fixed in a firmware update? If I get some time, maybe on a weekend, I’ll dig up the exact software I was using and try to reproduce the issue that way. In the meantime, I figured it wouldn’t hurt to ask whether anyone here has seen the same issue, and whether anything is definitively known about the firmware update to VDV10184 or the change to a 4K block size having any bearing on it.

wendell · November 15, 2023, 7:54pm

This was the problem Linus had too. The firmware update and sector size thing does fix it.

hsnyder · November 16, 2023, 5:33am

Thanks, Wendell! Having that confirmed gives me peace of mind.