PSA: Ext4 data corruption bug related to direct IO writes, since Linux Kernel v6.1.64

Looks like a regression introduced in kernel v6.1.64 can cause data corruption on Ext4. It’s not clear to me under what circumstances, or how badly, this bug shows up, but it seems to require writes using direct IO.

https://lore.kernel.org/stable/20231205122122.dfhhoaswsfscuhc3@quack3/

So I’ve got back to this and the failure is a subtle interaction between
iomap code and ext4 code. In particular, the fact that commit 936e114a245b6
(“iomap: update ki_pos a little later in iomap_dio_complete”) is not in
stable means that the file position is not updated after a direct IO write, and
thus direct IO writes end up in the wrong locations, effectively
corrupting data. The subtle detail is that before this commit if ->end_io
handler returns non-zero value (which the new ext4 ->end_io handler does),
file pos doesn’t get updated, after this commit it doesn’t get updated only
if the return value is < 0.

The commit got merged in 6.5-rc1 so all stable kernels that have
91562895f803 (“ext4: properly sync file size update after O_SYNC direct
IO”) before 6.5 are corrupting data - I’ve noticed at least 6.1 is still
carrying the problematic commit. Greg, please take out the commit from all
stable kernels before 6.5 as soon as possible, we’ll figure out proper
backport once user data are not being corrupted anymore. Thanks!
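To make the quoted explanation a bit more concrete, here is a toy model in bash of the two file-position rules. It is not kernel code: `pos_after_old`, `pos_after_new` and their arguments are names I made up, and it assumes (as the mail implies) that the new ext4 ->end_io handler returns a non-zero value, here the number of bytes written, on success.

```bash
#!/usr/bin/env bash
# Toy model of the ki_pos update rule from the quoted mail. NOT kernel code.
# Hypothetical arguments: pos = file position before the write,
# written = bytes written, end_io_ret = value returned by ->end_io.

# Without 936e114a245b6: the position only advances if ->end_io returned 0.
pos_after_old() {
  local pos=$1 written=$2 end_io_ret=$3
  if (( end_io_ret == 0 )); then
    pos=$(( pos + written ))
  fi
  echo "$pos"
}

# With 936e114a245b6: the position advances unless ->end_io returned < 0.
pos_after_new() {
  local pos=$1 written=$2 end_io_ret=$3
  if (( end_io_ret >= 0 )); then
    pos=$(( pos + written ))
  fi
  echo "$pos"
}

# Two back-to-back 4 KiB direct writes starting at offset 0. The handler is
# assumed to return the byte count (non-zero), like the new ext4 ->end_io.
old=$(pos_after_old 0 4096 4096)
new=$(pos_after_new 0 4096 4096)
echo "without the iomap fix, 2nd write starts at offset $old"
echo "with the iomap fix,    2nd write starts at offset $new"
```

In this model the second write lands back at offset 0, on top of the first one, which is the "direct IO writes ending in wrong locations" described in the mail.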

https://lore.kernel.org/stable/[email protected]/

Thanks a lot for the update.
Turns out this is causing a regression in chromeos-6.1, and reverting the
offending patch fixes the problem. I suspect anyone running v6.1.64+ may
have a problem.
Guenter
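If you want a quick check of the running kernel, here is a rough bash sketch. It only encodes what this thread says for the 6.1 series: affected from 6.1.64, and treated as fixed in 6.1.66 (based on the Debian 6.1.66-1 changelog shown further down). Other stable series are not covered, and the function names are my own. Note that on Debian `uname -r` prints the ABI name (e.g. 6.1.0-NN-amd64), so the sketch assumes the package version can be read from `uname -v` there.

```bash
#!/usr/bin/env bash
# Rough check for the 6.1.x range discussed in this thread (6.1.64 and 6.1.65).
# Assumption: 6.1.66 is fixed, per the Debian 6.1.66-1 changelog. Other stable
# series are NOT covered. Function names are made up for this sketch.

# is_affected_6_1 VERSION: exit 0 if VERSION (e.g. "6.1.65" or "6.1.64-1")
# is in [6.1.64, 6.1.66), 1 if not, 2 if it can't be parsed.
is_affected_6_1() {
  if [[ $1 =~ ^([0-9]+)\.([0-9]+)\.([0-9]+) ]]; then
    local maj=$((10#${BASH_REMATCH[1]}))
    local min=$((10#${BASH_REMATCH[2]}))
    local pat=$((10#${BASH_REMATCH[3]}))
    (( maj == 6 && min == 1 && pat >= 64 && pat < 66 ))
  else
    return 2
  fi
}

# running_version: assumes Debian puts the package version in `uname -v`
# ("... Debian 6.1.66-1 (...)"); elsewhere fall back to `uname -r`.
running_version() {
  local v
  v=$(uname -v | grep -oE 'Debian [0-9]+\.[0-9]+\.[0-9]+' | head -n 1 || true)
  if [[ -n $v ]]; then echo "${v#Debian }"; else uname -r; fi
}

v=$(running_version)
if is_affected_6_1 "$v"; then
  echo "$v: in the 6.1.64/6.1.65 range, consider upgrading or rolling back"
else
  echo "$v: not in the 6.1.64/6.1.65 range (other series not checked)"
fi
```

This looks at the running kernel, so installing a fixed package only shows up here after rebooting into it.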

Corrections and updates welcome.


Thanks for the information.

I read your PSA just after upgrading to kernel 6.1.64 on Debian. Doh! 🙂
So I’ve rolled back on my systems.

Update:
I’ve noticed that the fix is now available on Debian stable:

root@zephir:~# apt policy linux-image-amd64
linux-image-amd64:
  Installed: 6.1.66-1
  Candidate: 6.1.66-1
  Version table:
     6.5.10-1~bpo12+1 100
        100 IT Services Group | Department of Physics | ETH Zurich bookworm-backports/main amd64 Packages
 *** 6.1.66-1 500
        500 IT Services Group | Department of Physics | ETH Zurich bookworm/main amd64 Packages
        100 /var_timeshift/lib/dpkg/status
     6.1.52-1 500
        500 Index of /debian-security bookworm-security/main amd64 Packages
root@zephir:~# zcat /usr/share/doc/linux-image-amd64/changelog.gz | grep '#1057843' -B1
    - iomap: update ki_pos a little later in iomap_dio_complete
      (Closes: #1057843)
