BTRFS Root FS recovery?

Hey all,

So I woke my main machine (it runs Fedora) today, and right after I logged in, X11/GNOME straight up died. It dropped me to the console, where systemd was complaining that a service couldn’t start, and also that there was an I/O error on the boot drive. I figured the best course of action was to sync the disks via SysRq, then force a reboot the same way.

When I rebooted, the LSI HBA found my three other disks, but not the BTRFS boot disk. I pulled the other drives and reconnected the boot drive. This time it booted, but it dropped into maintenance mode. From there I checked dmesg; when the system tried to mount the root filesystem, it logged:

BTRFS error (device sdb3): parent transid verify failed on 66931867648 wanted 541837 found 541832
BTRFS: error (device sdb3) in btrfs_replay_log:2500: errno=-5 IO failure (Failed to recover log tree)
BTRFS error (device sdb3: state E): open_ctree failed

I’ve run btrfs check, as a couple of people suggested on Reddit, but it spews a variety of errors. I’ve refrained from running it with the --repair option for fear of damaging the filesystem even more.

At this point, all I need to get off the drive is my home folder and nothing else, so even if I can only get the FS mounted read-only, that’s good enough for me. If anyone has any ideas I’d be extremely grateful.

Other than that, I’m OK with reinstalling. I should’ve had a backup. Lesson learned the hard way.

Did you make any LVM changes before the error happened?
I think I once had a similar problem after I tried to set up an LVM mirror or cache device (I don’t remember which) for the btrfs filesystem, and after a reboot the filesystem couldn’t be mounted anymore.

I don’t think I have an LVM configuration on that drive. So no.

You could check if there’s anything just before the error in
ls -lrth /etc/lvm/archive/
but other than that I can’t really say. I’m no BTRFS expert. I’d guess it’s a good idea to boot from a USB stick and clone the disk (or create an image of it), just to make sure you have a backup before trying any recovery operations.
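
For the imaging step, something along these lines might work. It is only a sketch and assumes GNU ddrescue is available on the live USB; /dev/sdb is taken from the dmesg output above, while /mnt/backup and the file names are made-up placeholders for a location on a different, healthy disk. The run helper is also just an illustration: it only prints the commands unless DRY_RUN=0 is set.

```shell
# Assumptions: SRC is the whole failing disk, DEST_DIR is on another disk.
SRC="${SRC:-/dev/sdb}"
DEST_DIR="${DEST_DIR:-/mnt/backup}"   # hypothetical path
IMG="$DEST_DIR/bootdisk.img"          # hypothetical file name
MAP="$DEST_DIR/bootdisk.map"          # ddrescue map file, lets a run be resumed

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pass 1: copy everything that reads cleanly and skip the scraping phase (-n).
run ddrescue -d -n "$SRC" "$IMG" "$MAP"
# Pass 2: go back to the bad areas and retry them up to 3 times (-r3).
run ddrescue -d -r3 "$SRC" "$IMG" "$MAP"
```

Any later recovery attempt can then be pointed at the image (or a copy of it) instead of the original drive.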

It looks like you can attempt data recovery (with btrfs restore), but no guarantees are made on data consistency.
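
Since the failure in the log is in the log tree replay, it may be worth trying a read-only mount that skips the replay before falling back to btrfs restore. A rough sketch, not a guaranteed fix: it assumes a live system with a reasonably recent kernel (the rescue= mount options need 5.9 or newer, rescue=all needs 5.11 or newer), and /mnt/rescue and /mnt/backup/restore are placeholder paths. The run helper only prints the commands unless DRY_RUN=0 is set.

```shell
DEV="${DEV:-/dev/sdb3}"                # device name from the dmesg output
MNT="${MNT:-/mnt/rescue}"              # hypothetical mount point
OUT="${OUT:-/mnt/backup/restore}"      # hypothetical target on another disk

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run mkdir -p "$MNT" "$OUT"

# Each step is only tried if the previous one failed
# (in dry-run mode only the first mount is printed):
#   1. read-only mount without replaying the log tree
#   2. read-only mount with every rescue option enabled
#   3. btrfs restore dry run (-D) that lists files without writing anything
run mount -o ro,rescue=nologreplay "$DEV" "$MNT" ||
    run mount -o ro,rescue=all "$DEV" "$MNT" ||
    run btrfs restore -D -v -i "$DEV" "$OUT"
```

If one of the mounts succeeds, the home folder can be copied off with ordinary tools. None of these steps write to the damaged filesystem.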

Is your LSI HBA doing write-back caching without a battery backup? According to the above link, the main causes of this issue are either a software bug (in the btrfs code in the kernel) or disk flushes not working properly.
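
To see what the drive itself reports about write caching, something like the following could help. It is a sketch that assumes hdparm and smartmontools are installed and that the disk shows up as /dev/sdb; behind a SAS HBA hdparm may not work for every drive. The run helper only prints the commands unless DRY_RUN=0 is set.

```shell
DISK="${DISK:-/dev/sdb}"   # assumed device name
NAME="${DISK##*/}"         # "sdb", used for the sysfs path

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The kernel's view: prints "write back" or "write through".
run cat "/sys/block/$NAME/queue/write_cache"
# The drive's own write-cache setting (-W without a value only queries it).
run hdparm -W "$DISK"
# The same setting as seen by smartmontools.
run smartctl -g wcache "$DISK"
```

This only shows the drive-level cache. Any caching done by the controller itself would have to be checked with the vendor's tools.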

There’s no battery; the controller’s built into the board. I’m not sure about the write-back cache, though.

btrfs restore seems to restore only the stuff that doesn’t matter.
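
If restore keeps pulling from the wrong tree, it might be possible to point it at a specific tree root. This is a sketch and makes assumptions: as far as I know, a default Fedora Btrfs install keeps /home in its own subvolume, and the root id 257 below is purely a placeholder that would need to be replaced with an id taken from the -l listing. /mnt/backup/restore is also a made-up path on another disk. The run helper only prints the commands unless DRY_RUN=0 is set.

```shell
DEV="${DEV:-/dev/sdb3}"              # device name from the dmesg output
OUT="${OUT:-/mnt/backup/restore}"    # hypothetical target on another disk
HOME_ROOT="${HOME_ROOT:-257}"        # placeholder id, take the real one from -l

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# List the tree roots that restore can find on the device.
run btrfs restore -l "$DEV"
# Dry run (-D) against one specific root to check it contains the home data.
run btrfs restore -D -v -r "$HOME_ROOT" "$DEV" "$OUT"
# Real restore from that root, ignoring errors on individual files (-i).
run btrfs restore -v -i -r "$HOME_ROOT" "$DEV" "$OUT"
```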

Is the LSI controller in HBA mode or RAID mode? (Do you need to manually create JBOD arrays for your OS to see the disks?)

HBA. No RAID capabilities.