I have a hardware RAID 5 array consisting of 4 identical 8TB HDDs. The XFS filesystem has become corrupted. I have no backups of this filesystem and it’s critical that I recover the data in this array.
This array happens to be the single point of failure that is bringing down all of my systems.
My main system cannot boot until this is repaired.
I can boot into, and am familiar with, a Debian live session.
Using the live session, I try xfs_repair, which takes hours only to segfault before completing.
xfs_repair -n seems to work fine and doesn’t segfault.
Attempting either takes hours; this is a very large storage array.
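For reference, the invocations were roughly of this form (the device path is just an example; the array shows up as a single block device behind the controller):

    # check-only pass, makes no changes to the filesystem
    xfs_repair -n /dev/sdb
    # actual repair pass, this is the one that segfaults
    xfs_repair /dev/sdb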
Both output obscene amounts of:
imap claims in-use inode (number) is free, correcting/would correct imap.
I'm also getting obscene amounts of:
Badness in key lookup (length)
Posting a log of xfs_repair is impossible; the log would be tens of gigabytes long due to how many of those correcting-imap lines there are.
I did some googling, and everything I can find about xfs_repair segfaults points to either a really old version of the program, faulty RAM, or not enough RAM to hold everything xfs_repair needs to do its thing.
I would suggest re-seating the RAM and running a memory test, then trying again if all looks good.
If it continues to fail, perhaps add more RAM to the system.
When I used to use XFS I seem to recall that xfs_repair would use a lot of RAM to the point of running out and segfaulting. You could try configuring some swap on a spare SSD and check the memory usage while it runs.
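Something along these lines, assuming the spare SSD shows up as /dev/sdX (placeholder) and you don't mind wiping it:

    # WARNING: destroys whatever is currently on the SSD
    mkswap /dev/sdX
    swapon /dev/sdX
    # keep an eye on memory and swap usage while xfs_repair runs
    watch -n 5 free -m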
If you have some spare disks of at least the same size, I would make an entire copy first in case xfs_repair makes things worse.
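A block-level copy before any further repair attempts would look roughly like this, with /dev/sdX as the source array and /dev/sdY as the destination (both placeholders):

    # ddrescue keeps a map file, so it can resume and will skip over bad areas
    ddrescue -f /dev/sdX /dev/sdY /root/rescue.map
    # or, with plain dd, ignoring read errors and showing progress
    dd if=/dev/sdX of=/dev/sdY bs=64M conv=noerror,sync status=progress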
Well, I don’t think I’m running out of system memory. I’ve tried using the -m argument to limit RAM usage and I’m getting identical results.
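For what it's worth, the capped runs looked something like this (the 2048 MB figure is only an example of the values I tried):

    # limit xfs_repair's memory usage to roughly 2 GB
    xfs_repair -m 2048 /dev/sdb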
I have nowhere else to store this much data. I don’t own any other storage device capable of holding 32TB.
UPDATE: By continuously trying, I managed to get xfs_repair to work well enough for me to get my irreplaceable data. Not everything, but enough to not ruin me. I have ordered a new HDD and I will reformat my array once I can get these important bits backed up.
Are you certain there is no physical issue with the array/drives? pv /dev/sda > /dev/null would be a decent way to check if there are lower level problems that you might need to resolve first. The speed at which it runs, as well as whether it completes, will be informative.
Perhaps you need to replace/rebuild a drive in the array. smartctl may be able to report on the health of individual drives to help here. Syntax will depend on your RAID controller, which you did not name. Linux software MD RAID (mdadm) can read/use LSI/MegaRAID signatures on drives on non-RAID controllers, in case that is of help.
Thanks for the advice, both of you.
I built this array years ago.
My controller is an older Dell PERC H830 card. I do not have the CPU headroom for software RAID, so having an accelerator seemed like a logical solution to me.
I cannot remember my original logic behind choosing XFS, but now that I'm forced to reformat anyway, I may as well see if it makes sense to use a different filesystem.
The two options I'm willing to consider are ZFS and EXT4. I'm unsure which is better suited for hardware-accelerated RAID 5.
My priority is reliability and redundancy. I want to be able to easily recover from any sudden power failure, system crash, or filesystem corruption.
Should my system suddenly crash, I want to risk losing only the data being worked on at that moment, not put the entire filesystem in harm's way. I don't want to repeat this XFS issue.
I’m not sure if Ext4 or ZFS would be a better choice in this regard.
I'm also ignorant as to whether things like mdadm have become optimized enough that their overhead is negligible, and whether it might be smarter to use my controller in HBA mode.
Seeking advice from somebody who knows more about storage arrays than myself.
EXT4 will be practically identical to XFS in terms of data recovery. I've seen both seppuku themselves, though it's almost always due to other hardware issues.
ZFS has a well-deserved reputation for being bulletproof, especially when using its own RAIDZ implementation. If I were you, I'd switch your controller to HBA mode and use RAIDZ instead.
Having said all that, if you care about your data, you want a backup. Rclone + cloud storage can be a decent way of doing this (B2, Wasabi, etc. are cloud storage options). Otherwise, plugging in your HDD once a month and copying also works.
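If you do go the HBA + RAIDZ route, creating the pool is a one-liner; the disk paths and the pool/remote names below are only examples:

    # raidz1 over four disks gives single-parity redundancy, comparable to RAID 5
    zpool create tank raidz1 \
        /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
        /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4
    # periodic offsite copy with rclone to a configured remote
    rclone sync /tank b2remote:my-bucket/array-backup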
I have seen unrecoverable XFS situations at least 10x as often as ext4-with-journaling situations, if not more. YMMV, and no, it's not a significant sample since I don't work for a hyperscaler, but still… XFS would never cross my mind as a filesystem for my data.
ZFS, as you said, would be preferred, but it will require a different HBA or setting the current one to IT mode.
As for the performance of a software vs. hardware solution… you are doing RAID 5 using four 5400 RPM drives. We are talking 150 MB/s sequential reads, 50 MB/s sequential writes, and 200 IOPS. Software RAID is not going to matter.
I/O speed is not the reason I’m apprehensive to use software RAID. I know that either way, the speed of physical HDDs can’t be increased. It’s the CPU load I’d be concerned with.
This specific server has a lot going on at any one time; it's a first-gen SP3r2 system. My setup is heavily over-consolidated. There could be several machines and applications accessing this array at any one time. My philosophy is "if everything I own breaks, besides these HDDs, I should be able to completely come back." I'd hate to lose productivity because something, somewhere, is waiting for this server's CPU to get around to accessing this array.
I could be wrong, but I believe the presence of hardware-accelerated RAID dramatically decreases the likelihood of this happening, as the overburdened CPU only needs to worry about PCIe I/O to the card and not the actual RAID operations.
There's also the battery inside the RAID card, which is really handy for when the system fails. I have no idea if HBA mode would retain the battery features of the RAID card.
If I had the money, I would love to de-consolidate and have a dedicated machine just for this array. It’s on the shopping list, just below a redundant backup solution.
I know, I'm over-reliant on this one array. This situation has opened my eyes, and I am now trying to budget for a rarely-used backup drive to back up this backup drive that I regularly use.
I just mentioned it as a possible testing/recovery option, in case you determine your HBA is at fault, or you need to move the drives into another system for recovery for other reasons.
smartctl -x /dev/sda -d megaraid,$X should allow you to read SMART health data from the drives, where (most likely) X={0…3}. Testing with pv or similar is also a good idea.
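If the device IDs turn out to be contiguous, a quick loop covers all four drives (the /dev/sda path and the ID range are assumptions that depend on your controller):

    # dump SMART data for each physical drive behind the MegaRAID controller
    for X in 0 1 2 3; do
        smartctl -x /dev/sda -d megaraid,$X
    done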
Test that assumption before disaster happens
That reasoning seems backwards to me. If your RAID card dies, then you need to get another RAID card from the same manufacturer, possibly the exact same model and firmware revision, then somehow import or set up the volume config (which isn't stored on the disks in the case of hardware RAID), without accidentally triggering a full resilver or zero-initialization of the HDDs.
I’ve had mobos and PSUs die and take out RAID cards, and even when replacing with the exact model, I couldn’t restore the data (thanks, Adaptec).
With MD RAID or ZFS, the RAID topology is redundantly stored on all of the component disks, none of it depends on hardware. You can have a ZFS pool of SATA disks, have the machine blow up, then put half of them in an iSCSI SAN, and half behind a SAS expander, and still import the pool.
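As an example, after moving the disks to a completely different box, bringing a ZFS pool back is normally just (the pool name is an example):

    # list any pools visible on attached devices
    zpool import
    # import by name, scanning stable by-id device paths
    zpool import -d /dev/disk/by-id tank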
Fortunately, none of that is true. You can import a MegaRAID (LSI/Avago/PERC) configuration from a card several generations old into any newer one very easily (import “Foreign config”). And as I said earlier in this thread, you don’t even need a RAID HBA on Linux, as MD RAID will find the MegaRAID signatures on the drives and automatically assemble them into a volume for you.
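If you ever need to do that from a plain Linux system, the usual sequence is roughly the following, assuming mdadm recognizes the controller's on-disk metadata as described above (the drive letters are placeholders):

    # show whatever RAID metadata mdadm can see on each member drive
    mdadm --examine /dev/sd[bcde]
    # assemble any arrays mdadm can identify from that metadata
    mdadm --assemble --scan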
Way back in the bad old SCSI days, Adaptec RAID controllers in particular were very touchy and often needed firmware updates to import a RAID volume, but with a bit of cursing and frustration, it was always possible to recover. In one bad old case I recall connecting SCSI drives that were in an Adaptec RAID array from a dead system to a working system with an LSI RAID HBA (IIRC) and being able to recover the data that way.