Software RAID Array Dead

Hello,

A few days ago I upgraded my Linux server with an HBA flashed to IT mode, so I can now use 6 instead of 3 HDDs for my software RAID. Adding them was easy. I also changed the array from RAID 5 to RAID 6, since I wanted more reliability rather than even more storage. This worked great for a few days. The 6 devices of the array were: sda, sdc, sdd, sde, sdf, sdg.
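Roughly, the migration looked like this (from memory; the device names and the backup-file path here are placeholders, and the exact flags are approximate):

# add the three new disks to the existing array
sudo mdadm --add /dev/md0 /dev/sdX /dev/sdY /dev/sdZ

# then reshape from RAID 5 to RAID 6 across all 6 members
sudo mdadm --grow /dev/md0 --level=6 --raid-devices=6 --backup-file=/root/md0-grow.backup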

Here you can see the RAID array details:

sudo mdadm -D /dev/md0

fredi@kopflos:~$ sudo mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Raid Level : raid6
Total Devices : 3
Persistence : Superblock is persistent

         State : inactive

Working Devices : 3

          Name : fredi-server:0
          UUID : 7398fd61:691dfb1b:5fc6f9ef:ec087af5
        Events : 174163

Number   Major   Minor   RaidDevice

   -       8       64        -        /dev/sde
   -       8       80        -        /dev/sdf
   -       8       96        -        /dev/sdg

Here is the output for one of the member devices:

sudo mdadm --examine /dev/sdg

fredi@kopflos:~$ sudo mdadm --examine /dev/sdg
/dev/sdg:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 7398fd61:691dfb1b:5fc6f9ef:ec087af5
Name : fredi-server:0
Creation Time : Sat May 7 03:14:52 2022
Raid Level : raid6
Raid Devices : 7

Avail Dev Size : 7813908144 sectors (3.64 TiB 4.00 TB)
Array Size : 19534430720 KiB (18.19 TiB 20.00 TB)
Used Dev Size : 7813772288 sectors (3.64 TiB 4.00 TB)
Data Offset : 129024 sectors
Super Offset : 8 sectors
Unused Space : before=128944 sectors, after=135856 sectors
State : clean
Device UUID : 8b852db5:24b69b7c:26e0819b:87d52e12

Internal Bitmap : 8 sectors from superblock
Update Time : Sat Nov 18 16:17:03 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : ac28f386 - correct
Events : 174163

     Layout : left-symmetric
 Chunk Size : 512K

Device Role : Active device 1
Array State : AAAAAA. ('A' == active, '.' == missing, 'R' == replacing)

Now the RAID array is broken. It only shows 3 devices, even though every device is still attached to the computer.

Here you can see the output of "sudo fdisk -l":

sudo fdisk -l

fredi@kopflos:~$ sudo fdisk -l
Disk /dev/loop0: 63.45 MiB, 66531328 bytes, 129944 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/loop1: 63.46 MiB, 66547712 bytes, 129976 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/loop2: 102.98 MiB, 107986944 bytes, 210912 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/loop3: 111.95 MiB, 117387264 bytes, 229272 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/loop4: 40.84 MiB, 42827776 bytes, 83648 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/loop5: 40.86 MiB, 42840064 bytes, 83672 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/nvme0n1: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: Intenso NVME
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 7BB81974-F4B3-47C0-A408-50B03F2296A3

Device           Start       End   Sectors   Size Type
/dev/nvme0n1p1    2048   2203647   2201600     1G EFI System
/dev/nvme0n1p2 2203648   6397951   4194304     2G Linux filesystem
/dev/nvme0n1p3 6397952 488394751 481996800 229.8G Linux filesystem

Disk /dev/sdd: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFAX-68J
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 9D0FB976-C5E0-4978-96F0-A2AC7092FF57

Device    Start        End    Sectors Size Type
/dev/sdd1  2048 7814035455 7814033408 3.6T Microsoft basic data

Disk /dev/sdc: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFAX-68J
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 771CF5B4-142B-4312-B432-34AA1B7B48EF

Device    Start        End    Sectors Size Type
/dev/sdc1  2048 7814035455 7814033408 3.6T Linux filesystem

Disk /dev/sda: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFAX-68J
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: EA1327A9-004A-413C-BC79-501F7DD11CAD

Device    Start        End    Sectors Size Type
/dev/sda1  2048 7814035455 7814033408 3.6T Microsoft basic data

Disk /dev/sdb: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: ST4000VN008-2DR1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/sde: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFZX-68A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/sdf: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFZX-68A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/sdg: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFZX-68A
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/mapper/dm_crypt-0: 229.82 GiB, 246765584384 bytes, 481964032 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/ubuntu--vg-ubuntu--lv: 229.82 GiB, 246763487232 bytes, 481959936 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

What was really odd: when I rebooted today, I also found the following in the fstab:

part of /etc/fstab

UUID=7c122e64-1320-4cd8-9d81-68c9b8364b90 /media/sda auto noauto 0 0
UUID=b39f2ae9-1e46-4089-9d50-770302ce2a11 /media/sdd1 auto noauto 0 0
UUID=c0dab6a6-b41a-4b7f-95eb-1392c282704a /media/sdc1 auto noauto 0 0

This was really weird, since I never put those lines there. The only software I know of that does this is Cockpit (which I sometimes use), but I have never set it up to do this.

Now, when inspecting them via "sudo blkid", I find the following:

sudo blkid

/dev/mapper/dm_crypt-0: UUID="0vtwb1-peIK-zWWc-jYqg-2muz-kLvv-YddHpG" TYPE="LVM2_member"
/dev/mapper/ubuntu--vg-ubuntu--lv: UUID="f8d77f76-be2c-44b7-840b-6bc1895cec38" BLOCK_SIZE="4096" TYPE="ext4"
/dev/sdf: UUID="7398fd61-691d-fb1b-5fc6-f9efec087af5" UUID_SUB="671a17bb-d43c-b01d-6e1c-b0dfb6343388" LABEL="fredi-server:0" TYPE="linux_raid_member"
/dev/nvme0n1p3: UUID="b1cabdaf-0009-46a4-b619-5e711e9c3149" TYPE="crypto_LUKS" PARTUUID="2923c087-c635-4c31-b8e1-5e5e2e2aa57b"
/dev/nvme0n1p1: UUID="F956-F6A1" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="4679e65c-6791-48a6-bab1-ad6e5bb1c965"
/dev/nvme0n1p2: UUID="36630ef1-20f3-4086-bc15-27b57842ce53" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="86d902da-a5d3-473e-b376-67ede5fe97aa"
/dev/sdd1: UUID="b39f2ae9-1e46-4089-9d50-770302ce2a11" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="Basic data partition" PARTUUID="1a100628-887e-4865-b3a7-72abf732ed77"
/dev/sdb: LABEL="orderb_hdd" UUID="8a151ea4-f774-4b38-93d1-c883bb16d81c" BLOCK_SIZE="4096" TYPE="xfs"
/dev/sdg: UUID="7398fd61-691d-fb1b-5fc6-f9efec087af5" UUID_SUB="8b852db5-24b6-9b7c-26e0-819b87d52e12" LABEL="fredi-server:0" TYPE="linux_raid_member"
/dev/sde: UUID="7398fd61-691d-fb1b-5fc6-f9efec087af5" UUID_SUB="772a7951-ac30-6197-f320-d8d4fc3f1198" LABEL="fredi-server:0" TYPE="linux_raid_member"
/dev/sdc1: UUID="c0dab6a6-b41a-4b7f-95eb-1392c282704a" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="7cc8bbae-1228-4f8d-976f-f3c1bab31748"
/dev/sda1: UUID="7c122e64-1320-4cd8-9d81-68c9b8364b90" BLOCK_SIZE="4096" TYPE="ext4" PARTLABEL="Basic data partition" PARTUUID="5de5c4e3-1bf2-4446-8cd7-52fe8a4993c5"
/dev/loop1: TYPE="squashfs"
/dev/loop4: TYPE="squashfs"
/dev/loop2: TYPE="squashfs"
/dev/loop0: TYPE="squashfs"
/dev/loop5: TYPE="squashfs"
/dev/loop3: TYPE="squashfs"

This explains to me why the mdadm assembly did not work: sda, sdc and sdd were apparently overwritten?!

What can I try now to get my RAID array back?

Personally, I would check whether sda1/sdc1/sdd1 are mounted, unmount them if they are, comment out (#) the fstab lines for them, run sudo update-grub (or your distro's specific method), then reboot and see if the drives are still set up wonky.

I am aware Cockpit has a storage module, but I have not actually used it.
If the drives are mis-allocated again, then perhaps there is a setting in Cockpit to correct it.
One could also check dmesg / journalctl -b for any reference to the drives, in case it shows which app is grabbing them; a rough sketch of all of this is below.
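Something along these lines (the mount points are taken from the fstab snippet above):

# unmount the three suspicious mounts, if they are mounted
sudo umount /media/sda /media/sdd1 /media/sdc1

# comment out the matching /etc/fstab lines, then:
sudo update-grub
sudo reboot

# after the reboot, look for whatever touched the drives
sudo dmesg | grep -iE 'sd[acd]'
journalctl -b | grep -iE 'sd[acd]'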

If an app has been reformatting the three partitions to the new "basic data" type, then the superblock may be compromised, and the drives/array might have to be rebuilt from backup.

But, they might still work…

I've had issues in the past with drive device names changing and causing problems. You might try recreating the array with devices from /dev/disk/by-id/YOURDRIVE instead of the /dev/sdX names you are currently using.
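For example, something like this (the by-id names below are made-up placeholders; use whatever the listing actually shows for your drives):

# stable names that do not change between boots
ls -l /dev/disk/by-id/

# assemble using those paths instead of /dev/sdX
sudo mdadm --assemble /dev/md0 /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-EXAMPLE1 /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-EXAMPLE2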

Yes, they were mounted. When I realised it, I unmounted them and also commented out the fstab lines. I rebooted as well, and it did not work.

I have Ubuntu 22.04 installed; I ran "sudo update-grub" and rebooted. Nothing changed.

The RAID array was created in the terminal without involving Cockpit, so it's really weird in the first place that something just reformatted them and mounted them (if it even was Cockpit…). When rebooting now, the drives are not automatically mounted, so removing the lines from fstab fixed that (sub)problem.

I think the superblocks are compromised, and this is the problem: multiple mdadm assembly attempts and other commands did nothing and couldn't find any mdadm data on the drives.
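The assembly attempts were roughly along these lines (the exact invocations here are approximate):

sudo mdadm --stop /dev/md0
sudo mdadm --assemble --scan --verbose
sudo mdadm --assemble /dev/md0 /dev/sd[acdefg]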

For example, if I try "sudo mdadm --examine /dev/sd[a,c,d,e,f,g]", it returns just the MBR magic for the problematic drives.

fredi@kopflos:~$ sudo mdadm --examine /dev/sd[a,c,d,e,f,g]
/dev/sda:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sdc:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sdd:
MBR Magic : aa55
Partition[0] : 4294967295 sectors at 1 (type ee)
/dev/sde:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 7398fd61:691dfb1b:5fc6f9ef:ec087af5
Name : fredi-server:0
Creation Time : Sat May 7 03:14:52 2022
Raid Level : raid6
Raid Devices : 7

Avail Dev Size : 7813908144 sectors (3.64 TiB 4.00 TB)
Array Size : 19534430720 KiB (18.19 TiB 20.00 TB)
Used Dev Size : 7813772288 sectors (3.64 TiB 4.00 TB)
Data Offset : 129024 sectors
Super Offset : 8 sectors
Unused Space : before=128944 sectors, after=135856 sectors
State : clean
Device UUID : 772a7951:ac306197:f320d8d4:fc3f1198

Internal Bitmap : 8 sectors from superblock
Update Time : Sat Nov 18 16:17:03 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : 2272be3c - correct
Events : 174163

     Layout : left-symmetric
 Chunk Size : 512K

Device Role : Active device 0
Array State : AAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 7398fd61:691dfb1b:5fc6f9ef:ec087af5
Name : fredi-server:0
Creation Time : Sat May 7 03:14:52 2022
Raid Level : raid6
Raid Devices : 7

Avail Dev Size : 7813908144 sectors (3.64 TiB 4.00 TB)
Array Size : 19534430720 KiB (18.19 TiB 20.00 TB)
Used Dev Size : 7813772288 sectors (3.64 TiB 4.00 TB)
Data Offset : 129024 sectors
Super Offset : 8 sectors
Unused Space : before=128944 sectors, after=135856 sectors
State : clean
Device UUID : 671a17bb:d43cb01d:6e1cb0df:b6343388

Internal Bitmap : 8 sectors from superblock
Update Time : Sat Nov 18 16:17:03 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : d59aa8c - correct
Events : 174163

     Layout : left-symmetric
 Chunk Size : 512K

Device Role : Active device 2
Array State : AAAAAA. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : 7398fd61:691dfb1b:5fc6f9ef:ec087af5
Name : fredi-server:0
Creation Time : Sat May 7 03:14:52 2022
Raid Level : raid6
Raid Devices : 7

Avail Dev Size : 7813908144 sectors (3.64 TiB 4.00 TB)
Array Size : 19534430720 KiB (18.19 TiB 20.00 TB)
Used Dev Size : 7813772288 sectors (3.64 TiB 4.00 TB)
Data Offset : 129024 sectors
Super Offset : 8 sectors
Unused Space : before=128944 sectors, after=135856 sectors
State : clean
Device UUID : 8b852db5:24b69b7c:26e0819b:87d52e12

Internal Bitmap : 8 sectors from superblock
Update Time : Sat Nov 18 16:17:03 2023
Bad Block Log : 512 entries available at offset 24 sectors
Checksum : ac28f386 - correct
Events : 174163

     Layout : left-symmetric
 Chunk Size : 512K

Device Role : Active device 1
Array State : AAAAAA. ('A' == active, '.' == missing, 'R' == replacing)

What about if you don’t pass the temporary device locations?
Just the mdadm examine?

I only ask because it looked like a mix of whole drives and partitions had been used before.
That's the only reason.

If you check the UUIDs, you'll see the system has "helpfully" shuffled the device locations for you… [Edit: nope, the device names are still the same; I misread them]

You mean using the UUIDs instead of sda…?

I always get errors; which command would you recommend?

For example I tried:
sudo mdadm --examine --scan --uuid=771CF5B4-142B-4312-B432-34AA1B7B48EF
mdadm: :option --uuid not valid in misc mode

I just meant a generic mdadm --examine --scan without any qualifiers, so that it looks at all drives for any matching members.

When running your command I get:

sudo mdadm --examine --scan

fredi@kopflos:~$ sudo mdadm --examine --scan
ARRAY /dev/md/0 metadata=1.2 UUID=7398fd61:691dfb1b:5fc6f9ef:ec087af5 name=fredi-server:0

Welp, sorry to waste your time.
Looks like it does need the /dev/sdd1 or whatever.
I guess I would try /dev/sd[a-g] and /dev/sd[a-g]1, to cover both whole devices and partitions.

But I suspect I am probably just wasting your time now.

No, thank you for trying to help!

I already tried that. I will try to recreate the array now, which most guides use as a last resort.
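Before that, I am saving what is left of the superblock info, just as a precaution; a minimal sketch:

# keep a copy of each member drive's metadata before anything destructive
for d in a c d e f g; do sudo mdadm --examine /dev/sd$d > examine-sd$d.txt; done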

The bit where it said all 6 members are operational (AAAAAA), instead of AA.A… or similar, makes it look like they are there, but we are excluding half of them from the view.

And it looks like we are only looking at whole devices, not partitions, which is what I was getting at.

But I'm probably barking up the wrong tree.

(An 'A' is where a disk is active; a full stop/period is placed where a drive is missing…)

Re-reading the man pages, I think the -Q or --query option before the devices might be closer to what I was thinking.
Even then, the creation would not have been made against partitions; it does look like something else overwrote 3 of the devices…
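Something like this, if I am reading the man page right:

sudo mdadm --query /dev/sda /dev/sda1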

Sorry bud

I think I just saved the RAID array by creating it again. At least it looks like it:

fredi@kopflos:~/raidstatus$ sudo mdadm --create --assume-clean --level=6 --raid-devices=6 --size=3906886144 /dev/md0 /dev/sdg /dev/sdf /dev/sde /dev/sda /dev/sdc /dev/sdd
mdadm: /dev/sdg appears to contain an ext2fs file system
size=3907018584K mtime=Sat May 7 03:14:07 2022
mdadm: /dev/sdg appears to be part of a raid array:
level=raid6 devices=7 ctime=Sat May 7 03:14:52 2022
mdadm: /dev/sdf appears to be part of a raid array:
level=raid6 devices=7 ctime=Sat May 7 03:14:52 2022
mdadm: /dev/sde appears to contain an ext2fs file system
size=3907018584K mtime=Sat May 7 03:13:29 2022
mdadm: /dev/sde appears to be part of a raid array:
level=raid6 devices=7 ctime=Sat May 7 03:14:52 2022
mdadm: partition table exists on /dev/sda
mdadm: partition table exists on /dev/sda but will be lost or
meaningless after creating array
mdadm: partition table exists on /dev/sdc
mdadm: partition table exists on /dev/sdc but will be lost or
meaningless after creating array
mdadm: partition table exists on /dev/sdd
mdadm: partition table exists on /dev/sdd but will be lost or
meaningless after creating array
Continue creating array?
Continue creating array? (y/n) y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
fredi@kopflos:~/raidstatus$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdd[5] sdc[4] sda[3] sde[2] sdf[1] sdg[0]
15627544576 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]
bitmap: 0/30 pages [0KB], 65536KB chunk

unused devices: <none>
fredi@kopflos:~/raidstatus$ sudo mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Nov 20 17:37:14 2023
Raid Level : raid6
Array Size : 15627544576 (14.55 TiB 16.00 TB)
Used Dev Size : 3906886144 (3.64 TiB 4.00 TB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent

 Intent Bitmap : Internal

   Update Time : Mon Nov 20 17:37:14 2023
         State : clean 
Active Devices : 6

Working Devices : 6
Failed Devices : 0
Spare Devices : 0

        Layout : left-symmetric
    Chunk Size : 512K

Consistency Policy : bitmap

          Name : kopflos:0  (local to host kopflos)
          UUID : 7d887e38:0a9c5848:2ec76907:bdebabbb
        Events : 0

Number   Major   Minor   RaidDevice State
   0       8       96        0      active sync   /dev/sdg
   1       8       80        1      active sync   /dev/sdf
   2       8       64        2      active sync   /dev/sde
   3       8        0        3      active sync   /dev/sda
   4       8       32        4      active sync   /dev/sdc
   5       8       48        5      active sync   /dev/sdd

Now, after I rebooted, there is no /dev/md0 anymore. I thought this would normally persist…
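If I understand it correctly, on Ubuntu the array also needs to be recorded in /etc/mdadm/mdadm.conf and the initramfs rebuilt before it assembles at boot; something like this (untested on my side so far):

sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u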

Can I create the filesystem without fear of deleting / corrupting the old one?