Mdadm - I've done a thing, but don't know what I'm doing

Hi y’all,

Running Ubuntu Mate 24.04.1, Kernel 6.8.

I have a RAID 1 array and no issues with it, or at least so I think. I don’t really know how to monitor for errors, and I’m struggling to understand the mdadm man page.

I’ve been using Linux for 2 years and still have growing pains. I also have a concussion and a 2-week-old at home, so please bear with my stupidity :slight_smile: .

Here are my results from sudo mdadm --detail

$ sudo mdadm --detail /dev/md126
/dev/md126:
         Container : /dev/md/imsm0, member 0
        Raid Level : raid1
        Array Size : 15625876480 (14.55 TiB 16.00 TB)
     Used Dev Size : 15625876480 (14.55 TiB 16.00 TB)
      Raid Devices : 2
     Total Devices : 2

             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0

Consistency Policy : resync


              UUID : 0b
    Number   Major   Minor   RaidDevice State
       1       8        0        0      active sync   /dev/sda
       0       8       16        1      active sync   /dev/sdb

So, the state is “clean”, but per https://linux.die.net/man/8/mdadm, a drive returning a number of “1” signifies that there is at least one failed device. Huh?

I am trying to work out the following, in my effort to learn and fix things where necessary:

  1. What do Major and Minor represent? I have no idea what these mean.
  2. What are the mechanics of how “resync” works? Are there better options for the next time I create a RAID array?
  3. If there is an issue with the array, what are my next steps?
  4. I believe mdadm does parity checks on its own. If so, how do you monitor the health of your drives/array?

Appreciate the advice in advance.

WWED

md is fun, but with all the other options out there, I’m not sure why you would want md.

I’m not certain where you got the idea that a Number of 1 is bad. The table clearly shows both drives as active sync. It is possible to configure md to email you when there is an issue. If you have an email service you can send through over SMTP, this is fairly easy to configure.
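For the email part, mdadm's monitor mode reads the address from a `MAILADDR` line in mdadm.conf and hands the message to the local sendmail-compatible program, so the machine needs some way to relay mail. Here is a rough sketch for checking whether an address is set. The helper name `mail_target` is made up for illustration, and the default path assumes the Ubuntu/Debian location of the config file.

```shell
# Print the address from the MAILADDR line of an mdadm.conf-style file.
# Exits non-zero if no MAILADDR line is present.
# Simplification: only matches the exact upper-case keyword.
mail_target() {
    conf="${1:-/etc/mdadm/mdadm.conf}"   # assumed Ubuntu/Debian path
    awk '$1 == "MAILADDR" { print $2; found = 1 } END { exit !found }' "$conf"
}

# Usage (not run here):
#   mail_target || echo "no MAILADDR set, mdadm has nowhere to send alerts"
#   sudo mdadm --monitor --scan --oneshot --test   # sends a test alert per array
```

If the test alert never arrives, the mail relay is the first thing I would check, not mdadm itself.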

  1. Major and Minor are the kernel’s device numbers. The major number identifies the driver (8 is the SCSI/SATA disk driver) and the minor number identifies the specific device (0 is sda, 16 is sdb). You don’t really need to pay attention to them.
  2. Resync here is the consistency policy, meaning what md does after an unclean shutdown. RAID 1 has no parity data; each drive holds a full copy. After a crash one drive may not have the same data as the other, because a write reached one drive but not yet the other, so md resynchronizes the array to make the copies match again. With the plain resync policy there is no log of pending writes, so the whole array is resynced. A write-intent bitmap (shown as consistency policy “bitmap”) tracks which regions were being written so that only those need resyncing. For a mirror, resync is a reasonable default.
  3. You would replace the offending drive immediately to maintain your uptime. RAID is not a backup.
  4. You’ve already seen how you could check. If you want more verbosity you can add that with -v. You could also run this command from a cron job, grep the State line, and then run a script to flag another system such as Uptime Kuma if it’s anything other than clean.
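The cron idea in point 4 could look roughly like this. It is only a sketch: the function names are made up, and treating both "clean" and "active" as healthy is my assumption (an idle mirror reports clean, a busy one can report active). Anything else, such as "clean, degraded", gets flagged, and so would a state like "clean, checking" during a scheduled check.

```shell
# Read `mdadm --detail` output on stdin and print the value of the "State :" line.
# The device table header also contains the word "State" but has no " : ",
# so it is not matched.
array_state() {
    sed -n 's/^ *State : *//p' | sed 's/ *$//' | head -n 1
}

# Succeed only when the state is exactly "clean" or "active" (assumption).
state_is_healthy() {
    case "$1" in
        clean|active) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage from a root cron job (not run here):
#   state=$(mdadm --detail /dev/md126 | array_state)
#   state_is_healthy "$state" || echo "md126 state: $state"   # replace echo with your alert hook
```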

In the future you might consider ZFS or Btrfs mirrors, as these filesystems have checksumming to detect and correct bitrot.
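On the existing md mirror, the closest thing is a scrub: the kernel exposes a `sync_action` file in sysfs that accepts `check`, and a `mismatch_cnt` file that reports how much data differed between the copies. A minimal sketch, assuming the array is md126 as in the output above. The function names are made up, and I have not tried this on an IMSM container like yours, so treat it as a starting point.

```shell
# Ask md to read both mirrors and compare them. Needs root on a real array.
start_check() {
    md_dir="${1:-/sys/block/md126/md}"   # assumed sysfs path for md126
    echo check > "$md_dir/sync_action"
}

# Print the mismatch counter left behind by the last check.
read_mismatches() {
    md_dir="${1:-/sys/block/md126/md}"
    cat "$md_dir/mismatch_cnt"
}

# Usage (as root, not run here):
#   start_check
#   cat /proc/mdstat        # shows progress while the check runs
#   read_mismatches
```

Unlike a checksumming filesystem, md can only tell you that the two copies differ, not which copy is the correct one.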

I think I understand the confusion about drives returning a 1. You’re looking at the exit status section of the man page and thought it applied to the drives. The Number column in your output is just an index for each member device, not a status.

Command-line programs return an exit status when they finish, separate from anything they print to stdout. An exit status is just a way to determine what happened with a program you ran, for the purposes of scripting or chaining it with another program. It is not very relevant when you run the command by hand in a shell, since you can read the output directly, but a script that acts on the result needs to know whether there was a problem before doing anything with the data.

Per the man page, when mdadm --detail is run with --test, an exit status of 0 means the array is functioning normally and 1 means the command ran and the array has at least one failed device, so you could look for that 1 and run another program to do something about it.
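To make the exit status idea concrete, here is a small sketch that turns the status of `mdadm --detail --test` into a message. The meanings are the ones listed in the mdadm man page for `--detail` with `--test`; the function name is made up.

```shell
# Map the exit status of `mdadm --detail --test <array>` to a description.
describe_detail_status() {
    case "$1" in
        0) echo "array is functioning normally" ;;
        1) echo "at least one failed device" ;;
        2) echo "multiple failed devices, array unusable" ;;
        4) echo "error getting information about the device" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Usage (not run here):
#   sudo mdadm --detail --test /dev/md126 >/dev/null
#   describe_detail_status "$?"
```

In the shell, `$?` holds the exit status of the last command, which is how a script sees the value without parsing any output.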

Day one of transitioning from Windows, a noob googles how to RAID in Ubuntu and clicks the easiest result. :melting_face:

Well, md is an OK system, but it’s one of the more esoteric options at this point. Ubuntu is one of my least favorite distros because of the sheer amount of old information that many seem to rely on for decision making.

The Arch Wiki is a decent guide on things and generally applies to any distro, since Arch is more about individual software choices than a predetermined set of software packages.

https://wiki.archlinux.org/title/RAID

Good to know