ZFS: checksum error on kernel loading at boot

Hi,

I have an issue with my ZFS home server: GRUB fails to load the kernel at boot and stops with a checksum error.

I followed this guide to install the server, and after some attempts I managed to get the system installed, but for some time now the server has been unable to boot.

To rescue the system I boot from a Debian live USB, repeat the steps in the guide to access the ZFS filesystem in a chroot, and run update-grub; after that the system boots once, but on the next reboot the problem is back and it doesn’t boot.
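For reference, the rescue procedure I repeat each time is roughly the following (pool names as in the guide; I import the good bpool by its numeric ID, since there are two pools with that name):

root@debian:~# zpool import -f -R /mnt rpool
root@debian:~# zpool import -f -R /mnt 2278512205360225035
root@debian:~# mount --rbind /dev /mnt/dev
root@debian:~# mount --rbind /proc /mnt/proc
root@debian:~# mount --rbind /sys /mnt/sys
root@debian:~# chroot /mnt update-grub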

A bizarre thing is that there are two bpool pools, one of which I believe is a very early (and failed) attempt:

root@debian:~# zpool import
   pool: bpool
     id: 8990732988331174561
  state: UNAVAIL
  status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
 config:

    bpool                                UNAVAIL  insufficient replicas
      raidz2-0                           UNAVAIL  insufficient replicas
        ata-ST2000DM006-2DM164_Z4ZAMLX9  UNAVAIL  corrupted data
        ata-ST2000DM008-2FR102_WFL353JL  UNAVAIL  corrupted data
        ata-TOSHIBA_HDWA120_857N5DYKS    UNAVAIL  corrupted data
        sdd                              UNAVAIL  corrupted data

   pool: bpool
     id: 2278512205360225035
  state: DEGRADED
 status: The pool was last accessed by another system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
 config:
    bpool                                      DEGRADED
      raidz2-0                                 DEGRADED
        ata-ST2000DM006-2DM164_Z4ZAMLX9-part3  ONLINE
        ata-ST2000DM006-2DM164_Z4ZAQ5ZH-part3  ONLINE
        ata-ST2000DM008-2FR102_WFL353JL-part3  ONLINE
        ata-TOSHIBA_HDWA120_857N5DYKS-part3    ONLINE
        ata-TOSHIBA_HDWA120_86140KXGS-part3    ONLINE
        /tmp/fake-part3                        OFFLINE

So, does anybody have an idea of what is causing the system not to boot?
Also, how do I get rid of the first bpool? I can’t import it because of an “invalid vdev configuration” error, and I can’t destroy it because there are two pools named bpool …

Bye
Andrea

Another strange thing I noticed is that there are two datasets with mountpoint / and two with /boot. Is that normal?

root@atlante:~# zfs list
NAME                            USED  AVAIL     REFER  MOUNTPOINT
bpool                           184M  3,36G      192K  /boot
bpool/BOOT                      182M  3,36G      192K  none
bpool/BOOT/debian               182M  3,36G      153M  /boot
rpool                          1,78T  5,26T      192K  /
rpool/ROOT                     7,11G  5,26T      192K  none
rpool/ROOT/debian              7,11G  5,26T     6,95G  /
[etc...]

This is normal for that guide. Run zfs list -o name,mountpoint,canmount

You should see that bpool and rpool have canmount=off set, so even though they share a mountpoint with their child datasets they don’t conflict with each other.
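With the layout from that guide you should see something along these lines (illustrative, matching the zfs list output you posted):

root@atlante:~# zfs list -o name,mountpoint,canmount
NAME               MOUNTPOINT  CANMOUNT
bpool              /boot       off
bpool/BOOT         none        off
bpool/BOOT/debian  /boot       on
rpool              /           off
rpool/ROOT         none        off
rpool/ROOT/debian  /           on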

You can import the correct one by its ID (and even change its name in the process if you want):
zpool import 2278512205360225035 bpool
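Since the status says the pool “was last accessed by another system” (the live environment has a different hostid), you will probably need to force it:

zpool import -f 2278512205360225035 bpool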

The other bpool is probably showing up because stale ZFS labels are still sitting on the disks, telling ZFS that each whole device belongs to a pool. ZFS writes its labels at both the start and the end of a device, and because you went from whole disks to partitions, the old end-of-disk labels were never overwritten. If you try to clear the label on a whole disk it might destroy the data you want to keep on it. I’m not really sure how to go about fixing that, or whether it would cause your boot issues, but here is some info on it:
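If you want to see what is actually in those labels without touching anything, zdb -l is read-only (the device name here is just one of yours, as an example):

root@debian:~# zdb -l /dev/disk/by-id/ata-ST2000DM006-2DM164_Z4ZAMLX9
root@debian:~# zdb -l /dev/disk/by-id/ata-ST2000DM006-2DM164_Z4ZAMLX9-part3

Comparing the two should show the stale whole-disk label versus the current partition label. zpool labelclear would be the tool to remove the stale one, but given the risk above I would not run it against a whole disk until you are sure it won’t touch the partitions you are using.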
