[Solved] Broken ZFS dataset

It appears as if I have a ZFS metadata inconsistency.

Situation:
I have a ZFS dataset (aka filesystem) which now refuses to be unmounted, to have its mountpoint changed, or to list its contents.

  • Running umount on the path yields EINVAL, which as I understand it happens if the given path is not a mountpoint.
  • Running df on the path shows the dataset and the supposed mountpoint, but with the stats of the parent filesystem.

(Note for the following output: /DataRecover is a symlink to /Data)

#showing actual stats and FS properties as they are supposed to be
$ zfs list Data/enc/DataRecover/media/music/yt
NAME                                        USED  AVAIL     REFER  MOUNTPOINT
Data/enc/DataRecover/media/music/yt  39.2G  2.87T     27.9G  /DataRecover/media/music/yt.redownload/
#confirming with `mount` 
$ mount | grep Data/enc/DataRecover/media/music/yt
Data/enc/DataRecover/media/music/yt on /Data/media/music/yt.redownload type zfs (rw,xattr,noacl)

#Notice the higher "Used" statistic
$ df /Data/media/music/yt.redownload/
Filesystem                                 Size  Used Avail Use% Mounted on
Data/enc/DataRecover/media/music/yt  3.2T  315G  2.9T  10% /Data/media/music/yt.redownload
#Shows no content on the mountpoint as if there's nothing mounted
$ ls /Data/media/music/yt.redownload
#trying to write to the supposedly mounted dataset
$ dd if=/dev/urandom of=/DataRecover/media/music/yt.redownload/test bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 2.27017 s, 1.9 GB/s
$ ls -lh /DataRecover/media/music/yt.redownload/
total 512
-rw-r--r-- 1 celmor celmor 4.0G Jan 21 03:36 test
#size unchanged
$ zfs list Data/enc/DataRecover/media/music/yt
NAME                                        USED  AVAIL     REFER  MOUNTPOINT
Data/enc/DataRecover/media/music/yt  39.2G  2.87T     27.9G  /DataRecover/media/music/yt.redownload/

Operating on this mountpoint reveals inconsistencies:

$ zfs unmount Data/enc/DataRecover/media/music/yt
cannot unmount '/Data/media/music/yt.redownload': unmount failed
$ sudo umount /Data/media/music/yt.redownload
umount: /Data/media/music/yt.redownload: not mounted.
$ sudo strace umount /DataRecover/media/music/yt.redownload |& grep  umount2
umount2("/Data/media/music/yt.redownload", 0) = -1 EINVAL (Invalid argument)

Or demonstrated using Linux native tools:

$ df /DataRecover/media/music/yt.redownload/
Filesystem                                 Size  Used Avail Use% Mounted on
Data/enc/DataRecover/media/music/yt  3.2T  315G  2.9T  10% /Data/media/music/yt.redownload
$ df /DataRecover/media/music
Filesystem                  Size  Used Avail Use% Mounted on
Data/enc/DataRecover/media  3.2T  315G  2.9T  10% /Data/media
$ sudo strace umount /DataRecover/media/music/yt.redownload |& grep  umount2
umount2("/Data/media/music/yt.redownload", 0) = -1 EINVAL (Invalid argument)
$ sudo mount --bind  /DataRecover/media/music/yt.redownload/ link
$ df link
Filesystem                  Size  Used Avail Use% Mounted on
Data/enc/DataRecover/media  3.2T  315G  2.9T  10% /tmp/target

Only bind mounting the supposed mountpoint somewhere else and querying that (via df) reveals that it was never a mountpoint (giving me Schrödinger's cat vibes).
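The same check can be done without a bind mount by comparing device IDs. This is a minimal sketch, assuming GNU stat and assuming that every mounted ZFS dataset reports its own device ID; the function name is_fs_boundary is made up for illustration. It only tells you whether a filesystem boundary is crossed at that directory, so it would not detect a bind mount of the same filesystem, and it reports "no boundary" for /.

```shell
# Hypothetical helper: exit 0 if DIR is on a different device than its
# parent directory (a filesystem boundary is crossed there), exit 1 if
# both are on the same device, exit 2 if stat fails.
is_fs_boundary() {
    dir=$1
    dev_self=$(stat -L -c %d -- "$dir") || return 2
    dev_parent=$(stat -L -c %d -- "$dir/..") || return 2
    [ "$dev_self" != "$dev_parent" ]
}
```

For the case above, `is_fs_boundary /Data/media/music/yt.redownload || echo "not a real mountpoint"` would be expected to print the message while the dataset is in the broken state, matching what df shows through the bind mount.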

Further info:

$ zfs get encryptionroot Data/enc/DataRecover/media/music/yt
NAME                                       PROPERTY        VALUE     SOURCE
Data/enc/DataRecover/media/music/yt  encryptionroot  Data/enc  -
$ zfs get all Data/enc/DataRecover/media/music/yt | grep key
Data/enc/DataRecover/media/music/yt  keylocation            none                                           default
Data/enc/DataRecover/media/music/yt  keyformat              passphrase                                     -
Data/enc/DataRecover/media/music/yt  keystatus              available                                      -
$ zfs get mounted,canmount Data/enc/DataRecover/media/music/yt
NAME                                       PROPERTY  VALUE     SOURCE
Data/enc/DataRecover/media/music/yt  mounted   yes       -
Data/enc/DataRecover/media/music/yt  canmount  on        default

If there’s nothing I’ve missed, I’m gonna open an issue on GitHub for this.

Any oopses in the kernel log? ZFS may be in an inconsistent internal state, which would explain why it’s behaving a bit like Schrödinger’s cat.

No idea what’s going on there, but can you provide OS and openzfs version?

5.10.2-2-MANJARO
zfs-2.0.0-1
zfs-kmod-2.0.0-1

Nothing noteworthy in the kernel log, either during the unmount attempts or before them.
dmesg -H --level=err prints nothing, and grepping for “error” only yields a few segfaults from user-space applications (like browser and game applications).
The issue persisted across a reboot, so there has to be a problem with the on-disk (meta)data rather than some internal (kernel module) state being at fault.
I’ll try running a scrub to see if that reveals anything.

I’ve managed to solve it after unmounting all zfs datasets (while killing everything that depended on either /Data or /DataRecover), which allowed me to actually unmount that problematic mountpoint. After changing the mountpoint (replacing “/DataRecover”, which is a symlink, with “/Data”, which is the symlink’s target, in the path), I was able to mount it again and read its contents.
It’s a weird issue, which I had last boot as well, but killing all processes currently using the pool apparently solved it (one of those processes was using “/Data/media/music/yt.redownload” as its cwd, not “/DataRecover/media/music/yt.redownload”; killing it solved the unmounting problem).
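To find such processes by their actual working directory, /proc can be read directly. This is a rough sketch for Linux, assuming GNU readlink; the function name pids_with_cwd_under is made up for illustration. It resolves the argument first, so the symlinked /DataRecover/... path and the /Data/... path are treated as the same place. Note that fuser -m lists every process using the filesystem that the path resolves to, not just that directory, which may be why a shell sitting in a different directory of the parent dataset shows up in its output.

```shell
# Hypothetical helper: print the PIDs whose current working directory is
# DIR or somewhere below it. DIR is resolved through symlinks first.
# Processes whose cwd link cannot be read (permissions, exited) are skipped.
pids_with_cwd_under() {
    target=$(readlink -f -- "$1") || return 1
    for link in /proc/[0-9]*/cwd; do
        cwd=$(readlink -- "$link" 2>/dev/null) || continue
        case $cwd in
            "$target"|"$target"/*)
                pid=${link#/proc/}
                printf '%s\n' "${pid%/cwd}"
                ;;
        esac
    done
}
```

Run it as root (for example via sudo bash after defining the function) to see processes of other users; `pids_with_cwd_under /DataRecover/media/music/yt.redownload` then prints one PID per line.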
I’m not quite sure what the cause of the issue was, but if anyone else experiences this I’d recommend either unmounting all datasets or importing the pool without mounting any datasets (zpool import -N), which should allow changing the mountpoint freely.
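The second option could look roughly like the following. This is an untested sketch, not what was actually run in this thread: the helper names run and remount_with_new_mountpoint are made up, the arguments are placeholders, and the zfs load-key step is an assumption that only applies to encrypted datasets. Exporting the pool still requires that nothing is using it. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
# Sketch only: re-import a pool without mounting anything, change one
# mountpoint, then mount everything again.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: remount_with_new_mountpoint POOL DATASET NEW_MOUNTPOINT
remount_with_new_mountpoint() {
    pool=$1 dataset=$2 new_mountpoint=$3
    run zpool export "$pool" &&
    run zpool import -N "$pool" &&   # -N: import without mounting any datasets
    run zfs load-key -r "$pool" &&   # assumption: only needed with encryption
    run zfs set "mountpoint=$new_mountpoint" "$dataset" &&
    run zfs mount -a
}
```

Calling `remount_with_new_mountpoint Data Data/enc/DataRecover/media/music/yt /Data/media/music/yt.redownload` in dry-run mode prints the five commands in order, so they can be reviewed before setting DRY_RUN=0 and running as root.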

These were the steps I took to fix it. I’m not sure what exactly did it, but what really “grinds my gears” was the “/Data/media/video/yt” path being shown as the cwd even though fuser listed the process under “/Data/media/music/yt.redownload”.

$ sudo zfs unmount -a
cannot unmount '/Data/media': unmount failed
cannot unmount '/Data/media/music/yt.redownload': unmount failed
cannot unmount '/Data/media': unmount failed
cannot unmount '/Data': unmount failed
$ fuser -mv /Data/media/music/yt.redownload
                     USER        PID ACCESS COMMAND
/Data/media/music/yt.redownload:
                     root     kernel mount /Data/media/music/yt.redownload
                     celmor    21522 ..c.. bash
                     celmor    21715 ..c.. bash
$ ps -fwp 21522
UID          PID    PPID  C STIME TTY          TIME CMD
celmor     21522   13935  0 Jan17 pts/17   00:00:00 -/bin/bash
$ ls -ld /proc/21522/cwd
lrwxrwxrwx 1 celmor celmor 0 Jan 22 17:22 /proc/21522/cwd -> /Data/media/video/yt
$ fuser -mvk /Data/media/music/yt.redownload
                     USER        PID ACCESS COMMAND
/Data/media/music/yt.redownload:
                     root     kernel mount /Data/media/music/yt.redownload
                     celmor    21522 ..c.. bash
                     celmor    21715 ..c.. bash

$ fuser -mv /Data/media/music/yt.redownload
                     USER        PID ACCESS COMMAND
/Data/media/music/yt.redownload:
                     root     kernel mount /Data/media/music/yt.redownload
$ sudo zfs unmount -a
cannot unmount '/Data/media/music/yt.redownload': unmount failed
$ sudo strace umount /Data/media/music/yt.redownload |& grep umount2
umount2("/Data/media/music/yt.redownload", 0) = 0
$ mount | grep /Data/media/music/yt.redownload
$ sudo zfs unmount -a
$ zfs get mounted -r Data | awk '$3 == "yes"'
$ zfs list | grep Data/enc/DataRecover/media/music/yt
Data/enc/DataRecover/media/music/yt                                   39.2G  2.86T     27.9G  /DataRecover/media/music/yt.redownload/
$ zfs set mountpoint=/Data/media/music/yt.redownload Data/enc/DataRecover/media/music/yt
$ df /Data/media/music/yt.redownload
Filesystem                                 Size  Used Avail Use% Mounted on
Data/enc/DataRecover/media/music/yt  2.9T   28G  2.9T   1% /Data/media/music/yt.redownload
$ sudo zfs unmount -a
$ sudo find /Data -mindepth 1 -type d -empty -delete
$ sudo bash -c 'zfs get canmount -H -t filesystem -o name,value -r Data | awk '\''$2=="on" { print $1 }'\'' | while read -r dataset; do zfs get -H -o name,value mountpoint "$dataset"; done | awk '\''$2~/^\/.*/ { print $1 }'\'' | while read -r dataset; do zfs mount "$dataset"; done'
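The last one-liner is hard to read, so here is a possible restructuring of its filtering step. It is a sketch under the assumption that zfs list -H prints tab-separated columns for the requested properties; the function name mountable_datasets is made up for illustration. It picks the same datasets (canmount=on and an absolute mountpoint) but from a single zfs list call.

```shell
# Hypothetical helper: read "name<TAB>canmount<TAB>mountpoint" lines on
# stdin and print the names of datasets that have canmount=on and an
# absolute mountpoint (this skips "none" and "legacy").
mountable_datasets() {
    awk -F '\t' '$2 == "on" && $3 ~ /^\// { print $1 }'
}

# Intended use as root (not run here), assuming the pool is called Data:
#   zfs list -H -t filesystem -o name,canmount,mountpoint -r Data |
#       mountable_datasets |
#       while read -r dataset; do zfs mount "$dataset"; done
```

In most cases a plain zfs mount -a should give the same result; the explicit loop only helps if you want to see which dataset fails to mount.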

Mysterious, but possibly just a kink in the new zfs-2.0 release. If you can recreate it, they’d probably appreciate the bug report.
