Can't mount specific ZFS dataset with "Invalid argument" error

I’ve set up a backup box with Fedora 39 and OpenZFS, created a natively encrypted pool, and pulled some data from my main server. Everything worked until I rebooted to verify that all services come up as configured, and I was met with a failed zfs-mount.service. This was not the first reboot since setting up the pool.

The mount consistently fails with an Invalid argument error, but only on the dataset I used to hold my pulled backups. It did mount before.

There were no updates in the meantime; this is the stock kernel from Fedora 39, and zfs-dkms was built once at installation.

root@tao:~# zfs --version
zfs-2.2.2-1
zfs-kmod-2.2.2-1
root@tao:~# uname -a
Linux tao 6.5.6-300.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct  6 19:57:21 UTC 2023 x86_64 GNU/Linux

The backups were synced in raw mode with syncoid and are encrypted with a separate key. They are children of the tao/syncoid/papacamayo dataset, and tao/syncoid is the one that fails.
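
For context, the pull looked roughly like this (the hostname, source pool, and dataset names below are illustrative, and `--sendoptions=w` is how syncoid passes `-w`/`--raw` to `zfs send` so the stream stays encrypted with the source’s key):

```shell
# Illustrative raw (encrypted) syncoid pull; names are made up.
# --sendoptions=w makes syncoid run `zfs send -w`, so the data is
# received still encrypted and the key never has to live on this box.
syncoid --recursive --sendoptions=w \
  root@papacamayo:tank/data tao/syncoid/papacamayo/data
```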

root@tao:~# zfs list
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
tao                                          4.04T   373G   232K  /tao
tao/remote                                    272K   373G   192K  /tao/remote
tao/syncoid                                  4.04T   373G   200K  /tao/syncoid
tao/syncoid/papacamayo                       4.04T   373G   192K  /tao/syncoid/papacamayo
(list of datasets from the main server)

The top-level and sibling datasets did mount automatically with zfs-mount.service.

root@tao:~# zfs mount
tao                             /tao
tao/remote                      /tao/remote

root@tao:~# zfs get mountpoint tao/syncoid
NAME         PROPERTY    VALUE         SOURCE
tao/syncoid  mountpoint  /tao/syncoid  default

root@tao:~# ls -alsh /tao/syncoid/
total 17K
8.5K drwxr-xr-x. 2 root root 2 Feb 13 02:05 .
8.5K drwxr-xr-x  4 root root 4 Feb 14 20:53 ..

root@tao:~# zfs mount tao/syncoid
cannot mount 'tao/syncoid': Invalid argument

root@tao:~# zfs get canmount tao/syncoid
NAME         PROPERTY  VALUE     SOURCE
tao/syncoid  canmount  on        default

I started a scrub; no errors so far. I previously ran another scrub after replacing a failing drive in one of the vdevs, and there were no errors then either.

What is the problem here and how can I fix it?

Perhaps zfs mount -a?

Wait: while the dataset is not mounted, before mounting it, try deleting the empty syncoid folder, if it exists.

Then mount the dataset.

Then take ownership of the folder, if you need to.

Unfortunately that’s not the problem; /tao is already empty.

root@tao:/# zfs unmount tao
root@tao:/# ll /tao/
total 0
root@tao:/# zfs mount -a
cannot mount 'tao/syncoid': Invalid argument
root@tao:/# ll /tao/
total 17
drwxr-xr-x 2 root root 2 Feb 14 20:51 remote
drwxr-xr-x 2 root root 2 Feb 17 23:53 syncoid
root@tao:/# ll /tao/syncoid/
total 0

And you are sure there is no folder named syncoid inside the tao dataset, occupying the mountpoint?

I’ve checked; the directory /tao/syncoid only appears when I do zfs mount -a or just zfs mount tao. Note that the dataset tao/remote does mount, it’s just empty on the pool.

root@tao:/# zfs unmount -a
root@tao:/# ll tao/
total 0
root@tao:/# ls -alsh tao
total 0
0 drwxr-xr-x. 1 root root   0 Feb 13 00:47 .
0 dr-xr-xr-x. 1 root root 178 Feb 15 13:39 ..
root@tao:/# zfs mount tao
root@tao:/# ls -alsh tao
total 26K
8.5K drwxr-xr-x  4 root root   4 Feb 18 00:09 .
   0 dr-xr-xr-x. 1 root root 178 Feb 15 13:39 ..
8.5K drwxr-xr-x. 2 root root   2 Feb 14 20:52 remote
8.5K drwxr-xr-x  2 root root   2 Feb 18 00:09 syncoid
root@tao:/# ls -alsh tao/*       
tao/remote:
total 17K
8.5K drwxr-xr-x. 2 root root 2 Feb 14 20:52 .
8.5K drwxr-xr-x  4 root root 4 Feb 18 00:09 ..

tao/syncoid:
total 17K
8.5K drwxr-xr-x 2 root root 2 Feb 18 00:09 .
8.5K drwxr-xr-x 4 root root 4 Feb 18 00:09 ..
root@tao:/# zfs mount tao/remote
root@tao:/# ls -alsh tao/*
tao/remote:
total 17K
8.5K drwxr-xr-x 2 root root 2 Feb 14 20:51 .
8.5K drwxr-xr-x 4 root root 4 Feb 18 00:09 ..

tao/syncoid:
total 17K
8.5K drwxr-xr-x 2 root root 2 Feb 18 00:09 .
8.5K drwxr-xr-x 4 root root 4 Feb 18 00:09 ..
root@tao:/# zfs mount tao/syncoid
cannot mount 'tao/syncoid': Invalid argument
root@tao:/# ls -alsh tao/*
tao/remote:
total 17K
8.5K drwxr-xr-x 2 root root 2 Feb 14 20:51 .
8.5K drwxr-xr-x 4 root root 4 Feb 18 00:09 ..

tao/syncoid:
total 17K
8.5K drwxr-xr-x 2 root root 2 Feb 18 00:09 .
8.5K drwxr-xr-x 4 root root 4 Feb 18 00:09 ..

Can you change the syncoid mountpoint slightly, just for testing?

I experimented a bit in the meantime; apparently tao/syncoid and tao/syncoid/papacamayo got cursed on this cursed server. I can almost hear the Cenobites ringing their bell. I moved the datasets around and everything is mountable now.

root@tao:/# zfs create tao/test
root@tao:/# zfs rename tao/syncoid/papacamayo tao/test/papacamayo
root@tao:/# zfs list
NAME                                       USED  AVAIL  REFER  MOUNTPOINT
tao                                       4.04T   373G   228K  /tao
tao/remote                                 272K   373G   192K  /tao/remote
tao/syncoid                                200K   373G   200K  /tao/syncoid
tao/test                                  4.04T   373G   192K  /tao/test
tao/test/papacamayo                       4.04T   373G   192K  /tao/test/papacamayo
tao/test/papacamayo/hdd                   4.04T   373G   340K  /tao/test/papacamayo/hdd
(list of children datasets of hdd)
root@tao:/# zfs mount -a
cannot mount 'tao/syncoid': Invalid argument
cannot mount 'tao/test/papacamayo': Invalid argument
root@tao:/# zfs unmount tao/test
root@tao:/# zfs mount tao/test
root@tao:/# zfs create tao/test2 
root@tao:/# zfs rename tao/test/papacamayo/hdd tao/test2/hdd
root@tao:/# zfs unmount -a
root@tao:/# zfs mount -a
cannot mount 'tao/test/papacamayo': Invalid argument
cannot mount 'tao/syncoid': Invalid argument
root@tao:/# zfs destroy tao/syncoid 
root@tao:/# zfs list
NAME                             USED  AVAIL  REFER  MOUNTPOINT
tao                             4.04T   373G   268K  /tao
tao/remote                       272K   373G   192K  /tao/remote
tao/test                         392K   373G   200K  /tao/test
tao/test/papacamayo              192K   373G   192K  /tao/test/papacamayo
tao/test2                       4.04T   373G   192K  /tao/test2
tao/test2/hdd                   4.04T   373G   340K  /tao/test2/hdd
(list of children datasets of hdd)
root@tao:/# zfs destroy -r tao/test
root@tao:/# zfs list
NAME                             USED  AVAIL  REFER  MOUNTPOINT
tao                             4.04T   373G   252K  /tao
tao/remote                       272K   373G   192K  /tao/remote
tao/test2                       4.04T   373G   192K  /tao/test2
tao/test2/hdd                   4.04T   373G   340K  /tao/test2/hdd
(list of children datasets of hdd)
root@tao:/# zfs unmount -a
root@tao:/# zfs mount -a
root@tao:/#

I’m still not sure what happened, but at least I know how to recover from this.

EDIT: Thank you for the time you spent on helping me.

dunno what was wrong, but well done sticking with it, and not just giving up or whatever :)

This post documents what I already went through while building this box: Cursed server build vent

I will not allow this piece of cursed circuitry from hell to defeat me!

To those interested: I’ve nailed down what’s causing the problem.

All datasets created from cockpit-zfs-manager have the same problem. Calling zfs create directly is fine, and the issue is within the dataset itself, not related to the mountpoint. It also only happens with a specific combination of kernel/ZFS/cockpit-zfs-manager versions.

I’ll try to figure out what this plugin is doing under the hood and report the problem on its GitHub.
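
One thing worth checking is whether the plugin sets any of ZFS’s SELinux context properties (context, fscontext, defcontext, rootcontext) on the datasets it creates; a mount option the kernel can’t honor would explain the EINVAL. A runnable sketch of the check, with the `zfs get` output faked since the pool obviously isn’t here (the property value shown is invented):

```shell
# Hypothetical diagnostic: list properties set locally on the dataset and
# flag any SELinux context mount options. On the real box the input would
# come from:  zfs get -H -s local all tao/syncoid
# The output below is faked (space-separated for readability) so the
# filtering logic itself is runnable.
zfs_get_local() {
  printf '%s\n' \
    'tao/syncoid mountpoint /tao/syncoid local' \
    'tao/syncoid context system_u:object_r:unlabeled_t:s0 local' \
    'tao/syncoid atime off local'
}

# Any property whose name contains "context" is suspect here.
suspect=$(zfs_get_local | awk '$2 ~ /context/ { print $2 "=" $3 }')
echo "$suspect"
```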

I always disliked SELinux. The difference now is that I hate it with a passion that’s beyond human understanding.

It was the freaking SELinux. The freaking SELinux that was supposedly disabled! Apparently SELINUX=disabled in /etc/selinux/config is not equivalent to actually disabling it in the kernel. This cursed piece of software keeps running in the kernel, just with no policy loaded. But it does run and, despite having no policy, it does interfere with the system.

The real way to disable it is to run grubby --update-kernel ALL --args selinux=0. Did that, rebooted, everything mounts.
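
For anyone hitting this later, a quick heuristic sketch for telling whether SELinux is really out of the running kernel, as opposed to merely having no policy loaded:

```shell
# Heuristic: SELINUX=disabled in /etc/selinux/config only stops a policy
# from being loaded; the LSM itself can still be active in the kernel.
# If selinuxfs is mounted at /sys/fs/selinux, SELinux is initialized.
if [ -e /sys/fs/selinux/enforce ]; then
  state="active"
else
  state="inactive"
fi
echo "kernel SELinux: $state"

# And confirm the boot argument actually took effect after grubby + reboot:
grep -o 'selinux=0' /proc/cmdline || echo 'selinux=0 not on the cmdline'
```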
