Why am I getting stale file handles for an NFS share?

Sorry, I’m not an avid forum user, so I’m half expecting this topic is in the wrong place; apologies for that, and please tell me what to do so I can fix it.

Okay, I have an Ubuntu 18.04 server with NFS file shares set up, and over a span of months I have mounted these shares using fstab, autofs and systemd-automount. Every time, I get a

mount.nfs: Stale file handle error

message after using the share for a while. It is driving me crazy and has forced me to switch back to Samba, which makes me feel like a barbarian since my client is Kubuntu 18.04.

Why am I getting a stale file handle error when I’m just using the fileshare and how can I fix this?

relevant mount entry (this seems to be the one from autofs):

192.168.20.72:/mnt/datastorev2/data/<name> on /nfs/datastorev2 type nfs4 (rw,nosuid,relatime,vers=4.2,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.20.176,local_lock=none,addr=192.168.20.72,user=<name>)

relevant /etc/fstab configurations (of course not enabled at the same time):

192.168.20.72:/mnt/datastorev2/data/<name> /media/kaoru/datastorev2/ nfs rw,noauto,user,_netdev,bg 0 0
192.168.20.72:/mnt/datastorev2/data/<name> /media/kaoru/datastorev2/ nfs noauto,x-systemd.automount,x-systemd.device-timeout=10,timeo=14,x-systemd.idle-timeout=1min,user,vers=4 0 0

and the contents of auto.nfs:

datastorev2 -fstype=nfs,soft,intr,rsize=8192,wsize=8192,nosuid,tcp,user=<username>,group=<user's groupname> 192.168.20.72:/mnt/datastorev2/data/<user>

and finally the host’s /etc/exports:

/mnt/datastorev2/data/<name> 192.168.20.*(rw,sync,no_subtree_check)

The datastorev2 pool is a ZFS pool of 4 disks (two mirrored pairs).


Can you post the relevant entry from mount?

Are you having issues with rpc.bind?
Are the RPC services working?
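
If it helps, here is a quick sketch of how you could check that on the server (the unit names below assume a stock Ubuntu 18.04 nfs-kernel-server install):

# list the RPC programs registered with rpcbind
rpcinfo -p localhost
# check that the RPC/NFS services are actually running
systemctl status rpcbind nfs-server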

I’ll post the mount entry at the end of my day (4 hrs)

I have no working knowledge of rpc.bind, so I can’t say whether that’s the problem or even whether those services are working. (A quick read of the manpage didn’t help me either.)

Would it be useful to add the relevant entries from /etc/exports and /etc/fstab?

Couldn’t hurt

Posted with the additional info.

There’s quite a lot of config there and I’m not familiar with all of it. As a baseline, can you see if the issue persists when you set the fstab mount options to defaults (without autofs)?

I usually use defaults unless I have issues. You will usually see any complaints in dmesg.
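
Something like this as a rough sketch, reusing your server path and mountpoint (adjust as needed):

# baseline fstab entry with default mount options; _netdev just tells the init system to wait for the network
192.168.20.72:/mnt/datastorev2/data/<name>  /media/kaoru/datastorev2  nfs  defaults,_netdev  0  0
# then mount it by hand to test
sudo mount /media/kaoru/datastorev2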

@trick2011
You could try mounting the share manually with the nolock option (-o nolock) to see whether that mounts without issue as well. If rpc.bind is the problem, this is a way around it. Be aware, though, that files can then be changed out from under any application reading them, since file locking is no longer in effect.

Generally, if you intend the share to always be enabled, then yes, you want to add it to the fstab. If you need it available at boot time, then you will also want it in the server’s exports. Generally, this tells the init system to wait until the share is online before giving you a login, and to wait until the share is unmounted before shutting the system down/powering off.
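
For the manual mount test, something along these lines should do (just a sketch, reusing the paths from above):

# manual mount with locking disabled, to rule out rpc.bind/lockd trouble
# (nolock only applies to NFSv2/v3; NFSv4 does its locking in-protocol)
sudo mount -t nfs -o nolock 192.168.20.72:/mnt/datastorev2/data/<name> /media/kaoru/datastorev2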

I’ll be testing the connection using a manual mount, with all other mounting methods disabled. But I’ll be away from my PC till Monday at least, so the update will take a bit.

Edit:
I posted this and did an ls for fun just before I left, and:

<username>@<client>:/media/kaoru$ ls datastorev2                                                                                              
ls: cannot access 'datastorev2': Stale file handle

I ran dmesg on the client and… nothing related to NFS, networking, etc. ls -h -all /media/kaoru returned:

d?????????  ? ?      ?         ?            ? datastorev2

dmesg on the server gave:

[198658.245062] nfsd: last server has exited, flushing export cache
[198658.251127] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[198658.251142] NFSD: starting 90-second grace period (net f00000a1)

It was resolved by doing:
on server:

sudo exportfs -a
sudo service nfs-server reload

on client:

sudo umount /media/kaoru/datastorev2
sudo mount 192.168.20.72:/mnt/datastorev2/data/<name> /media/kaoru/datastorev2/

I admit it was kinda silly to do the server steps before unmounting and remounting on the client, so I’ll try that order next time.
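
For reference, the client-first order would look roughly like this (just a sketch; the lazy unmount is a fallback in case the normal one refuses because the handle is already stale):

sudo umount /media/kaoru/datastorev2 || sudo umount -l /media/kaoru/datastorev2
sudo mount 192.168.20.72:/mnt/datastorev2/data/<name> /media/kaoru/datastorev2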


All right, another update. I’ve used the connection this way for several days now and it keeps giving me the stale file handle message. When I umount and then mount the shares, the problem goes away, but I’m still no closer to an explanation of why it happens or how to fix it for good.

Any suggestions?

I am out of ideas honestly.

What NFS version does the server report?

The output of rpcinfo -p localhost is:

program    vers     proto   port    service
100003        3       tcp   2049    nfs
100003        4       tcp   2049    nfs
100003        3       udp   2049    nfs

It’s also the case that my manual mount sometimes gets rejected by the server, but only sometimes.
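
A quick way to see what version the client actually negotiated, as a sketch:

# prints the mounted NFS shares with their effective options (including vers=)
nfsstat -m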

Is there a reason why you are not exclusively using version 4 or 4.2? Mixing major versions sometimes creates issues because of the different feature sets. I would also recommend explicitly stating which version of NFS to use in the fstab (if you have your NFS mounts defined there).
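
For example, something like this for the fstab line (a sketch reusing your existing entry, with the protocol version pinned):

# pin the protocol version explicitly instead of letting mount negotiate it
192.168.20.72:/mnt/datastorev2/data/<name>  /media/kaoru/datastorev2  nfs  vers=4.2,_netdev  0  0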

I have no specific reason for the versions. I mounted it using -o v3 and there was no change; still getting stale file handles.


What media/filesystem are you sharing from on the server?

A ZFS pool. 4 disks.

pool datastorev2
  mirror-0
    disk 0
    disk 1
  mirror-1
    disk 2
    disk 3

These are 10 TB disks used for general storage, and Transmission points at the pool.

I’m now also getting sporadic permission denied messages on the manual mount.

I have found out that rpcdebug is the tool for enabling/disabling NFS debug logging, so I’ll be trying to get the relevant debug information about the client/server interactions out of both machines.

The output of rpcdebug -vh is:

nfs        vfs dircache lookupcache pagecache proc xdr file root callback client mount fscache pnfs pnfs_ld state all
nfsd       sock fh export svc proc fileop auth repcache xdr lockd all

It seems to me that on the server export, fileop and auth should be logged, and on the client vfs, dircache, lookupcache, pagecache, file, mount, fscache and state. But your guidance on this matter is much appreciated.
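
In case it is useful, this is roughly how I plan to switch those flags on and off (a sketch; the debug messages should end up in dmesg/syslog):

# on the server: enable the nfsd debug flags
sudo rpcdebug -m nfsd -s export fileop auth
# on the client: enable the nfs client debug flags
sudo rpcdebug -m nfs -s vfs dircache lookupcache pagecache file mount fscache state
# switch everything back off afterwards
sudo rpcdebug -m nfsd -c all
sudo rpcdebug -m nfs -c all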