TrueNAS Scale replication fails with "pool or dataset is busy" error

I am trying to set up replication between two TrueNAS SCALE hosts. Due to networking limitations, the backup target will be the side initiating the tasks.

The datasets I am trying to back up are all of the datasets in my Data pool (/mnt/Data).

Whenever the replication tries to start, it fails with this error:

[2023/05/27 10:08:01] INFO     [Thread-33] [zettarepl.paramiko.replication_task__task_1] Connected (version 2.0, client OpenSSH_8.4p1)
[2023/05/27 10:08:01] INFO     [Thread-33] [zettarepl.paramiko.replication_task__task_1] Authentication (publickey) successful!
[2023/05/27 10:08:03] INFO     [replication_task__task_1] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2023/05/27 10:08:03] INFO     [replication_task__task_1] [zettarepl.replication.run] For replication task 'task_1': doing pull from 'Data' to 'Data' of snapshot='auto-2023-05-27_18-00' incremental_base=None include_intermediate=False receive_resume_token=None encryption=False
[2023/05/27 10:08:03] ERROR    [replication_task__task_1] [zettarepl.replication.run] For task 'task_1' unhandled replication error ExecException(1, "cannot unmount '/var/db/system/rrd-40cc91dacae0491e84781ab81ded8ba4': pool or dataset is busy\n")
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/zettarepl/replication/run.py", line 181, in run_replication_tasks
    retry_contains_partially_complete_state(
  File "/usr/lib/python3/dist-packages/zettarepl/replication/partially_complete_state.py", line 16, in retry_contains_partially_complete_state
    return func()
... 9 more lines ...
    ReplicationProcessRunner(process, monitor).run()
  File "/usr/lib/python3/dist-packages/zettarepl/replication/process_runner.py", line 33, in run
    raise self.process_exception
  File "/usr/lib/python3/dist-packages/zettarepl/replication/process_runner.py", line 37, in _wait_process
    self.replication_process.wait()
  File "/usr/lib/python3/dist-packages/zettarepl/transport/ssh.py", line 159, in wait
    stdout = self.async_exec.wait()
  File "/usr/lib/python3/dist-packages/zettarepl/transport/async_exec_tee.py", line 104, in wait
    raise ExecException(exit_event.returncode, self.output)
zettarepl.transport.interface.ExecException: cannot unmount '/var/db/system/rrd-40cc91dacae0491e84781ab81ded8ba4': pool or dataset is busy

Is there any way to fix this?

The backup machine doesn't have its copy of the pool mounted, by any chance?
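A quick way to check on the backup machine (a rough sketch; Data is the pool name from the post, the child dataset name is a placeholder):

# Show which datasets under the received copy are currently mounted
zfs get -r -t filesystem mounted Data

# Unmount anything that should not be mounted (dataset name is a placeholder)
zfs unmount Data/some-replicated-dataset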

Pulling from the backup machine should work just as well, so you are fine on that front.

I would run a one-off job on the backup machine to make sure it can pull “manually”, and then check the app/script.
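Something along these lines, run from the backup machine, exercises the same path a pull replication takes. This is only a sketch: the hostname, dataset, and target names are placeholders loosely based on the log above.

# Pull one snapshot over SSH and receive it into a scratch dataset without mounting it
ssh root@source-nas "zfs send Data/somedataset@auto-2023-05-27_18-00" | zfs receive -u Data/manual-test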

As for the mount: you don’t want the backup copy to be mounted or taking its own snapshots, otherwise it will fall out of sync with the source, the copies won’t match, and the transfer will fail. They’d fork/diverge.
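One way to keep the received copy from being mounted or changed locally, sketched with a placeholder dataset name (Data/received-backup stands in for wherever the replicated datasets land on the backup machine):

# Prevent local writes to, and automatic mounting of, the received copy
zfs set readonly=on Data/received-backup
zfs set canmount=noauto Data/received-backup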

It can be forced through with a Force flag, but that’s messy and better avoided.
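For context, the underlying ZFS mechanism behind that kind of force is zfs receive -F, which rolls the destination back to the last common snapshot and discards any local changes made on the backup copy since then. A sketch with placeholder host, dataset, and snapshot names:

# -F forces a rollback of the destination to the most recent snapshot before receiving
ssh root@source-nas "zfs send -i @auto-2023-05-26_18-00 Data/somedataset@auto-2023-05-27_18-00" | zfs receive -F Data/somedataset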

Update: I moved the system dataset from the NAS drives to my array of boot SSDs, and that has solved the problem.

On the source or destination system?
Any idea how to do this on TrueNAS SCALE?
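For reference, the system dataset location on SCALE is configurable, and moving it off the replicated data pool (as in the update above) keeps replication from having to unmount it. The menu path and command below are from memory and may differ between versions; boot-pool is just the default boot pool name.

# GUI (roughly): System Settings -> Advanced -> Storage -> System Dataset Pool
# CLI equivalent via the middleware client (assumed; verify on your version):
midclt call systemdataset.update '{"pool": "boot-pool"}'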