Hi everyone, I'm new to dealing with ZFS failures. I'd like to describe my incident and ask whether there are any recovery options left for my pool.
First of all, I've been running a TrueNAS VM inside Proxmox, with the disks attached using the by-id method.
Below is some basic information:
- TrueNAS SCALE VM with 16 cores and 64 GB ECC registered RAM; Tailscale is the only thing installed in the Apps section.
- Single RAIDZ2 vdev, 10 x 16 TB SATA drives at around 1891 power-on hours; no other vdev or special vdev.
- smartctl short tests (weekly) passed; smartctl long test (run after the incident) passed.
- No uncorrectable or pending sectors occurred while using TrueNAS.
- No sharing or remote services enabled (SSH, SMB, NFS).
- No power outages while using TrueNAS; it ran about 4 months straight until the incident.
The incident happened about a week ago: all of a sudden my pool could not be imported. I tried to export and import the pool again inside TrueNAS, and it failed with `ValueError: 2095 is not a valid Error`:
```
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 112, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 34, in _call
    with Client(f'ws+unix://{MIDDLEWARE_RUN_DIR}/middlewared-internal.sock', py_exceptions=True) as c:
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 181, in nf
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/pool_actions.py", line 207, in import_pool
    with libzfs.ZFS() as zfs:
  File "libzfs.pyx", line 529, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs_/pool_actions.py", line 227, in import_pool
    zfs.import_pool(found, pool_name, properties, missing_log=missing_log, any_host=any_host)
  File "libzfs.pyx", line 1369, in libzfs.ZFS.import_pool
  File "libzfs.pyx", line 1397, in libzfs.ZFS.__import_pool
  File "libzfs.pyx", line 658, in libzfs.ZFS.get_error
  File "/usr/lib/python3.11/enum.py", line 717, in __call__
    return cls.__new__(cls, value)
  File "/usr/lib/python3.11/enum.py", line 1133, in __new__
    raise ve_exc
ValueError: 2095 is not a valid Error
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 427, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 465, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 177, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 44, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool_/import_pool.py", line 113, in import_pool
    await self.middleware.call('zfs.pool.import_pool', guid, opts, any_host, use_cachefile, new_name)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1399, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1350, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1356, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1267, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1251, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
ValueError: 2095 is not a valid Error
```
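If I read the traceback right, the middleware is merely failing to map an unknown libzfs error code (2095) onto its Python `Error` enum, so the actual failure is happening one layer down in ZFS itself. To take the middleware out of the picture I dropped to a shell and worked with the CLI directly; roughly like this (just my diagnostic routine, nothing clever):

```
# scan for importable pools directly, bypassing the TrueNAS middleware
zpool import -d /dev/disk/by-id

# check kernel messages right after each attempt
dmesg | tail -n 50
```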
I've been racking my brain over this for about a week and tried some of the options I found.

A bare `zpool import` scan (with and without `-f`) shows everything ONLINE, which looked promising, but no pool actually gets imported or mounted (as I understand it, a scan without a pool name only lists candidates; the failure comes when I name the pool):
```
# zpool import
   pool: hdd10x16t
     id: 14448620205443767059
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        hdd10x16t                              ONLINE
          raidz2-0                             ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZEA4  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZD7P  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZESV  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZDE6  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZE7G  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZDMX  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZD05  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZED5  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZEEA  ONLINE
            ata-ST16000NM001J-2TW113_ZRS0ZE3T  ONLINE

# zpool status
no pools available
```
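For completeness: since the action line says the pool can be imported by name or numeric identifier, the equivalent invocation by id would be the following (assuming I'm reading the man page right, this behaves exactly like importing by name):

```
# import by the numeric identifier printed in the scan above
zpool import -f 14448620205443767059
```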
Importing by name, however, returns insufficient replicas:

```
# zpool import hdd10x16t -f
cannot import 'hdd10x16t': insufficient replicas
        Destroy and re-create the pool from a backup source.
```
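To see which step of the import actually fails, I've also been reading the ZFS internal debug log around each attempt; a minimal sketch, assuming `zfs_dbgmsg_enable` works as described in the OpenZFS module parameter docs:

```
# make sure the ZFS debug ring buffer is on, retry the import, then read the log
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
zpool import -f hdd10x16t
cat /proc/spl/kstat/zfs/dbgmsg | tail -n 100
```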
With `zdb -l hdd10x16t` I checked whether the by-id paths or devids had changed; they had not (compared against `/dev/disk/by-id`):
```
# zdb -l hdd10x16t
LABEL 0
    version: 5000
    name: 'hdd10x16t'
    state: 0
    txg: 1268725
    pool_guid: 14448620205443767059
    errata: 0
    hostid: 2285398396
    hostname: 'recode-hetzner10x16t'
    top_guid: 3320680006278793367
    guid: 9307791802452327806
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 3320680006278793367
        nparity: 2
        metaslab_array: 256
        metaslab_shift: 34
        ashift: 12
        asize: 160008854568960
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 9307791802452327806
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1'
            phys_path: 'pci-0000:25:00.0-ata-4.0'
            whole_disk: 1
            DTL: 159
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 5722856747687887882
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZD7P-part1'
            phys_path: 'pci-0000:01:00.0-ata-1.0'
            whole_disk: 1
            DTL: 158
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 12515852929378121397
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZESV-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZESV-part1'
            phys_path: 'pci-0000:01:00.0-ata-2.0'
            whole_disk: 1
            DTL: 155
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 5347527874058330893
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZDE6-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZDE6-part1'
            phys_path: 'pci-0000:01:00.0-ata-3.0'
            whole_disk: 1
            DTL: 154
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 16523393813715497135
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZE7G-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZE7G-part1'
            phys_path: 'pci-0000:01:00.0-ata-4.0'
            whole_disk: 1
            DTL: 153
            create_txg: 4
        children[5]:
            type: 'disk'
            id: 5
            guid: 846333880863613494
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZDMX-part1'
            phys_path: 'pci-0000:02:00.1-ata-1.0'
            whole_disk: 1
            DTL: 152
            create_txg: 4
        children[6]:
            type: 'disk'
            id: 6
            guid: 5875470405935328920
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZD05-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZD05-part1'
            phys_path: 'pci-0000:02:00.1-ata-2.0'
            whole_disk: 1
            DTL: 151
            create_txg: 4
        children[7]:
            type: 'disk'
            id: 7
            guid: 7378420708011026499
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZED5-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZED5-part1'
            phys_path: 'pci-0000:25:00.0-ata-1.0'
            whole_disk: 1
            DTL: 29336
            create_txg: 4
        children[8]:
            type: 'disk'
            id: 8
            guid: 12934111643194929302
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZEEA-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZEEA-part1'
            phys_path: 'pci-0000:25:00.0-ata-2.0'
            whole_disk: 1
            DTL: 25315
            create_txg: 4
        children[9]:
            type: 'disk'
            id: 9
            guid: 7616272590451776413
            path: '/dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1'
            devid: 'ata-ST16000NM001J-2TW113_ZRS0ZE3T-part1'
            phys_path: 'pci-0000:25:00.0-ata-3.0'
            whole_disk: 1
            DTL: 150
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
        com.klarasystems:vdev_zaps_v2
    labels = 0 1 2 3
```
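Since the output above is for the pool as a whole, I also looped over all ten members to confirm each disk still carries readable, consistent labels (a quick sketch; the glob assumes every member follows the naming shown above):

```
# compare txg/guid/state across the labels of every member disk
for d in /dev/disk/by-id/ata-ST16000NM001J-2TW113_*-part1; do
    echo "== $d"
    zdb -l "$d" | grep -E 'txg|guid|state'
done
```

Other things I tried: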
- Changed the `zfs_max_missing_tvds` and `spa_load_verify_*` module parameters and tried `zpool import` again; same output:

```
echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds
echo 0 > /sys/module/zfs/parameters/spa_load_verify_data
echo 0 > /sys/module/zfs/parameters/spa_load_verify_metadata

# zpool import -o readonly=on hdd10x16t
cannot import 'hdd10x16t': insufficient replicas
        Destroy and re-create the pool from a backup source.
```
- Passed the SATA controller through to the VM with IOMMU and tried `zpool import` and the other options above; same output. `zdb -l hdd10x16t` also gave the same output.
- Tried commands I googled while panicking (`-mfr`, `-fF readonly=on`) with the pool name, but they still show insufficient replicas.
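One option I have not run yet, which looks safe on paper (assuming I'm reading `man zpool-import` correctly), is a dry-run rewind: `-n` combined with `-F` only reports whether discarding the last few transactions would make the pool importable, without actually changing anything:

```
# -F: recovery mode (discard the last few txgs if needed)
# -n: dry run; report whether -F would succeed, touch nothing
zpool import -f -F -n hdd10x16t
```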
After all of this, I wanted to ask whether a couple of ideas I have in my head could work.
- The `zdb -l` output shows `create_txg: 4` (and a current `txg: 1268725`), so… is there any way to properly grab a usable txg with `zdb -e` while my pool is not imported? Roughly what I mean is sketched below. I watched the earlier video [TekSyndicate - Adventure in ZFS Data Recovery], but it seems risky, so if any of you have experience with this, I'd appreciate it if you could share.
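Treat this as a sketch of the question rather than something I know to be correct: I believe `zdb -lu` lists the uberblocks (and their txgs) stored in a device's labels, and I've read that `zpool import -T` can roll back to a specific txg, but I'm not certain of either:

```
# list uberblocks and their txgs from one member disk's labels
zdb -lu /dev/disk/by-id/ata-ST16000NM001J-2TW113_ZRS0ZEA4-part1 | grep -A4 'txg'

# then, in theory, a read-only rewind import to one of those txgs
# (<txg> is a placeholder; reportedly hazardous, hence readonly=on)
zpool import -o readonly=on -f -T <txg> hdd10x16t
```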
- There is an article [Recovering Destroyed ZFS Storage Pools - Managing ZFS File Systems in Oracle® Solaris 11.2] in which the pool is simply destroyed and then re-imported. It feels like a point of no return; has anyone tried this method?
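If I understand the article, that flow rests on `zpool import -D`, which lists and imports pools that have been destroyed but not yet overwritten; something like this (Solaris syntax in the article, but the flag also exists in OpenZFS as far as I can tell):

```
# list destroyed pools that are still recoverable
zpool import -D

# the article's recovery step: force-import the destroyed pool
zpool import -D -f hdd10x16t
```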
- How is your experience with [Klennet ZFS Recovery]? It feels like a GUI version of a `zdb -e` script backed by a database.
- Or… am I approaching this wrong?
If you have any questions, or if I can provide more information, please let me know. I'll be happy to get right back to you.