After a system crash, likely one caused by corrupt memory as I was using non-ECC DIMMs at the time, my zpool no longer imports. Booting from a USB drive and running zpool import -af causes an immediate kernel panic, as, unfortunately, does zpool import -aFf.
However, with zpool import -Fn the pool does display correctly (and without any reported errors). This suggests to me that I may be able to solve the issue by specifying how far back to roll. I am also able to import the pool read-only with zpool import -af -o readonly=on, so maybe the issue is actually something else? I have no reason to doubt any of the disks, and I’ve since moved them all over to my backup NAS (which has ECC memory).
I have backups, and I can just zfs send from the readonly mount to rescue whatever work hasn’t been backed up, so my level of concern here is minimal. But, if it’s possible, obviously I would prefer not to have to recreate the pool, since that means many days of spinning rust slowly ticking away while my main workstation has to boot from USB and do all its work over the backup NAS’ slo-o-ow gigabit NIC. Anyone got some suggestions how I can import the pool read-write so I can do a scrub?
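For reference, the read-only rescue would be something like this (the dataset name here is just a placeholder, not my actual layout):

zpool import -o readonly=on -R /mnt -f idunn                     # read-only import under an altroot
zfs list -t snapshot -o name,creation -s creation idunn/work     # see which snapshots postdate the last backup

Here’s how the pool currently shows up: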
   pool: idunn
     id: 8659442495485374424
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        idunn                                          ONLINE
          raidz1-0                                     ONLINE
            wwn-0x5000039af8d12fe3                     ONLINE
            wwn-0x5000039a58c98079                     ONLINE
            wwn-0x5000039a68cb99ac                     ONLINE
            wwn-0x5000039a58c98bab                     ONLINE
        special
          mirror-1                                     ONLINE
            nvme-eui.e8238fa6bf530001001b444a48bfbe79  ONLINE
            nvme-eui.e8238fa6bf530001001b444a49a5fe67  ONLINE
          mirror-2                                     ONLINE
            nvme-WDS200T1X0E-00AFY0_21469J440105_1     ONLINE
            nvme-eui.e8238fa6bf530001001b444a49c68889  ONLINE
No hardware issues as far as I can tell: the read-only import shows every device and vdev with zero issues. This is the same result I get from zpool import -Fn. However, zpool import -f idunn kernel panics both the original machine and the machine the pool is currently plugged into.
-f
Forces import, even if the pool appears to be potentially active.
-F
Recovery mode for a non-importable pool. Attempt to return the pool to an importable state by discarding the last few transactions. Not all damaged pools can be recovered by using this option. If successful, the data from the discarded transactions is irretrievably lost. This option is ignored if the pool is importable or already imported.
-R root
Sets the cachefile property to none and the altroot property to root.
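For anyone following along, these options combine roughly like this (the altroot path is just an example):

zpool import -f -F -n idunn       # dry run: check whether discarding the last few TXGs would make it importable
zpool import -f -F -R /mnt idunn  # actually roll back and import under /mnt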
Have you tried the even bigger hammer? This may roll back more TXGs and will take a long time.
zpool import -XF -f -R /mnt POOLNAME # "extreme" version
-X
Used with the -F recovery option. Determines whether extreme measures to find a valid txg should take place. This allows the pool to be rolled back to a txg which is no longer guaranteed to be consistent. Pools imported at an inconsistent txg may contain uncorrectable checksum errors. For more details about pool recovery mode, see the -F option, above. WARNING: This option can be extremely hazardous to the health of your pool and should only be used as a last resort.
No, I have not tried this, and I don’t think I want to. An inconsistent pool sounds worse to me than just recreating it entirely? I can import it read-only, send the most recent snapshots to my other pool, and then recreate this pool. It’ll take a few days, but wouldn’t the result be preferable?
EDIT: The difference between my last backup and the current state was tiny, so I tried this anyway. It’s not like it could make things worse.
But I just got another kernel panic.
Not sure what else I can do to help. If you can mount read-only but not with a forced read-write import, most likely some metadata is corrupted. I’d be more inclined to believe there was some sort of wild power outage or other random event in addition to the non-ECC RAM. But something “else” at the hardware level may also be a problem. You haven’t included your specs, so any and all of this is guesswork.
-X will basically do a “database log replay”.
You can:
Try and import on another system
Copy the data over somewhere else while the pool is mounted as read only.
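For the second option, a send along these lines works from the read-only import (the dataset, snapshot, and host names are placeholders):

zfs send -R -I idunn/data@last-backup idunn/data@latest | ssh backup-nas zfs receive -u tank/rescue/data   # incremental send of existing snapshots (you can’t create new ones on a read-only pool)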
The RAIDZ drives are Samsung QVO 870s, the mirrors are Western Digital SN850s.
Yes, I’ve already imported it read only and sent the last snapshots to my backup drive. Any kind of read-write import seems to immediately cause a kernel panic, though, so I guess I’ll just have to destroy and recreate it. Ugh.
What diagnostics would you suggest? I’ve run smartctl self-tests; all the drives claim to be in good health.
Thanks for the link. I know how ZFS handles transactions, or at least the basics of it, but I’ll need lots to read while the spinning rust ticks away.
Not sure what’s going on. tar up the contents of /var/log and SCP them down to your PC. I can take a quick look…but not sure it will help you much.
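Something like this should do it (user and hostname are placeholders):

tar czf workstation-logs.tar.gz /var/log
scp workstation-logs.tar.gz you@your-pc:~/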
No idea what specs you are running. A cascading failure from a problem with the RAM, CPU, PSU, or mobo can lead to unpredictable behavior and logs that “don’t always mean what you think they mean”. So can bad backplanes or cables…
Just as an example: if you’re using a crummy SATA controller from China or something, weird things happen and you may not even have a good indicator of what went wrong.
Could be the free space map got borked; that would explain why the read-only import works without issue, and also the kernel panics.
It seems fairly plausible, because the free space map gets appended to at seemingly random times, and if that write is interrupted just right it will corrupt itself.
If that’s the case, there isn’t anything you can do with ZFS’s built-in tools to fix it.
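If you want to poke at that theory without risking a read-write import, zdb can dump the metaslab and space map information straight from the exported pool; it’s purely diagnostic and won’t change anything on disk:

zdb -e -m idunn | less    # -e reads an exported/unimported pool, -m dumps metaslabs and space maps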
Oh, that sucks.
Guess I’ll just recreate the pool. If it breaks again I’ll look into hardware issues, but I’ve never had any problems with these disks and I get identical crashes on both computers I’ve had the pool plugged into so…
Ran another set of long smartctl tests overnight, no errors. Tried to import it with FreeBSD, but ran into the same problem there. The pool does seem to be dead. The experience has really soured me on ZFS, though. send|receive means I’ll stick with it, but just losing an entire pool like this really sucks.
I suspected it would be similar given the shared codebase. Since it’s readable, at least it’s not completely lost. If you want to do some simple stress testing, 7-Zip’s benchmark is easy to use, and you can also adjust its memory utilization without leaving the OS.
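For example (thread count and dictionary size are just illustrative; a larger dictionary uses more RAM per thread):

7z b                 # default benchmark run
7z b -mmt8 -md26     # 8 threads, 2^26-byte (64 MiB) dictionary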
Aaand it’s happened again.
I’m thinking it might be related to ARC write error makes pool unusable · Issue #15466 · openzfs/zfs · GitHub, so at least it’s being worked on. Still annoying, though. Since the last time, I’ve taken to piping journalctl --follow over ssh to another machine, so for once I’ve actually got the error the PC produces when it breaks a pool, rather than just the one it produces when it tries to import the already-broken pool.
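The capture is nothing fancier than this, run on the workstation (user and hostname are placeholders):

journalctl --follow | ssh me@backup-nas 'cat >> workstation-journal.log'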
I guess don’t update your ZFS if you want things to work. Ugh! The Linux world needs to stop copying web devs’ “let’s make things worse” update policy; rebuilding my pools all the time is really annoying.