Bootloop fault code page not present

A few days ago I got an alert that the system had restarted unexpectedly, but nothing seemed wrong so I ignored it. The system is behind a UPS, so the random restart wasn’t due to power loss. The next day I realized it was stuck in a boot loop, showing this message before each restart:

fault code = supervisor read data, page not present
processor eflags = interrupt enabled, resume, IOPL = 8
current process = 5 (txg_thread_enter)
trap number = 12
panic: page fault
cpuid = 8

The system would no longer boot, so I tried installing TrueNAS Core on a new partition as an upgrade, but the same issue happened. I then tried a clean install, and while I could get to the GUI, as soon as I tried to import the pools the system would restart and go back to the same error. I tried a few different solutions found on the forum, like importing as readonly, but it did the same thing. After reinstalling TrueNAS, importing the pool from the command line says poolname not found, and listing the pools only shows boot-pool, although on a clean install I can see the pool through the GUI. So maybe I’m just using the wrong commands, since I’m not very familiar with this.

I’ve had the system running for over a year without any problems, so I’m not sure what happened, since I didn’t do any updates or anything like that when it started. Below are my system specs. It’s been a while since I set it up, but I’m pretty sure it’s set up as RAIDZ2, though it could be RAIDZ1.

Does anyone have any ideas how to fix this?

What I tried :
Ran S.M.A.R.T. test on all drives and it passed without any errors
Ran MemTest and it passed without errors
Upgrading TrueNAS Core from previous installation
Installing TrueNAS Core from scratch
Importing pools as readonly

System Specs
Motherboard : MSI PRO B660M-A Micro-ATX
RAM : 2x 16GB DDR4 3600 CORSAIR Vengeance
CPU : Intel Core i5-12600K
Storage : 8x Seagate 10TB IronWolf SATA III, 1x 500GB SSD (for running TrueNAS)
Expansion cards : 1x 10GB PCI-E Network Card (onboard had issues), 1x M.2 PCIe 6 x SATA Adapter Card (motherboard doesn’t have enough SATA ports)

1x M.2 PCIe 6 x SATA Adapter Card

Verify that it actually works and is still working; it seems like the common link between memory corruption and intermittent disk availability.

  • Are the physical drives visible in TrueNAS (the drives themselves, not the pool)?
  • Does lspci show the controller on that card?
  • Is it overheating? Is it seated properly?

“Supervisor read data, page not present” is usually a sign of bad memory, but it can also be caused by a misbehaving controller that corrupts working memory. Garbage in, garbage out.
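The checks above can be sketched roughly as follows. This is only a sketch: the helper name sata_controllers is made up for illustration, and the grep pattern assumes the controllers describe themselves as "SATA" or "AHCI" in the lspci output, which may not hold for every card.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci` output on stdin and keeps only
# the lines that look like SATA/AHCI controllers.
sata_controllers() {
    grep -i -E 'sata|ahci'
}

# Intended use on the NAS itself (commented out, needs real hardware):
#   lspci | sata_controllers   # expect the onboard controller AND the add-in card
#   camcontrol devlist         # FreeBSD / TrueNAS Core: disks the kernel currently sees
#   zpool import               # no arguments: lists pools that are available to import
```

Note that zpool list only shows pools that are already imported, so seeing just boot-pool there does not by itself mean the data pool is gone; zpool import with no arguments is the command that scans for pools that could be imported.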

I messed around with the NAS today: I unplugged 4 of the drives and booted the system, which worked because it couldn’t import the pool. Once I plugged them back in, it began working again with all drives showing up. The pool showed “Online (Unhealthy)”, so I ran zpool status and 2 of the drives had a 1 under the UM column. Not sure if that’s supposed to be the checksum column or not, but it was just the two letters.

After clearing the errors, everything is showing as fine now, so I’m wondering whether it has to do with the SATA adapter, or whether it could be something with the drives themselves. I don’t think they’re overheating, as there’s pretty good airflow through the whole system.
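For reference, zpool status prints READ, WRITE and CKSUM counters for each device, so a column that shows up as "UM" is most likely a cut-off CKSUM. Below is a small sketch for picking out the devices with non-zero counters; the function name zpool_error_devices is made up, and it assumes the usual "NAME STATE READ WRITE CKSUM" table layout.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool status` output on stdin and prints
# every row of the config table whose READ, WRITE or CKSUM counter is not 0.
zpool_error_devices() {
    awk '
        $1 == "NAME" && $3 == "READ" { in_cfg = 1; next }  # header row starts the table
        in_cfg && NF == 0            { in_cfg = 0 }        # blank line ends the table
        in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Intended use on the NAS itself (commented out, needs a real pool):
#   zpool status poolname | zpool_error_devices
#   zpool scrub poolname    # re-read and verify all data after clearing the counters
```

If the same two drives collect checksum errors again after a scrub, noting which controller they are plugged into (onboard Intel or the ASMedia card) should help narrow down whether it is the adapter or the drives.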

When I ran lspci, there were two SATA controllers: one is ASMedia Technology Inc. and the other is Intel. I’m going to guess the ASMedia one is the adapter card.