TrueNAS Scale / Can't get anything to work

I got myself new hardware and decided TrueNAS Scale might be mature enough, given the praise it gets all over YouTube. After all, the current version is marked as a release, meaning it’s fit for general use, though not for business use: /redacted/

Anyway, I was trying to set up two things: SMB shares and a Plex VM with PCI passthrough for an RTX 2060. The current state of things is that TrueNAS Scale doesn’t boot, SMB shares sort of work but nobody understands how, and the VMs are broken and don’t even POST. Before I give up, I wanted to ask whether I’m doing something wrong.

The build:

  • SuperMicro X11SPM-TF
  • Xeon 4208
  • 6x 16GB DDR4 ECC, memtest ran for 12 hours OK
  • Gigabyte RTX2060 6G
  • 9x 10TB HPE MB010000GWAYN
  • 2x 4TB Samsung SSDs (this is where the VM lives)
  • 1x SuperMicro 32G SATA DOM (this is where TrueNAS lives)

I can go into more detail on the hardware if anyone feels it’s related to the trouble I’m having.

#1 - I can’t get it to boot. It gets stuck on “loading initial ramdisk” and takes somewhere between 30 minutes and 10 hours to boot. I don’t know exactly how long: I left it overnight and it had booted by morning. I tried restarting it again today after it had been turned off for 12 hours, and the same thing is happening. I tried following the tutorial on disabling the serial console, but it doesn’t change anything. The issue persists and is frankly a deal breaker on its own.

The thread for it is here: /redacted/

#2 - I can’t get the VM to launch. I have an encrypted dataset that is locked after boot, and I don’t know whether that is why the VM doesn’t work. When I created the VM with its OS disk pointing to the locked dataset, the UI hung with the activity indicator spinning endlessly. After an F5 refresh the VM existed, but it could not be launched, edited, or deleted. The problem was that the dataset it was stored on was locked, and TrueNAS doesn’t know how to handle this. I tried deleting the VM about 10 times, but it failed silently each time, so I rebooted and the VM was gone.

I then recreated it on an unlocked dataset and it ran: I installed Windows, NVIDIA drivers, and Plex, and everything seemed to work. I turned everything off and came back to it the next day. TrueNAS booted and attempted to auto-launch the VM, but because it was on a locked dataset (again), the launch silently failed. I went in and unlocked the dataset, and now the VM starts but doesn’t POST: the VNC screen stays blank, RDP never works, and the router shows no IP lease for the VM. I could probably delete it and redo it again, but I would rather figure out why this is happening. Relevant threads:

  • /redacted/
  • /redacted/
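
For the locked-dataset part of #2, here is a minimal sketch, assuming shell access on the TrueNAS host, of checking whether a dataset’s encryption key is loaded before trying to start a VM stored on it. It reads the ZFS `keystatus` property through `zfs get`. The function names are hypothetical, and this is not how TrueNAS itself decides whether to auto-start a VM.

```python
import subprocess


def is_locked_value(keystatus_output: str) -> bool:
    """Interpret the raw value printed by `zfs get keystatus`.

    ZFS reports "unavailable" when the encryption key is not loaded.
    Any other value (for example "available") is treated as usable.
    """
    return keystatus_output.strip() == "unavailable"


def dataset_is_locked(dataset: str) -> bool:
    """Ask ZFS whether `dataset` is locked. Requires the `zfs` CLI."""
    # -H drops the header line, -o value prints only the property value
    result = subprocess.run(
        ["zfs", "get", "-H", "-o", "value", "keystatus", dataset],
        capture_output=True,
        text=True,
        check=True,
    )
    return is_locked_value(result.stdout)
```

Calling `dataset_is_locked("tank/vms")` (with `tank/vms` standing in for the real dataset name, which is an assumption here) before launching the VM would at least turn the silent failure into an explicit “dataset is locked” answer.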

Those two issues are real deal breakers for me. I need this rather basic stuff to work, and it just doesn’t. There are more problems that I’m trying to get help with on the iX forums: bonding NICs doesn’t really work, and SMB shares are implemented, but the number of open topics about them is a clear indication that the implementation is confusing enough that people are having a hard time setting them up.

I’m trying to figure out as soon as possible whether Scale works for me. I thought that a week in I would already be set up, but I keep hitting problems I didn’t expect in a production release.

I tried including links but I am not allowed to, so I removed them. Also this will be my last reaction for a while because “I keep doing this too quickly, wait.”

Not a SCALE user, but does every operating system boot slowly, or is it only slow when booting from the SATA DOM?

No, it was booting fine for about 6 days. Out of nowhere it started doing this 2 days ago. The only two things I can think of doing between restarts are creating a VM with GPU passthrough and setting a static IP on my router.

I doubt it’s ‘slow’; more likely it’s hung up on one step. Yes, it could be the VM you created, so turn that off.

So I tried changing the boot USB stick in case it was broken, but that didn’t help. Then I tried creating boot sticks for TrueNAS Core and for Windows Server instead, and they all hung during boot. Windows hung during “mounting NTFS service”, which convinced me the issue must be with the drive. So I unplugged the SuperMicro SATA DOM and reinstalled the OS on a regular NVMe disk.

I don’t know how to diagnose a SATA DOM drive, especially when any OS installation or OS boot fails when it’s plugged in. Maybe there’s something like memtest for drives? The module in question is Supermicro SATA DOM (SuperDOM) Solutions 32GB - DM032-SMCMVN1
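
The closest thing to memtest for drives that I’m aware of is a SMART self-test (smartmontools can start one with `smartctl -t long` and show the results with `smartctl -a`) or a read-only surface scan. As a rough sketch of the second idea, assuming the drive is at least visible from some live environment, the Python below reads a device from start to end and records which offsets were slow or unreadable. The function name, the thresholds, and the device path are all assumptions, and it never writes to the drive.

```python
import time


def scan_reads(path, chunk_size=1024 * 1024, slow_seconds=1.0, max_errors=32):
    """Read `path` from start to end without writing anything.

    Returns (chunks_read, slow_offsets, error_offsets); offsets are in bytes.
    Stops early once `max_errors` unreadable chunks have been seen.
    """
    slow_offsets, error_offsets = [], []
    chunks_read = 0
    index = 0
    # buffering=0 disables Python's own buffering (the OS page cache still applies)
    with open(path, "rb", buffering=0) as dev:
        while len(error_offsets) < max_errors:
            offset = index * chunk_size
            index += 1
            started = time.monotonic()
            try:
                dev.seek(offset)
                data = dev.read(chunk_size)
            except OSError:
                # Unreadable region: note it and move on to the next chunk
                error_offsets.append(offset)
                continue
            if not data:
                break  # reached the end of the device or file
            chunks_read += 1
            if time.monotonic() - started > slow_seconds:
                slow_offsets.append(offset)
    return chunks_read, slow_offsets, error_offsets
```

Run as root against the DOM’s device node (something like `/dev/sdX`, a placeholder), a long list of slow or failed offsets would point at the module itself, while a clean pass would suggest looking elsewhere.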

OK, so the issue is back and I have a clue about what’s causing it. Somehow, changing NIC settings in TrueNAS Scale corrupts its own installation, and it won’t boot from that point on. I remember that before this happened the first time, I was assigning reserved IP addresses to both NICs in the system, and after a reboot TrueNAS got stuck at the message “loading initial ramdisk”. I then went down the rabbit hole of convincing myself the SATA DOM module must have broken, because not even the Windows Server installer would work; it too hung during boot. This time around, I was setting up a bridge in TrueNAS and rebooted it to force it to lease the new IP address that I had reserved in the router/DHCP, and from that point it wouldn’t boot.

So TrueNAS literally corrupts itself when I make some networking-related changes in its settings UI, to the point where the error is so low level that it cripples everything, including OS installers trying to mount the disk. I’m really frustrated about this: my hardware was rather expensive, and I don’t think I’ve done anything wrong here. The TrueNAS forums are rather unhelpful because they are swamped by users reporting other critical issues, so I don’t think I can get them to look at my problem specifically.

So here is a screen I get to when booting TrueNAS Scale install:

And here is a screen I get to when trying to boot Kali linux:

I’m at a loss here. Somehow TrueNAS, maybe not even through any fault of its own, is eating up my drives in a way that prevents any OS from even booting so that I could run a diagnostic tool to verify what is wrong. I’m currently trying to load Kali, and if it behaves anything like it did the first time around, it will eventually boot at some point between 30 minutes and 10 hours from now. At that point I should be able to reformat the affected drives.