TrueNAS Lockup/Crashing

So, before I get into the issue I am having, here are the specs of my system:

Motherboard: Supermicro x11ssh
CPU: Intel E3-1245v5 4c/8t
Memory: 64gig Micron DDR4 2400 ECC
Hard Drives:
2-Western Digital Red 8tb 5460rpm 128meg cache CMR – Raid 1 mirror
2-Samsung 870 Evo 500gig 2.5” SSD Raid 1 mirror for Apps
2-PNY Optima 120gig mirrored boot drives.

Network Cards: Onboard Intel i210 1Gb Ethernet controller
Power supply is 600 watt 80+ Gold

Network Environment:

A pfSense appliance running an Intel X550-T2 negotiates at 5G with the ISP and drops to 1Gig on the LAN side. It feeds one 8-port Gigabit Netgear managed switch, which then splits to 2 more Gigabit Netgear managed switches. All the switches are the same model and support flow control and 2-port link aggregation. Various devices are on the network, from game systems to workstations to streaming appliances. No packet issues, collisions, or dropped connections. No cabling issues. For all intents and purposes, a clean environment. No prior network or communication issues.

Onto the problem:

Supermicro has the current BIOS/BMC-IPMI firmware installed.
TrueNAS Scale is installed on the mirrored 120gig SSDs in UEFI mode, with a 16gig swap file. It boots normally and allows clean access with no problems.
Mounted the 2 Samsung SSDs and created a Dataset to store App data, configurations, backups Etc.
Mounted the 2 WD Red drives in an initial Mirror with multiple main Datasets, I.E. Documents, Music, Videos, Programs etc.
Whenever I try to copy data (as little as 300MB or as much as 30GB) from either my workstation or my VM server, TrueNAS locks up to the point that I have to reset the server through IPMI.
The transfers start out strong, then gradually fall off until they hit 0MB/s. At that point Windows can no longer find the folder share, the browser just shows it searching for TrueNAS, and IPMI shows nothing out of the ordinary.
IPMI doesn’t show any alerts, systems issues, temp issues, cpu, nothing.
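Ran memtest, and the memory checks out fine. It's running at 2133MHz, which is the highest DDR4 speed the E3-1245 v5 officially supports, so the 2400 modules running slower shouldn't be a problem at all.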

Steps I have used to try and solve the issue are as follows:

Swapped out the 64gig Micron for a 16gig Samsung stick. – Same issues.
Hooked up both 1gig Nic ports – Same issue.
Tried Network Flow control – Same issue.
Haven’t tried port bonding (LAG), but I don’t think this is a speed issue.
Tried moving Truenas server to the same switch as either the workstation or VM server. – Same Issue

What am I missing?

Is this just the WebUI or is a SSH connection also not possible?

TrueNAS tends to get very laggy if CPU load is very high. I usually only encounter this with very heavy compression, but it can be caused by other things too. I’d SSH into the machine, send some data, watch htop and maybe grab some logs.
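Since the box dies completely, anything you only watch on screen is lost at the reset. A minimal sketch of a poor man's htop, assuming you can start it over SSH before kicking off the copy: it appends the 1/5/15-minute load averages to a log file and fsyncs after every line, so the last samples before the freeze have a better chance of being on disk afterwards. The function name and log path are illustrative, not part of TrueNAS.

```python
import os
import time


def sample_load(log_path, samples, interval_s=1.0):
    """Append 1/5/15-minute load averages to log_path, one line per sample.

    Each line is flushed and fsync'd immediately, so the most recent samples
    are more likely to survive a hard reset (assuming the log lives on a disk
    that is not the one hanging).
    """
    lines = []
    with open(log_path, "a", encoding="utf-8") as log:
        for _ in range(samples):
            one, five, fifteen = os.getloadavg()  # Unix-only
            line = f"{time.strftime('%H:%M:%S')} load {one:.2f} {five:.2f} {fifteen:.2f}"
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the line out of the page cache
            lines.append(line)
            time.sleep(interval_s)
    return lines
```

For example, `sample_load("/root/load-during-copy.log", samples=600)` would log roughly ten minutes at one-second intervals (the path is hypothetical). A load average climbing far above the 8 threads right before the hang would point toward CPU starvation; a flat load followed by silence points elsewhere.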

Could also be a TrueNAS Scale bug. I had iSCSI connections locking up beyond repair on a daily basis. Scale is very much beta software. I’d check whether another OS behaves similarly. TrueNAS Core is solid, but Proxmox also has ZFS and can import the pool(s). Or boot some live environment and get SMB/NFS running there. Spinning up a VM still uses the Scale network stack, so that might not be a good test.

Yeah, nothing is possible on the TrueNAS side: no web UI, no SSH, not even the CLI at the machine. It’s like a Windows BSOD, but in TrueNAS.

I may try running Core to see if I get the same issues.

I thought about disabling cpu cores/threads, trying a different set of boot drives.

I chose Scale because it has a better app selection and app management.

Not sure if the current ZFS bug plays a part as well, but from what I have seen, nobody else has reported an issue like this. A complete lockup on a Linux-based OS? Interesting, to say the least.

Core is what made the name TrueNAS popular. It’s great. But GlusterFS clustering with K3s and Docker just isn’t possible on Core. Core does have its own app catalog (it can’t compete with TrueCharts), and you can run VMs to host your containerized apps. I only use Core for storage, so my experience there is limited.

It’s not so much about a bug; things happen. It’s about releasing an untested new version with all kinds of problems, not only block cloning. We had a thread here where someone ended up at a grub rescue prompt after a reboot. I can accept these things on Arch Linux, but not on a stable storage server appliance. Beta testing for enterprise customers is one thing; wrecking users’ installs and pools is a disgrace.

Core is much more stable in that regard. It’s mostly for enterprise customers at this point. UI is less fancy and Containers are of limited use (FreeBSD jails). But a Debian VM can do all kinds of things, incl. Portainer, Docker, Kubernetes, whatever.

The hardware itself (controller, NICs, etc.) is all solid, so I don’t see that being the issue. TrueNAS primarily targets “NAS” duties. It supports other applications and features, but consider those a “you’re on your own” thing; jails can be very useful if there is a need for that functionality to begin with. I would, however, advise you to look at another solution if your goal is not to run a NAS but an “all-in-one box” that also acts as a file server.

If you have compression enabled, especially gzip at higher levels, it can take quite a while for the queue to clear, and during that time ZFS can take up much of the CPU time. Your box should, however, at least reply to ping during that time, even if Samba and friends are unresponsive.
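To get a feel for how much CPU the higher gzip levels cost, here is a rough user-space sketch. It times Python's `zlib.compress` (the same deflate algorithm that ZFS's gzip-N levels use) at several levels on a buffer you supply. Assumption: this is single-threaded and outside the kernel, so the absolute numbers will not match what ZFS achieves across all cores; only the relative cost between levels is meaningful.

```python
import time
import zlib


def deflate_throughput(data, levels=(1, 6, 9)):
    """Return {level: MB/s} for compressing `data` once with zlib at each level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Guard against a zero interval on very small inputs.
        results[level] = len(data) / 1e6 / max(elapsed, 1e-9)
    return results
```

Feeding it a chunk of a real file (for example `open(path, "rb").read(256 * 1024 * 1024)`) and comparing level 1 against level 9 shows how quickly a gigabit stream can outrun a single core at high gzip levels. LZ4, the usual ZFS default, is much cheaper and is not covered by this sketch.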

Do you see the same issue if you locally write, let’s say, a 5G file using dd (preferably from /dev/(u)random so it stresses the system a bit) to the affected array(s)?
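A minimal Python equivalent of that dd test, assuming you run it in a shell on the NAS itself against a path on the affected pool. It writes random bytes in chunks, fsyncs at the end so the data actually reaches the pool, and reports MiB/s, roughly like `dd if=/dev/urandom of=<file> bs=4M count=<n> conv=fsync`. The function name and example path are hypothetical.

```python
import os
import time


def write_test(path, size_mib, chunk_mib=4):
    """Write size_mib MiB of random bytes to `path`, fsync, and return MiB/s.

    Generating the random data costs CPU too, as with dd from /dev/urandom,
    so the result mixes CPU load and pool write speed.
    """
    chunk = chunk_mib * 1024 * 1024
    remaining = size_mib * 1024 * 1024
    start = time.perf_counter()
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # don't stop the clock until the data is on disk
    elapsed = time.perf_counter() - start
    return size_mib / max(elapsed, 1e-9)
```

Something like `write_test("/mnt/tank/test/ddtest.bin", 5120)` covers the 5G case. Random data is incompressible, so this mostly exercises the disks and the write path rather than compression. If the box also freezes here, with no network involved, SMB and the NIC are much less likely to be the culprit.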

ZFS will totally take all cores to compress asap. He was talking about 300MB files…even with GZIP (horribly slow, but good for heating the room in winter) shouldn’t take that long. And when I had TrueNAS with hiccups, laggy UI and real delays on SSH because of extreme load, it was usually temporary and resolved once the TXG was fully committed (and compressed).

Without knowing more details about the datasets and CPU load, I think this could still be the case. But I also know Scale does weird things.

:+1: This was my approach too when I was using Scale. Consolidating multiple things into one. Master of all trades, it’ll be glorious! (it wasn’t).
I went back to a hypervisor + VMs specialized in their jobs.

FreeBSD and VMs (if needed) here, no need to make it overly complicated and/or add overhead just because :wink:

Has worked wonders for years but I do have a few boxes running TrueNAS Core that simply acts as backups and/or file servers.

Update:

As it stands, I installed TrueNAS Core and set up a basic dataset/test folder, then did a direct connection between my workstation and the NAS, transferring 300+GB of data at a constant ~900Mb/sec (close to gigabit line rate) with no drop.
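A quick sanity check on those numbers, assuming the copy went over one of the onboard i210 ports (1 GbE): 1 Gb/s is 125 MB/s before protocol overhead, so a sustained figure around 900 fits megabits per second (about 112.5 MB/s), not megabytes. At that rate 300 GB takes about three quarters of an hour. The helper names below are just for illustration.

```python
def line_rate_mb_per_s(gbit_per_s):
    """Theoretical ceiling in MB/s (10^6 bytes) for a link, ignoring protocol overhead."""
    return gbit_per_s * 1000 / 8


def transfer_minutes(size_gb, mb_per_s):
    """Minutes needed to move size_gb (10^9 bytes) at a steady mb_per_s."""
    return size_gb * 1000 / mb_per_s / 60


print(line_rate_mb_per_s(1))                     # 125.0
print(round(transfer_minutes(300, 112.5), 1))    # 44.4
```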

This points more toward Scale being the issue than Core.

Ex, while I agree with you on the idea of separate tasks for separate purposes (I run a Type 1 hypervisor, Hyper-V, too), I do like the app-based system in Scale, which allows direct access to the data.

All in all, I think I may have come to an idea what the problem is or was. I’ll have a better idea when I reinstall Scale.

Final update:

Well, everything worked and checked out using TrueNAS Core, both connected directly to my workstation and reconnected through my LAN.

There is something in TrueNAS Scale that is causing it to crash on file transfers.

Both test environments started as base installs, created datasets, shares and just basic SMB service running.

I don’t know for sure what it is, but my guess is the way Debian handles the ZFS kernel module.

Either way, I guess it’s Core for the time being.

That’s a very basic setup. And I was using Proxmox (Debian) with an Ubuntu Server VM for SMB last year; I didn’t encounter any weird behavior.

I had some “disagreement” with Scale and ZFS. Scale customizes a lot around that and zed (ZFS event daemon) wasn’t doing its job. zed was doing well on Proxmox when I checked for comparison, both use Debian. So I attributed it to iX Systems because that was the only difference I could see.

Core just works. And if you later decide to migrate, ZFS is portable and you can import the pool wherever you want. Happy ending? TBD. But I’m glad you got your stuff running now.