Gula: NAS+ build, the quest for the right OS

Goals

I currently have a server with a 5-disk RAID6 array (mdadm + lvm2), but I would like a setup that also protects against data corruption.
Given how Docker likes to muck around with firewall settings, it would also be nice to run any containers that shouldn’t be exposed to the internet on this machine, rather than on the current one, which also serves as firewall/router.

In summary:

  • NAS
  • light container use
  • reliable
  • should just work™ (if I need to fiddle a lot to get/keep basic things going I might as well just install Gentoo, which I’m familiar with, and be done with it)
  • protection against file corruption (iow. ZFS or BTRFS)
  • RAID6 (or similar), which basically leaves ZFS.

Hardware

Name: Gula (Why the name?)
Case: 3U enclosure with 16 SATA2 hot-swap bays
Motherboard: Supermicro X7DBE (and a SIMLP-B IPMI card)
CPU: 2 x Intel Xeon E5335 2.0 GHz
RAM: 32GB Fully Buffered ECC DDR2
RAID Controller/HBA: 3Ware 9650SE-16ML (replaced with 2 Dell PERC H310s)
System drive: Kingston A400 120GB SSD
HDDs (currently): 3 x 1TB drives for testing purposes (+1 1TB broken drive for error handling testing)

Two of the drives have SMART warnings, one is just plain “toast” and the last one is just a normal functional drive (the weirdo :wink: ).

Preparation

Since this is an old server it didn’t need assembly, however it did come with some issues which broadly fell into two categories.

Issues I suspected to be BIOS related:

  • inability to boot from USB devices (even though the Phoenix BIOS detected them)
  • inability to set up networking for the IPMI card
  • inability to find and boot from the Kingston SSD connected to a motherboard SATA port

And slightly more hardware related issues, namely fans:

  • noise
  • fan on second PSU turned out to be broken

Solving BIOS related issues

Finding and upgrading BIOS

Judging by the version and date on the BIOS the board might still have been on the release BIOS: 2.0b dated 2008-03-20. Unfortunately BIOS updates were no longer available from Supermicro’s site and I had to go hunting around the internet, eventually landing on a Supermicro supplier’s support page where I found a 2.1a version of the BIOS, not the latest (2.1c), but much more recent than what I had.

Flashing was a bit of a pain since these old BIOSes assume floppies and all that rot. I ended up formatting an IDE drive as FAT32, stuffing the BIOS tools on there, and creating a FreeDOS boot CD to flash that way. One issue I ran into was that one of the EXEs required by the flash script was missing, so I had to grab it from another BIOS firmware zip (from the X7DB8, which, as I understand it, is basically the same board but with SCSI). Once I had that I was able to successfully flash the new BIOS. Yay!
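
For reference, prepping the FAT32 drive on a Linux box boils down to something like the sketch below (the device name and the firmware directory are just examples, so double-check with lsblk before formatting anything):

```
# Partition the old IDE drive and format it as FAT32
# (/dev/sdb and the X7DBE_21a directory are assumed names, adjust to taste)
sudo parted --script /dev/sdb mklabel msdos mkpart primary fat32 1MiB 100%
sudo mkfs.vfat -F 32 /dev/sdb1

# Copy the BIOS image and the DOS flash tools onto the drive
sudo mount /dev/sdb1 /mnt
sudo cp X7DBE_21a/* /mnt/
sudo umount /mnt
```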

Sorting out IPMI

This gave me new IPMI options in the BIOS, finally allowing me to set up networking on the SIMLP-B card. Yay!
Next up was logging in. There appeared to still be some configuration on there from a previous owner (4 users all starting with cern: cernanon, cernoracle, …? Wait, what?). Thankfully the “administrator” user just used the default Supermicro password of “ADMIN”.

Solving the SSD boot device issue

This board has 3 SATA controllers: the “normal” one provided by the chipset, which for some reason was unable to detect the SSD (a SATA HDD on the same port worked fine), and two RAID controllers, an Intel and an Adaptec one. Switching to the Adaptec controller and setting it to passthrough resulted in the SSD finally being recognized correctly.

USB boot devices

The USB booting still doesn’t work. I’ve just resorted to attaching an IDE DVD drive (hence the IDE cable in the first picture) instead of wasting more time on getting that sorted.

The actual issue is that the Phoenix BIOS doesn’t allow me to move the USB devices up in the boot order (it just beeps at me), moving other devices (eg. floppies) works fine though.

Fan fun!

The machine turned out to be even louder than I expected (and I’m not unfamiliar with server noise): I measured a constant noise of over 74dBA with “professional equipment” (aka my phone :wink: ), after the fans had spun down. So yeah, loud was expected, but this loud, well, that would have to be taken care of.

I then measured all fans in isolation to judge which ones were the loudest and most ripe for replacement.
Measurements were taken as follows:

  • the phone was placed 20cm away from the fan
  • fans were moved out of the case into a quieter case to avoid getting drowned out by the PSU fans
  • the PSU fan was measured by unplugging all other fans and PSUs (the PSU consists of three redundant units)
  • I didn’t measure the CPU fans (maybe I should try doing that)

The chassis contains 10 fans, of 5 different types (if people are interested in the exact models I can add those):

| Location | Amount | Size (mm) | Noise (dBA) | Replacement |
| --- | --- | --- | --- | --- |
| PSU | 3 | 40 | 59 | Everflow R124028BL |
| Exhaust | 2 | 60 | 65 | Noctua NF-A6-25 PWM |
| Mid 40 | 3 | 40 | 61 | Supermicro FAN-0100L4 |
| Mid 80 | 3 | 80 | 61 | SilverStone SST-FM84 |

The requirement was that they’d still need to actually move some air. While all of them are obviously weaker than the stock fans, I didn’t just want to replace everything with Noctuas and lose all airflow; instead I tried to stick to “reasonable” downgrades based on the specifications of the original fans (though I couldn’t find specs for the 80mm fans). The PSU fan replacements were picked based on a few Reddit posts by someone who claimed success with these specific fans in a similar Supermicro PSU.

The replacement I’m the least happy with would be the Noctuas for the 60mm fans: they are a lot weaker than the fans that were in there, but I just wasn’t able to find anything closer to the stock fans in that size.

So far I’ve installed all but the SilverStone fans which has brought noise down to a more reasonable 50-ish dBA.

Temperatures have also been more than fine (CPUs sitting around 24-27 °C and the case around 34 °C, peaking at 37 °C during the ZFS module compile), but we’ll see how that evolves come summer and when I fill more bays with hard drives.

FreeNAS

Installation was smooth as butter, with one caveat: since it wouldn’t find the SSD I performed the install on another machine and then moved the SSD to the server (and then solved the booting-from-SSD issue). There are known issues with some old Supermicro boards that cause the installer to outright crash or hang, which mine did when I initially tried to install to an IDE HDD before the SSD arrived in the mail.

I then set up the four spinning disks as JBOD on the 3Ware controller and created a zpool out of them (note that I was aware that hardware RAID controllers, including this one specifically, are discouraged for use with FreeNAS/ZFS; I just wanted to see what would happen). Since, as mentioned, one of the drives is actually toast, the 3Ware controller interfered and just kept spamming errors during FreeNAS boot, never allowing the OS to come up, eventually rebooting instead to start the entire ordeal over again.
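
(For the curious, the rough CLI equivalent of creating such a pool is below; the pool name and device names are made up, not what FreeNAS actually used.)

```
# List the disks the OS can see
camcontrol devlist

# Create a raidz1 pool across the four JBOD disks and check its health
zpool create tank raidz1 da0 da1 da2 da3
zpool status tank
```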

To avoid the bootloop I shut down the system and replaced the 3Ware with the two Dell PERC H310 HBAs. The H310s’ BIOS allowed me to switch between both cards, addressing a concern I’d had, as some people reported that certain motherboards don’t allow chain loading HBAs (so only the first one would “boot”). I removed all RAID configuration so the disks would just be passed to the OS (and also tossed the 4th, broken disk for now).

Not much else to say, rather smooth sailing from there on out. Unfortunately FreeNAS appeared unable to access the drives’ SMART status and didn’t have support for hardware sensors (a patch was apparently proposed for FreeBSD a few years ago but rejected, so this doesn’t appear to be a big priority upstream).

Good:

  • quick and easy install
  • snappy UI
  • everything “just works™” after installation, no surprises
  • ZFS integration is excellent

Bad:

  • installer can just hang on some older Supermicro boards requiring installation through another machine
  • appears unable to read SMART data without flashing the controller (even when in passthrough mode)
  • no hardware sensors (fan speeds, case temperature, …)
  • setting up services not intuitive (no big deal, nothing some rtfm can’t fix)

OpenMediaVault

On the first installation attempt I didn’t remove the drives (aside from the boot disk), despite that being strongly recommended. Installation proceeds smoothly (albeit extremely slowly). After the post-install reboot, I fail to log in until I search around and discover that you can’t log in as root in the web UI but have to use “admin” with the default password “openmediavault”. I also discover that the UI doesn’t work at all without JavaScript (I run NoScript). Promising.
I then notice oddities with drive lettering (the boot drive is /dev/sdd), so I decide to just start over (not a complaint, just a warning to those who might want to ignore the manual as I did :wink: )

This time around I do remove the HDDs and leave only the boot disk connected. The install crawls (seriously, why so slow?) to a close and I reboot, only to be thrown into an initramfs prompt. It turns out it did try to boot from /dev/sda this time around, except that adding the other disks back to the system resulted in the boot device becoming /dev/sdd again. Whoops.

So I remove the disks and reboot, successfully this time since sdd is now sda again. I fiddle with Grub some to try to see what is going on, find a bug report that appears relevant, and decide to upgrade OMV from 5.0.5 (the latest ISO at the time of writing) to 5.1.1-1 with the updater, hoping that newer versions include the fix from upstream.

Running grub-install /dev/sda now appears to give me a functional Grub config using UUIDs. Victory!
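
For anyone hitting the same bug, these are the relevant commands (a sketch; update-grub is the bit that actually regenerates the config, and the paths are the Debian/OMV defaults):

```
# Reinstall the bootloader to the boot disk and regenerate its config
grub-install /dev/sda
update-grub

# Sanity check: the config should now reference the root filesystem by UUID
# rather than /dev/sdX, so re-adding the data disks can't break booting again
grep -E 'root=UUID=|search --fs-uuid' /boot/grub/grub.cfg | head
```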

During the installer I’d just let the system set up DHCP on the first NIC, so I figure it’s time to change that, especially since OMV somehow manages to get a different IP address from DHCP each time. To make sure I wouldn’t have to bother with keyboards and monitors in case I mess something up, I decide to set up the second NIC to use DHCP as a backup; there’s no cable attached, but hey, what’s the worst that could happen? Famous last words, as it turns out.

So I let OMV apply the changes… aaaaand “An error has occurred”. This completely breaks the web UI and, after a glance at ntopng, appears to have outright killed the networking. Welp, so much for reliability.

System ignores my attempts at a “Graceful Shutdown” over IPMI, so “Reset” it is.

It comes back up and thankfully did store the networking configuration, so onwards to setting up a static IP for NIC0. This time I get the error again but, likely because I changed the IP, networking appears to still be up. We’re still in business, phew.

ZFS

First I installed the OMV Extras plugin as per the instructions.
Then I installed the ZFS plugin, which appeared to install the ZFS tooling before compiling the kernel module, which resulted in a failure. Despite this the module was marked as installed, so I had to uninstall it and try again; this time it appeared to work (the ZFS menu item appeared).

Importing the pool previously created on FreeNAS worked just fine once I added the force option.
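
(In CLI terms that amounts to the following; “tank” is a placeholder for whatever the pool was called on FreeNAS.)

```
# Show pools that are visible but not yet imported
zpool import

# -f is needed because the pool was last in use on another system (the FreeNAS install)
zpool import -f tank
```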

Good:

  • SMART data on PERC controller without having to flash
  • ZFS support

Bad:

  • extremely slow install
  • latest ISO can result in broken bootloader configuration
  • appears sluggish in general compared to FreeNAS on the same hardware
  • networking setup seems a bit wonky
  • terminal output unusably small on CRTs (no big deal as you normally don’t use it)
  • ZFS support not part of core, concerns about surprises on upgrades.
  • no support for hardware sensors in the web UI

Next?

So far I’ve mostly touched on the installation and initial configuration; of course those are rather important first impressions and might result in the decision to just not put any more time into a system.

Given the issues I’ve experienced with OMV so far I doubt I’ll be sticking with it, at least for this particular purpose. Maybe on more reasonable hardware (or unreasonable: I might have another system coming in that OMV could be a good fit for, if only because most other options don’t even support running on it).

So options for the future:

  • try XigmaNAS, but since it’s also based on FreeBSD that might run into much the same issues as FreeNAS?
  • try FreeNAS again, I found some information that using a different driver for the HBA card might give me access to SMART data
  • try Proxmox, I think ZFS is available by default, so that might be a viable option. Would be interesting to see how many of the OMV quirks are due to Debian
  • try some more stuff (eg. virtualisation, containers) and see how that’s handled.

Suggestions most definitely welcome, do note that I much prefer free (as in freedom) options, so Unraid would be a “last resort” kind of deal, but I’m aware of it (I saw the GN build :wink: )


Reinstalled FreeNAS; the installer crashed again so I had to take out the SSD and use another system to perform the install. I guess this confirms that this board does suffer from the “FreeNAS installer crashing on old Supermicro boards” issue.

Once back up and running I tried to get SMART information working without flashing the PERC H310, since this controller actually does proper JBOD, so the only reason to flash it would be driver support (the fact that OpenMediaVault had no issues accessing the disks directly or displaying SMART information does appear to confirm this).

The default mfi driver still tries to access the drives through the controller (or pretends it does, I’m not sure how it’s implemented), giving them names like mfisyspd0; this also means SMART doesn’t really work. However, it also creates a bunch of “pass” devices (one per disk), which are hardware passthrough devices that you can actually query for SMART data, proving that the OS can get direct access to the drives…
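
A quick way to see this from a shell (the device numbers are just examples):

```
# List CAM devices: the mfisyspdX pseudo-disks and their passX twins both show up
camcontrol devlist

# SMART queries against mfisyspdX fail, but the passthrough devices answer just fine
smartctl -a /dev/pass0
```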

Based on information I found on the FreeBSD forums I tried to get FreeNAS to use the mrsas driver, however the LSI 2008 chipset is not supported by that driver so the FreeBSD underlying FreeNAS did the smart thing and just refused to switch drivers.

However, the other option provided, setting hw.mfi.allow_cam_disk_passthrough in loader.conf, did appear to work (this can, and should, be set through the web UI under System -> Tunables, otherwise it will just get overwritten on the next boot).
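
For completeness, this is the line in question; on FreeNAS it should go in as a “Loader” type tunable rather than straight into the file:

```
# /boot/loader.conf equivalent of the tunable set via System -> Tunables
hw.mfi.allow_cam_disk_passthrough="1"
```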

After a reboot the disks are then exposed directly to the OS.

Unfortunately the pseudo RAID devices still show up as well, but I imagine that as long as they’re not used this should be fine.

As for the missing sensor data: since this appears to be a missing feature in all the distributions I’ve looked at (Proxmox, FreeNAS and OpenMediaVault), I’ll just deal with it and keep an eye on that information through IPMI.

Small update: I found another 1TB disk and re-created the raidz1 pool as a 5-disk raidz2 (the 3 original disks + the new disk + the broken WD disk) to see what would happen (note that for “real” use 6 disks is the suggested minimum for raidz2).
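
The recreate itself is a one-liner (device names assumed, same placeholder pool name as before):

```
# Destroy the old raidz1 and recreate the pool as a 5-disk raidz2
zpool destroy tank
zpool create tank raidz2 da0 da1 da2 da3 da4
zpool status tank
```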

Pool creation was slower than before (no errors or warnings though) as was copying a couple gigabytes of data onto the pool.

FreeNAS then hung on shutdown and went into a bootloop on reboot. Taking out the broken disk allowed it to come up properly in degraded state.

Slightly bummed about the shutdown hang and the bootloop. Will need to do some more digging to figure out if this is normal/expected behaviour.

Found a bunch of cheap new 1TB WD RE4 disks that were probably a vendor cleaning up old stock, so those are now burning in.

Doing the burn-in a bit more thoroughly than usual since the package got mishandled quite badly by the postal service (if they put tape on it saying “whoops we damaged your package” you know it’s bad :wink: )
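
The burn-in itself is nothing exotic, roughly the below per disk from a FreeNAS shell (a sketch, assuming badblocks is available; it is destructive, so only run it on empty drives, and the device name is an example):

```
# Long SMART self-test first, then a destructive write+verify pass over the whole disk
smartctl -t long /dev/da2
badblocks -b 4096 -wsv /dev/da2

# Afterwards, make sure no reallocated or pending sectors have appeared
smartctl -A /dev/da2 | egrep 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'
```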

I’ve also ordered an Intel RES2SV240 SAS expander since I found a really cheap one on eBay, meaning the two PERC H310 cards can be replaced by a single one. The spare I might use in a NAS build for my parents.

Got hold of an LSI HBA, which saves me from going through the flashing dance with the Dell PERCs. Of those, I’ve found out, there are two variants: ones that come from workstations, and ones that come from servers.

For the server version Dell already provides an IT mode firmware, but not for the workstation variant, and the server firmware refuses to flash onto the workstation card (despite the two likely being entirely identical otherwise). I additionally ran into some FreeDOS trouble when trying to flash them to LSI firmware (booting from USB, read-only errors…), and since I got an actual HBA cheap I just dropped the flashing route.

The two PERC H310s I’ll be using in Linux systems, where driver support is apparently better, since OMV had absolutely no problem seeing the SMART data on the attached drives.

Note that being able to replug the backplane into the LSI HBA, with the entire array immediately being picked up by FreeNAS, confirms that the PERC H310s actually do true JBOD and don’t mess with the disks as some RAID controllers do (when set up for non-RAID, of course). So the reason to flash them appears to be FreeNAS/FreeBSD driver limitations.

So current setup is:

  • an Intel RES2SV240 SAS expander, attached to an
  • IBM 45W9122 HBA (LSI-9201-8i)

SMART data is coming through nicely in FreeNAS now. The only “issue” I ran into is that, due to the port placement on the expander and the SAS cable lengths, the order of my backplanes is now inverted, which threw me off at first.

As you have tested many of these, what do you think about FreeNAS?
I tried it a long time ago and found it slow to boot, with a sluggish web UI, and not really usable on low-spec hardware (a 4-core APU at 3.9GHz with 4GB of RAM). I tested Proxmox at the same time and it was faster to boot (by minutes) and didn’t have many issues, though it requires better hardware as well. My needs are NAS/Emby/Nextcloud.

The RAM is likely the problem. Gula is using older hardware than what you have (dual quad-core 2GHz Xeons, no Hyper-Threading), except for having 32GB of ECC FB RAM, most of which happily gets eaten as a ZFS buffer when doing large transfers. Note that the FreeNAS devs list 8GB as the minimum; using less likely isn’t optimal…

Right now, having just racked the system, it’s already using 1.6GB for services and another 1.1GB as ZFS buffer, so that wouldn’t leave much room on a 4GB system…
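
If you do want to squeeze FreeNAS onto a small-RAM box, the ARC can at least be capped via a loader tunable (System -> Tunables); a sketch, assuming a 2GB cap:

```
# Cap the ZFS ARC at 2GiB (value in bytes); added as a "Loader" tunable in the UI
vfs.zfs.arc_max="2147483648"

# Current ARC size can be checked from a shell with:
#   sysctl kstat.zfs.misc.arcstats.size
```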

Boot times appear to depend on ZFS pool size; I have an 8-disk 8TB raidz2 pool (that is, 8 x 1TB) and it has definitely slowed down boot due to the pool import.

I haven’t gotten around to trying Proxmox (it might go on another system), so I can’t speak for that one, but OMV was far more sluggish in the UI department than FreeNAS is, to the point that it got annoying.

As for comparing boot times, I can’t say I paid that much attention to them; neither seemed so much faster than the other that it bothered me.

Will keep an eye on transfer performance once I have my main server on the same switch and I start moving large amounts of data around.

I have a similar use case to yours. I tossed out the idea of FreeNAS as it’s finicky and picky about hardware, as well as being a BSD. That left me with two options for first-class ZFS support. The first I found was Proxmox, which has KVM, Docker, and LXC support, which was amazing for VMs and the like, plus great ZFS support out of the box. I ended up going with Gentoo at a later point because I have problems with systemd breaking, but if you don’t mind a little systemd I can strongly recommend Proxmox.

A note on my “BSD hate”: I run a Proxmox router and love it. It also ran great under Proxmox until I got a separate machine for my Docker containers and VMs.

I have noticed some issues with the newer FreeNAS web UI; I do sometimes have to use the legacy web UI to get things working. No service issues and no issues accessing the physical box. Other than that, FreeNAS has been running solid 24/7 for almost 450 days since the last install, and has only needed to be rebooted after updates.

Disclosure: I have been running a FreeNAS VM in ESXi 6.5 for quite a while… I have an old R510 that runs FreeNAS, but that is version 9.10.2 I believe. I have no idea how newer versions run on bare metal.

This. FreeNAS boot times are slow because the BSD boot sequence is very, very conservative. Try booting FreeBSD directly on your fastest hardware and it will likely take just as long. The upsides are that it will run on a potato and that, once it is switched on, it stays on.

It also dies gracefully, so if you have a power outage or your mobo starts generating magic smoke, your OS will fire right back up on reboot or even on another system. I’ve done this.

I’m still running FreeNAS pools built 5+ years ago on a USB thumb drive install of FreeNAS. They didn’t start on the same hardware, not even the same architecture: the first build was on an AMD Phenom, then Haswell, then a Xeon on X79. Each migration was literally plug and play (with several paranoia backup copies made beforehand).

I’ve upgraded the OS multiple times and it has never compromised my workflow. Compare that to Linux, which loses its bananas when I update the wrong packages (admittedly my fault for running Manjaro).

Final point: FreeNAS is backed by a commercial entity, and iXsystems have a vested interest in maintaining their product.

(might re-purpose this topic as a general homelab setup log thingy).

As an update, 3 of the 4 Samsung Spinpoint F3 drives have failed on me so far. Two of them I replaced with WD RE4s because their SMART stats were such that I wasn’t too confident in them anymore.

The third one seemed OK after a scrub and then just suddenly died (as in: the controller knew it was there, but couldn’t even read SMART data off of it anymore). I had to order a new WD Red to replace it, since I used my spares on the two other failures…
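
(The kind of SMART check involved, for reference; the device name is just an example.)

```
# Quick health verdict plus the error counters that prompted the replacements
smartctl -H /dev/da1
smartctl -A /dev/da1 | egrep 'Reallocated|Pending|Uncorrectable'
```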

And I had to buy an adapter too, since the WD Red 1TB I got turned out to be a 2.5" drive. I wasn’t even aware those were a thing, so yeah, whoopsie.

Unless I’m very mistaken these Spinpoints predate the acquisition by Seagate, so “Seagate bad, lol” doesn’t quite apply :wink: But yes, they were old, and desktop drives.

Installed a Mellanox ConnectX card as well, for later use as either a 10Gbit or a 40Gbit (InfiniBand) backbone for the rack. I haven’t quite decided which route I will go with that just yet. The only real reason to go InfiniBand would be, well, because it’s something different from Ethernet, certainly not because I need the speed.

Now I just need to sort out cabling from the router to the rack; currently I’m using a ghetto wireless bridge for connectivity, which isn’t particularly reliable (it randomly starts dropping internet traffic for some reason, usually sorted by turning it all off and on again, or by just waiting until it fixes itself, for a while).

Once that is in place I can start working on the OPNsense box I also got, to lift the router/firewall duties off of my current server, and then move that server into the rack as well.

Since this is a rather old system, and these are notorious for high power consumption, I did some tests. Initially the system came with 2 x Xeon E5335, which I then upgraded to 2 x Xeon L5410.

All values are given in watts.

| CPU | Off | Boot (max) | Idle |
| --- | --- | --- | --- |
| Xeon E5335 | 16 | 415 | 338 |
| Xeon L5410 | 16 | 364 | 306 |

“Off” is basically just the PSUs plugged in, so this is the loss of the PSUs combined with the BMC; it shouldn’t exactly be surprising that replacing the CPUs had no effect there.

“Boot (max)” is the maximum consumption measured during boot.

And “Idle” is, well, after it had just been sitting there for a while.

As an interesting aside, the fans I’ve replaced so far have shaved 44.34 watts off peak consumption (not measured; the number is based on their specifications).

As mentioned before, I haven’t replaced the 80mm central (fan bar) fans yet, mostly due to RAM cooling concerns: the 60mm exhaust fans I replaced with Noctuas (which were the only real option I was able to find) are a hell of a lot weaker than stock, so I figure that if I replace the fans that push air over the RAM with weaker ones as well I might run into problems.

Since the 80mm fans are not excessively loud, nor do they use exorbitant amounts of power (both unlike the stock 60mm fans…), I can live with them staying where they are.

