Petabytes? No Big Deal These Days -- Quick Start with the Supermicro 947HE1C-R2K05 90-Bay JBOD Disk Shelf

TODO Video Link

Introduction - Adventure Time

image

In collaboration with Server Part Deals,

image

check out this awesome → 90-bay disk shelf from Supermicro. 90 bays! 28.8 gigabytes/sec of interface throughput!

Overview

This is a disk shelf. It is a modern (for 2025) disk shelf. If you've followed our adventures for a long time, you know that we love/loved the NetApp disk shelves because they were so standard, so easy to get parts for, and so bulletproof. The problem, in 2025, is that they are kind of old now. They have a lot of mileage. They are kinda slow, even if you find upgraded IOM components.

This shelf has none of those problems and plenty of room to grow. Whereas the NetApp might still make sense for a home lab… this, THIS, is a 10-year solution.

Why a disk shelf and not a server with a lot of bays? Longevity and upgradability. This disk shelf can live through two? three? four? major server upgrades. Its initial role can be archival or online bulk storage… then, as it ages out, it can be re-used for nearline or backup storage. In this way businesses can enjoy a double lifetime out of it.
These shelves are popular, and they are easy to get parts for.

The Setup

This has 90 bays but only 24 SAS channels. Each channel can manage about 1.2 gigabytes/sec, so the total bandwidth the chassis could possibly support is 28.8 gigabytes per second. As a practical matter, the usable bandwidth is much lower.

The configuration in the video has a single I/O module, but it is possible to configure both dual-path and dual I/O modules.

Even though this is a JBOD, it has its own IPMI/out-of-band management. Managing 90 drives across just 24 SAS channels represents more than a 3:1 SAS mux, and that can have bandwidth and cabling implications.
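To put rough numbers on that mux, here's the back-of-the-envelope math, using only the figures above:

```python
# Back-of-the-envelope SAS bandwidth math using the figures from this post.
bays = 90
channels = 24            # SAS channels back to the host
gb_per_channel = 1.2     # ~1.2 gigabytes/sec per channel

total_gb_s = channels * gb_per_channel         # 28.8 GB/s chassis ceiling
drives_per_channel = bays / channels           # 3.75 drives per channel (more than 3:1 mux)
share_per_drive_mb = total_gb_s / bays * 1000  # ~320 MB/s per drive if all 90 stream at once

print(f"ceiling: {total_gb_s:.1f} GB/s, mux: {drives_per_channel:.2f}:1, "
      f"per-drive share: {share_per_drive_mb:.0f} MB/s")
```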

Zone ID: 3 is what we used for the video configuration. To give you an idea of the possibilities:

Firmware

It is critically important to get updated expander firmware and IPMI firmware:

  1. www.supermicro.com - /wdl/Firmware/JBOD/SC947S_SC947H Top Load JBOD/Expander/

  2. www.supermicro.com - /wdl/Firmware/JBOD/SC947S_SC947H Top Load JBOD/IPMI/

Special thanks to Jake from LTT – the first time I worked on one of these was a while back with Jake, and he had some good notes about what they ran into when they were using it, which I was able to apply here.

IPMI

Storage mode 3 is the one you want for max performance; mode 1 is the default. Mode 0 used to be auto, but this JBOD was bad at Doing The Right Thing consistently, so it requires manual selection.

NOTE: After changing zoning you MUST pull both power cords for 5+ minutes to be sure the zoning change takes effect.

Fans and Sensors via IPMI / SNMP

The JBOD does support different zoning configurations, but this one makes the most sense. Based on the zoning diagram and how many SAS channels you're running back to your host, you can decide whether you want to spread drives across channels ("zones") or minimize the number of connections back to the host.

ZFS Benchmarks

Let's get wild and crazy with vdevs! Spoiler: dRAID probably makes the most sense with this many drives. Pro tip: go ahead and designate at least one or two hot spares for ZFS. That way, if a drive dies, a spare is immediately available. If you want to run without spares, please reserve at least one empty drive slot. As a general best practice, I always recommend adding a new drive before removing any malfunctioning drives from the array.
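To make that concrete, here is a minimal sketch of what a 90-wide dRAID layout could look like; the geometry (2 parity, 6 data per group, 2 distributed spares), the pool name, and the /dev/disk/by-vdev labels are placeholders I picked for illustration, not the exact layout from the video:

```python
# Sketch: assemble a zpool create command for a single 90-child dRAID vdev.
# Geometry and device labels below are assumptions, not the video's configuration.
import shlex

parity, data, spares, children = 2, 6, 2, 90
drives = [f"/dev/disk/by-vdev/bay{i:02d}" for i in range(children)]  # hypothetical labels

vdev_spec = f"draid{parity}:{data}d:{children}c:{spares}s"  # OpenZFS dRAID vdev syntax
cmd = ["zpool", "create", "tank", vdev_spec, *drives]

print(shlex.join(cmd))  # review the command first; run via subprocess.run(cmd) once happy
```

The distributed spares are what make dRAID attractive at this drive count: a failed disk rebuilds onto spare space spread across all members, so resilvers finish far faster than rebuilding onto a single cold standby.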

As you might imagine, the three "zones" for disk configuration give us some options when we want to maximize safety or speed. If you're especially interested in safety/redundancy, I'd recommend picking up another I/O module as well as more expanders to add internally.

And if 2.5 PB+ per shelf isn't enough for you… it is possible to deploy 7.5 PB in a single system with 3 JBODs. Goodness!

TODO paste here from the video

6 Likes

Thanks! Those petabyte JBOD racks are very impressive!

Wendell, if you get a chance, I'd like to see a video on (much) more pedestrian JBOD enclosures. It would be of interest for someone like me. I am about to outgrow my external 8-bay HDD enclosure (attached through USB 3) - puny, I know :grinning:.
@manufacturers and suppliers: Send Wendell some of your 8-16 Bay HDD enclosures and maybe he’ll review them!

2 Likes

that’s 4 drives per channel

300 MB/s per drive can oversaturate every single-head, high-capacity CMR drive on the market by ~20%, which is perfect for rebuilds.

That's underrated when deploying what would have been 4x disk shelves a few years ago.

You only need 46x 22TB enterprise SAS drives' worth of data capacity to hit a petabyte of storage in 4U.

With 2 parity drives per 4 data drives, that's only 69 drives total.

Add in the hot spares, able to rebuild themselves in 24 hours (our real-world rebuild performance with 22TB drives is roughly 24 hours when given additional headroom).

That's 1.2 PB of storage with redundancy in a single 4U chassis.
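Rough sanity check of those figures (the 14-groups-plus-6-spares split for the full shelf is my own guess at how the 1.2 PB number falls out):

```python
# Sanity-checking the capacity and rebuild numbers above.
tb_per_drive = 22

# ~1 PB of usable space: 46 data drives, grouped as 4 data + 2 parity
usable_pb = 46 * tb_per_drive / 1000       # ~1.01 PB
total_drives = 46 * 6 // 4                 # 69 drives once parity is included

# Full shelf (assumed split): 14 groups of 4 data + 2 parity = 84 drives, plus 6 hot spares = 90 bays
full_shelf_pb = 14 * 4 * tb_per_drive / 1000   # ~1.23 PB usable

# Rebuild time for one 22 TB drive at a sustained ~250 MB/s
rebuild_hours = tb_per_drive * 1e12 / 250e6 / 3600   # ~24 hours

print(round(usable_pb, 2), total_drives, round(full_shelf_pb, 2), round(rebuild_hours, 1))
```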

Wild, especially when high-density CMR storage is hovering around $20/TB.

So, for around $40k out the door, an enterprise can have 1.2 PB of storage locally.

1 Like

At the lower end/homelab it's still hard to beat NetApp shelves, as we've covered extensively. Other than that? A 45Drives chassis is a happy medium. They do up to 60 bays, but I think 30 is the sweet spot. Unless you're building a cluster - then the 15/30-bay cases are great for clusters. And granted, that's not just a disk shelf, but they're so standard and easy to work on with a "low" number of drives that it's fine.

3 Likes

Thx for sharing. This is the kind of content I'd like to see more of from the L1T team. Unfortunately, this kind of stuff is well beyond my means, but in years to come prices will come down, hopefully to an affordable level.

1 Like

Think of how many Danny DeVito DVDs I could fit on one of these bad boys…

5 Likes

Definitely hoping to see throughput numbers on a full shelf of these with all dual actuator drives.

THIS. This is what rack space was meant to be filled with.

1 Like

I'm honestly pretty surprised by how quiet it sounded in the video. What's the power consumption like without any drives in it? It would be interesting to me if the power consumption of the fans in there weren't crazy - I could see only partially populating the chassis and leaving room for future expansion.

I've come across other Supermicro JBODs over time - and with enough ingenuity you can make any SM chassis into one, really. I recently got my hands on a 44-bay chassis with a ton of SAS3 in/out connections. I have it connected to a 24-bay SM chassis! Obviously it takes up way more rack space though.

My thoughts? Build the ultimate NAS. Get an EPYC server with lots of lanes so you can fill it up with 24+ NVMe drives - then connect it to one of these many JBOD options. I'm sure with enough math, use of NVMe drives for special devs, memory, etc., you could make something pretty neat. Make one super pool, or have your NVMe and GigaTank - whatever you want, I suppose.
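For the special-devs idea, the NVMe mirror is just another vdev class; a minimal sketch, assuming a hypothetical pool name and placeholder device paths:

```python
# Sketch: add a mirrored NVMe "special" vdev (metadata / small blocks) to an existing pool.
# Pool name and device paths are placeholders.
nvme = ["/dev/nvme0n1", "/dev/nvme1n1"]
cmd = ["zpool", "add", "gigatank", "special", "mirror", *nvme]
print(" ".join(cmd))  # review, then run via subprocess.run(cmd)
```

Worth noting: once a special vdev sits in a pool with raidz/dRAID top-level vdevs it can't be removed, so mirror it to at least match the pool's redundancy.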

NFS over RDMA coming in Fangtooth for TrueNAS. Hmmm.

1 Like

Surely that depends on which drives one chooses. Remember Nimbus Data with their 100TB SATA SSDs? You only need 10 of those for a PB, so this chassis can hold 9 PB each. Extrapolate that and you'd only need about 120 chassis for an exabyte system, including all the spare drives. That's just 15 42U racks, as you'd also need to take into account UPSes, networking, and of course the head units (servers). Fortunately, those JBODs should be able to be daisy-chained, meaning you'd only need a single head server per rack. Or a pair if you want high-availability access.
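If anyone wants to check that extrapolation (the per-rack split is my own rough assumption):

```python
# Rough check of the exabyte extrapolation: 100 TB SSDs in 90-bay shelves.
import math

pb_per_chassis = 90 * 100 / 1000                   # 9 PB per chassis
chassis_for_eb = math.ceil(1000 / pb_per_chassis)  # 112 chassis for 1 EB; ~120 with spares
chassis_per_rack = 120 // 15                       # 8 chassis per rack
rack_units = chassis_per_rack * 4                  # 32U of 42U, leaving room for UPS/network/head units

print(pb_per_chassis, chassis_for_eb, chassis_per_rack, rack_units)
```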

Imagine opening a terminal and looking into the massive void that is an empty EB system :exploding_head:

But I don’t wanna foot that particular bill :stuck_out_tongue: (easily 45m USD)

I will second @eastcoastpete in the request for more content like this.

I grabbed an 84-bay disk shelf from Dell and find it a bit challenging to get specific information that caters towards strategies (redundancy, performance, wiring, multi-connect, open source controllers).

I can read the manual, and do… but experienced users talking about strategies is much more valuable to me personally. Show me Why, the How becomes pedestrian.

Thanks for the Post and the Video @wendell !

2 Likes

Yo, long-time L1 News fan, but this just made me want to post. I got the 60-bay SAS variant but with dual compute nodes, since my rack isn't that deep and I also need/want to replace a proprietary dual-node solution (or fall back to just using one node and keeping the other as a spare, I guess).

@Koop my SSG-640SP-DE2CR60, which uses the same fans as the JBOD, is ridiculously loud on startup and has the strongest static pressure I've encountered so far. It does settle down, but you'll know when it's upset.
I've no numbers without disks, but mine, with just 24 drives, might land at around 2700W under duress, though I am unsure how much of that is the dual compute nodes and how much is the drives. Ballpark estimate, I think just the shelf could be around 2200-2300W during a thermal event or during startup.

Anywho, I might start lurking here some more. I had almost hoped @wendell had done ZFS in a dual-head HA configuration in the vid and saved me a lot of work, but maybe someone else on the Level 1 Forum has already done it.

The thing that I'd be most interested in, for a system like this, is this: if you have 128 or even 256 parallel threads hitting the JBOD simultaneously, what do the CPU load averages look like?

I am asking because I am using a 36-bay/4U server right now, and under heavy I/O load with four raidz2 vdevs (one in one pool and three in the other pool), my current load average is 42.58.

So with 90 drives, and if you’re hitting it with a heavy load - I’d imagine that the CPU load average is going to suck.

This is the thing that proponents of ZFS don’t really talk about (the practical side of needing to balance out the number of CPU cores vs. the number of drives you have connected to the control node).

I’ve settled on 36 LFF bays in a 4U server.

If you are able to stick the server into a closet or garage somewhere, then the old Sun Fire X4500/X4540 ("Thumper") systems might be an option, as those are 48-bay servers (not just a JBOD/disk shelf).

Also, sidebar – it is interesting that his client, because they lost data with the cloud storage/backup provider Backblaze (i.e. they were fine with off-prem data storage, effectively cold as far as they're otherwise concerned), now wants hot storage instead of going with a tape library.

Do you plan on giving a single-node Ceph cluster a try? I'd be curious to see an apples-to-apples comparison, especially the CPU load you'll get when running benchmarks.

@wendell You said in the video "Backblaze lost some data in the cloud". Can you provide a source? What exactly happened?

I'm the source. They lost some data, did a post mortem, and said "oopsie, we will try to do better." Will do a video soon, but there's not much to it. They don't do extra replicas or 3-2-1.

3 Likes

Thanks. Yeah, that would be really nice. I imagine it must be human error or a software bug, because I don't think they got more than 3 bad drives out of 20.

Separate story: I was listening to a podcast from an engineer who worked at Dropbox. They lost the information on how to assemble large files (Backblaze calls those chunks "a tome"), but in the end they managed to get that information from log files; otherwise they would have lost customers' data, which would be a very bad look for a service like that.

Hi, I just purchased one of these at a great price, but it's the version with two CPU nodes built in (SSG-6049SP-DE1CR90). Unfortunately, it was barebones, so no memory or CPU was included, and hence I am waiting for those parts to arrive before I can power up the unit.

In the meantime, does anyone know how the two nodes are configured? Would only using one of the nodes allow me to access all 90 drives? I will likely install Windows as that’s what I’m familiar with. There’s very little info online about these units.

I was also confused by the associated video, where @wendell said only three of the expander cards (the ones along the middle of the drive drawer) are needed for SATA drives. If I use all SAS drives (sometimes they are cheaper than SATA), will I need all 6 expanders installed? Mine only came with 3 for some reason.

Thank you!

Hi, as an update: the two nodes are independent, with each node only able to access half of the drives. Does anyone know if I can reconfigure this so that one of the nodes sees all 90 drives? It seems possible, given that the DE2 version with the same parts is configured such that each node can see all 90 drives. But there is very little documentation on the Supermicro website.

There’s your primary suspect. Win-OS is essentially a desktop OS and not really suited to this kind of hardware. Try a Linux Live-CD image (DVD/USB stick) like sysrescuecd or Knoppix to see if that improves things. If so, your choices are TrueNAS Scale, unRAID, Proxmox (in no particular order) instead of Win-OS. Research “NUMA-nodes” as I suspect those play a role here.