ZFS heat soak

Hi there,

I have a dual-socket 2011-v2 server, with an LSI RAID card flashed to HBA (IT) mode driving a 24-drive NetApp disk shelf over one of those QSFP-to-mini-SAS cables. So I have 24 small SSDs; they’re fairly new. I started copying VMs to the ZFS pool, and suddenly I started getting ZFS errors, and maybe some kernel panics - not sure about the kernel issues.

I saw that when Wendell helped Bitwit (Kyle), he mentioned those RAID cards get super hot. I opened my case, left an empty slot between it and the next card, and boosted my fan profile to 100%. And poof, my errors seem to have gone away.

However, on one of these servers, it seems like the card is heat soaking. I run dd tests: the first test cranks out 800 MB/s, the second is about 200 MB/s less, and all subsequent tests fall to about 300 MB/s. Maybe heat soak?

The first time I ran my dd test, I used this block size:

dd bs=1M count=10000 if=/dev/zero of=test conv=fdatasync

That’s where the tests fall off.

Then I tried a different block size:

dd if=/dev/zero of=/tmp/output bs=8k count=100k;

and with that second option I get slightly less throughput, because the block size is small (makes sense to me), but I don’t see that potential heat-soak issue.
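One thing worth noting about those two commands: the second one writes to /tmp and has no `conv=fdatasync`, so it can end up measuring the page cache rather than the pool. To make the heat-soak theory testable, the same synced write can be repeated back-to-back while logging per-pass throughput; steadily falling numbers across passes point toward thermal throttling rather than caching. A minimal sketch, with sizes scaled way down for illustration and the output path (`/tmp/heatsoak-test`) just a placeholder - point it at the actual pool:

```shell
#!/bin/sh
# Run the same synced sequential write several times in a row and log
# throughput per pass. Falling numbers suggest heat soak; a flat line
# suggests something else (caching, pool state, fragmentation).
OUT=/tmp/heatsoak-test   # placeholder path; aim this at the ZFS pool
PASSES=3
SIZE_MB=64               # scaled down for illustration; use 10000 on real hardware

i=1
while [ "$i" -le "$PASSES" ]; do
    start=$(date +%s)
    dd if=/dev/zero of="$OUT" bs=1M count="$SIZE_MB" conv=fdatasync 2>/dev/null
    end=$(date +%s)
    elapsed=$((end - start))
    [ "$elapsed" -eq 0 ] && elapsed=1   # avoid divide-by-zero on fast runs
    echo "pass $i: $((SIZE_MB / elapsed)) MB/s"
    i=$((i + 1))
done
rm -f "$OUT"
```

Reading the HBA's heatsink temperature (or just touching it, carefully) between passes would confirm or rule out the thermal explanation.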

Any ideas?
Again, I’m using 24 drives in two raidz2 pools. I didn’t change the ZFS block size at all. BTW, I’m planning on using this for Proxmox and MinIO object storage, so any tuning advice for raw VM images would be appreciated; that’s how I discovered the errors.
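For raw VM images specifically, Proxmox puts each disk on its own zvol, and the usual knobs are volblocksize (fixed at creation time), compression, and atime. A hedged sketch of common starting points - "tank" and the dataset names are placeholders, and 16K is just a frequently suggested compromise for raidz2 space efficiency vs. I/O size, not a definitive recommendation:

```shell
# Placeholder pool/dataset names; properties are standard OpenZFS ones.

# volblocksize can only be set at zvol creation time; too-small values
# waste space on raidz2 due to parity/padding overhead.
zfs create -V 32G -o volblocksize=16k tank/vm-100-disk-0

# lz4 is cheap and usually a net win for VM images.
zfs set compression=lz4 tank

# VM workloads rarely benefit from access-time updates.
zfs set atime=off tank
```

Since volblocksize can't be changed after creation, it's worth benchmarking a couple of values with the actual VM workload before committing the whole pool.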

If the HBA is getting too hot, then the only thing you can do is put a 40-50mm fan on it.

Take off the heat sink, drill two holes in diagonal corners, and then use twist ties to secure the fan. This also has the bonus of giving you a good reason to clean off and replace the old thermal compound (which is likely old as fuck).

Another potential reason for errors is bad cables/connection.


@Log

If I were to buy a new sas/sata3 hba, any recommendations?

Something that can handle a little bit of heat in the summer. We have air, but still.

I’ve heard good things about those HighPoint Rocket cards.

What kind do you have now?

To be honest, I don’t think changing cards will help with heat issues. This is server gear meant for a controlled-temperature environment with lots of airflow.

I run my HBAs with a fan attached as described above, and haven’t had issues since.

An LSI SAS 9207-8e, flashed to HBA/IT mode.

Try laying a 120mm fan on top of the HBAs if there is space, to blow extra air across the cards. Works for me.

I just got an old Dell H310 and was SHOCKED at how hot it got just from putting file systems on the drives and doing a couple of test transfers. Next time the server comes down, it’s getting some Thermal Grizzly and maybe an old Pentium fan.

While I like Kryonaut as the gold standard, it needs to be replaced every 2-3 years and may have issues in sustained high-temperature applications. For long-term “set and forget” applications, I prefer IC Diamond, which I have never noticed performance degradation with, and its performance is generally very close to Kryonaut’s.

I have a server with 3 fans at 5000+ RPM - full blaring loudness…
I have two of those cards, on two servers. One has a 5-drive HDD pool - no problems. But at the other end of the scale, I have those 24-drive SAS expander shelves full of SSDs. I think the SSD IOPS are just overwhelming this puny card.

They all heat up a lot. In fact, heat is the leading cause of failure for these cards. Those puny heatsinks can’t handle it; at the very least, slap a fan on it.

Personally, I’d replace the thermal paste with something good (like Arctic MX-4 or whatever), and maybe even find a bigger heatsink that fits and add a fan.


This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.