NetApp DS4243 (IOM6) restarts/loses disks when Linux boots up ("device_block" and "device_unblock")

Hey. I’ve been using the DS4243 with an IOM6 for 1.5 years now. When I got the unit, I immediately hit this bug without knowing what it really was. I thought it was because I was only running 1 PSU, so I switched to 2 and it has been fine ever since.

Except last week, I was fiddling with some BIOS settings and did several reboots (10-20), and on the final reboot when everything else was “fixed”, the shelf acted up, causing the zpool to suspend.

I’ve updated the IOM6 firmware from 0191 to 0211, but that didn’t help.

Today I hit it twice in a row. I’ve now switched to manual zpool imports so it doesn’t automatically suspend if something breaks on reboot.
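For anyone who wants to script the manual import instead of typing it, here is a minimal sketch. It assumes the shelf's disks appear under /dev/disk/by-id with a common prefix; the `wwn-` prefix, the pool name, and the expected disk count are placeholders for illustration, not values from this setup. The only real command it runs is `zpool import <pool>`, and only after enough disks are visible.

```python
import os
import subprocess
import time


def count_disks(by_id_dir, prefix):
    """Count whole-disk entries (no '-partN' suffix) that start with prefix."""
    try:
        names = os.listdir(by_id_dir)
    except FileNotFoundError:
        return 0
    return sum(1 for n in names if n.startswith(prefix) and "-part" not in n)


def wait_for_disks(by_id_dir, prefix, expected, timeout_s=300.0, poll_s=5.0):
    """Return True once at least `expected` disks are visible, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if count_disks(by_id_dir, prefix) >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def import_pool_when_ready(pool, expected,
                           by_id_dir="/dev/disk/by-id", prefix="wwn-"):
    """Import `pool` only if all expected disks showed up; otherwise skip."""
    if not wait_for_disks(by_id_dir, prefix, expected):
        seen = count_disks(by_id_dir, prefix)
        print(f"only {seen} of {expected} disks visible, not importing {pool}")
        return False
    # Hypothetical pool name is passed in by the caller; needs root.
    subprocess.run(["zpool", "import", pool], check=True)
    return True
```

Calling something like `import_pool_when_ready("tank", 24)` from a boot script would then skip the import when the shelf has dropped disks, rather than letting the pool suspend. It does not fix the underlying problem, it only avoids importing a half-present pool.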

I found this thread, “3 Pools on Netapp DS4246 Causing Issues” (TrueNAS General, TrueNAS Community Forums), but basically there’s no solution other than ditching the disk shelf. I’m thinking about getting a SC846 and a 9305-24i.

Any other ideas before I spend 500€ on new hardware? I’ve attached a dmesg from when it fails.
dmesg.txt (257.8 KB)

Relevant hardware:
5955WX | Asrock WRX80 Creator R2.0 | 8x32GB 3200MHz ECC UDIMM | LSI 9200-8e | dual 10Gtek QSFP-SFF8088 cables (happened with 1 cable too, dual cables are for performance hunting)

[  716.363522] ses 17:0:16:0: attempting task abort!scmd(0x000000006fbff1a0), outstanding for 30525 ms & timeout 30000 ms
[  716.374218] ses 17:0:16:0: tag#3241 CDB: Receive Diagnostic 1c 01 02 01 6c 00
[  716.381349] scsi target17:0:16: handle(0x002d), sas_address(0x500a0980076646be), phy(36)
[  716.389425] scsi target17:0:16: enclosure logical id(0x50050cc10204983c), slot(0)
[  716.401934] ses 17:0:16:0: task abort: SUCCESS scmd(0x000000006fbff1a0)
[  716.408563] ses 17:0:16:0: attempting task abort!scmd(0x0000000021d4d299), outstanding for 30572 ms & timeout 30000 ms
[  716.419255] ses 17:0:16:0: tag#3240 CDB: Receive Diagnostic 1c 01 02 01 6c 00
[  716.426388] scsi target17:0:16: handle(0x002d), sas_address(0x500a0980076646be), phy(36)
[  716.434475] scsi target17:0:16: enclosure logical id(0x50050cc10204983c), slot(0)
[  716.444167] ses 17:0:16:0: task abort: SUCCESS scmd(0x0000000021d4d299)
[  716.450793] ses 17:0:16:0: attempting task abort!scmd(0x00000000ec0d8deb), outstanding for 30613 ms & timeout 30000 ms
[  716.461487] ses 17:0:16:0: tag#3028 CDB: Receive Diagnostic 1c 01 02 01 6c 00
[  716.468625] scsi target17:0:16: handle(0x002d), sas_address(0x500a0980076646be), phy(36)
[  716.476706] scsi target17:0:16: enclosure logical id(0x50050cc10204983c), slot(0)
[  716.487508] ses 17:0:16:0: task abort: SUCCESS scmd(0x00000000ec0d8deb)
[  716.494129] ses 17:0:16:0: attempting task abort!scmd(0x00000000d1d91717), outstanding for 30660 ms & timeout 30000 ms
[  716.504821] ses 17:0:16:0: tag#3027 CDB: Receive Diagnostic 1c 01 02 01 6c 00
[  716.511957] scsi target17:0:16: handle(0x002d), sas_address(0x500a0980076646be), phy(36)
[  716.520037] scsi target17:0:16: enclosure logical id(0x50050cc10204983c), slot(0)
[  716.529683] ses 17:0:16:0: task abort: SUCCESS scmd(0x00000000d1d91717)
[  716.536305] ses 17:0:16:0: attempting task abort!scmd(0x0000000032c4c978), outstanding for 30698 ms & timeout 30000 ms
[  716.546994] ses 17:0:16:0: tag#2475 CDB: Receive Diagnostic 1c 01 02 01 6c 00
[  716.554124] scsi target17:0:16: handle(0x002d), sas_address(0x500a0980076646be), phy(36)
[  716.562206] scsi target17:0:16: enclosure logical id(0x50050cc10204983c), slot(0)
[  716.573401] ses 17:0:16:0: task abort: SUCCESS scmd(0x0000000032c4c978)
[  716.580017] ses 17:0:16:0: attempting task abort!scmd(0x0000000055ccd5e8), outstanding for 30746 ms & timeout 30000 ms
[  716.590711] ses 17:0:16:0: tag#1503 CDB: Receive Diagnostic 1c 01 02 01 6c 00
[  716.597839] scsi target17:0:16: handle(0x002d), sas_address(0x500a0980076646be), phy(36)
[  716.605920] scsi target17:0:16: enclosure logical id(0x50050cc10204983c), slot(0)
[  716.615531] ses 17:0:16:0: task abort: SUCCESS scmd(0x0000000055ccd5e8)
[  716.622150] ses 17:0:16:0: attempting task abort!scmd(0x00000000f1d10732), outstanding for 30783 ms & timeout 30000 ms
[  716.632835] ses 17:0:16:0: tag#876 CDB: Receive Diagnostic 1c 01 02 01 6c 00
[  716.639880] scsi target17:0:16: handle(0x002d), sas_address(0x500a0980076646be), phy(36)
[  716.647959] scsi target17:0:16: enclosure logical id(0x50050cc10204983c), slot(0)
[  716.659362] ses 17:0:16:0: task abort: SUCCESS scmd(0x00000000f1d10732)
[  716.665975] ses 17:0:16:0: attempting task abort!scmd(0x00000000e5a46535), outstanding for 30828 ms & timeout 30000 ms
[  716.676661] ses 17:0:16:0: tag#875 CDB: Receive Diagnostic 1c 01 02 01 6c 00
[  716.683705] scsi target17:0:16: handle(0x002d), sas_address(0x500a0980076646be), phy(36)
[  716.691787] scsi target17:0:16: enclosure logical id(0x50050cc10204983c), slot(0)
[  716.702205] ses 17:0:16:0: task abort: SUCCESS scmd(0x00000000e5a46535)
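
With a 250 KB dmesg it is easier to summarize these aborts than to read them one by one. The following is a rough sketch that assumes the log lines have exactly the shape shown above; the regular expression and the function name are my own, not part of any tool.

```python
import re
from collections import defaultdict

# Matches lines like:
# [  716.363522] ses 17:0:16:0: attempting task abort!scmd(0x...),
#   outstanding for 30525 ms & timeout 30000 ms
ABORT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<drv>\w+) (?P<dev>\d+:\d+:\d+:\d+): "
    r"attempting task abort!\s*scmd\((?P<scmd>0x[0-9a-fA-F]+)\), "
    r"outstanding for (?P<ms>\d+) ms & timeout (?P<timeout>\d+) ms"
)


def summarize_aborts(lines):
    """Count task aborts per (driver, device) and track the longest wait."""
    stats = defaultdict(lambda: {"count": 0, "max_ms": 0})
    for line in lines:
        m = ABORT_RE.search(line)
        if m is None:
            continue  # not an "attempting task abort" line
        entry = stats[(m["drv"], m["dev"])]
        entry["count"] += 1
        entry["max_ms"] = max(entry["max_ms"], int(m["ms"]))
    return dict(stats)
```

Run as `summarize_aborts(open("dmesg.txt"))`. On the excerpt above, every abort is against `ses 17:0:16:0`: 8 aborts, the longest outstanding for 30828 ms. If the full log showed aborts against the `sd` devices as well, that would point at the disks or links rather than only the enclosure device.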

On this reboot I didn’t import automatically and nothing broke, but when I imported manually, I got these timeout errors in my syslog. Perhaps they indicate something useful? Not sure, really. sas_address(0x500a0980076646be) is the IOM6 controller.
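
For what it’s worth, the CDB in those lines (`1c 01 02 01 6c 00`) can be decoded by hand. Below is a small sketch that assumes the standard 6-byte RECEIVE DIAGNOSTIC RESULTS layout from the SCSI Primary Commands spec and the usual SES page codes; the function name and the short page table are made up for illustration.

```python
# A few SES diagnostic page codes (not a complete list).
SES_PAGES = {
    0x01: "Configuration",
    0x02: "Enclosure Status",
    0x07: "Element Descriptor",
    0x0A: "Additional Element Status",
}


def decode_receive_diagnostic(cdb_hex):
    """Decode a 6-byte RECEIVE DIAGNOSTIC RESULTS CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 6 or cdb[0] != 0x1C:
        raise ValueError("not a 6-byte RECEIVE DIAGNOSTIC RESULTS CDB")
    page_code = cdb[2]
    return {
        "opcode": cdb[0],                    # 0x1C = RECEIVE DIAGNOSTIC RESULTS
        "pcv": bool(cdb[1] & 0x01),          # page code valid bit
        "page_code": page_code,
        "page_name": SES_PAGES.get(page_code, "unknown"),
        "allocation_length": int.from_bytes(cdb[3:5], "big"),
        "control": cdb[5],
    }


info = decode_receive_diagnostic("1c 01 02 01 6c 00")
print(info["page_name"], info["allocation_length"])
```

So each timed-out command is a request for the Enclosure Status page (0x02) with a 364-byte buffer, sent to the shelf’s enclosure services device. In other words, it is the IOM6 itself not answering a status poll within 30 seconds, not a disk timing out. That narrows down where the stall is, but by itself it doesn’t say why.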

Have you replaced the IOM6 module yet? They’re pretty cheap on eBay.

HTH!

Not yet, I’ve considered it. It’s only 10€ each + 17€ shipping from Germany, but that’s 30-40€ wasted if it doesn’t help, 10% of what I would spend on new hardware!

It seems more like a software bug than a hardware issue, since the controller is fine under 24/7 heavy usage (scrubs etc.); this happens only when Linux is booting up and importing the zpool. That would also imply a firmware issue that wasn’t fixed in the latest firmware, leaving me no choice but to ditch the shelf.

There’s more technical talk in the thread “LSI 9305 errors filling log” (TrueNAS Community), but again, no solutions. Looks like it’s SC846 time.

Hey, I am one of the posters from the TrueNAS thread. Just wanted to let you know that swapping the controllers to Dell-branded ones did not work for me. It may for you, since you are using an older NetApp, but there is an incompatibility with the controller and newer PSUs. It did not work for my DS4246 or the 2 DS2246 that I am using. I really like TrueNAS and have been using it for years, but at this point I’m torn between testing other storage-based OSs to try to get compatibility back and shelling out a thousand dollars on Supermicro shelves.