MegaRAID 9560-16i, super slow random 4k in linux / esxi, fast in Windows

I made a new server build, initially for ESXi, but it’s driving me crazy with slow rand4k read performance. The hardware is:
AMD Ryzen 5900X
ASRock Rack X570D4U-2L2T
Kingston 2× 32 GB ECC (KSM32ED8/32ME), on the memory QVL list
Broadcom MegaRAID 9560-16i PCIe 4.0 RAID controller
ICY DOCK ToughArmor MB720MK-B V2
Samsung PM9A3 (2 disks in RAID1 mode)

All traffic through the RAID card is SLOW in Ubuntu and ESXi, but fast in Windows Server 2022.
All drivers are up to date for the motherboard, RAID card, and OS. Traffic to the NVMe boot drives is fine in both Windows and Ubuntu / ESXi.
I’m mostly interested in READ speed because of database performance.

fio under Ubuntu

sudo fio --filename=/dev/sda --direct=1 --rw=randread --bs=4k --size=4g --ioengine=posixaio --iodepth=1 --runtime=30 --time_based --group_reporting --name=iops-test-job --readonly
iops-test-job: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=22.1MiB/s][r=5651 IOPS][eta 00m:00s]
iops-test-job: (groupid=0, jobs=1): err= 0: pid=1675: Tue Feb 15 13:41:27 2022
  read: IOPS=5646, BW=22.1MiB/s (23.1MB/s)(662MiB/30001msec)
    slat (nsec): min=410, max=90470, avg=1348.97, stdev=288.25
    clat (usec): min=103, max=721, avg=175.30, stdev=22.66
     lat (usec): min=105, max=722, avg=176.64, stdev=22.66
    clat percentiles (usec):
     |  1.00th=[  165],  5.00th=[  167], 10.00th=[  169], 20.00th=[  169],
     | 30.00th=[  172], 40.00th=[  172], 50.00th=[  174], 60.00th=[  174],
     | 70.00th=[  174], 80.00th=[  176], 90.00th=[  178], 95.00th=[  180],
     | 99.00th=[  343], 99.50th=[  351], 99.90th=[  359], 99.95th=[  359],
     | 99.99th=[  363]
   bw (  KiB/s): min=22488, max=22728, per=99.99%, avg=22585.29, stdev=60.58, samples=59
   iops        : min= 5622, max= 5682, avg=5646.27, stdev=15.16, samples=59
  lat (usec)   : 250=98.31%, 500=1.69%, 750=0.01%
  cpu          : usr=1.68%, sys=1.98%, ctx=169433, majf=0, minf=43
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=169413,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=22.1MiB/s (23.1MB/s), 22.1MiB/s-22.1MiB/s (23.1MB/s-23.1MB/s), io=662MiB (694MB), run=30001-30001msec

Disk stats (read/write):
  sda: ios=168826/0, merge=0/0, ticks=27860/0, in_queue=0, util=99.73%
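Note: glibc’s posixaio engine is implemented with userspace threads, so it’s worth ruling the engine itself out by rerunning the same QD1 test with io_uring (kernel 5.1+; libaio also works). A sketch, same parameters as above:

```shell
# Same 4k QD1 random-read test as above, but with io_uring instead of
# posixaio, to rule the AIO engine out as a confound.
sudo fio --filename=/dev/sda --direct=1 --rw=randread --bs=4k --size=4g \
  --ioengine=io_uring --iodepth=1 --runtime=30 --time_based \
  --group_reporting --name=iops-test-uring --readonly
```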

fio under Windows

C:\Users\Administrator>fio.exe --name=baseline --rw=randread --direct=1 --size=4g --iodepth=1 --bs=4k --ioengine=windowsaio --filename=\\.\PhysicalDrive1 --time_based --runtime=30
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
baseline: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=windowsaio, iodepth=1
Starting 1 thread
Jobs: 1 (f=1): [r(1)][100.0%][r=65.1MiB/s][r=16.7k IOPS][eta 00m:00s]
baseline: (groupid=0, jobs=1): err= 0: pid=4672: Tue Feb 15 03:43:32 2022
  read: IOPS=16.7k, BW=65.4MiB/s (68.6MB/s)(1962MiB/30001msec)
    slat (nsec): min=2234, max=278125, avg=2507.26, stdev=832.63
    clat (usec): min=4, max=2076, avg=56.73, stdev=18.54
     lat (usec): min=19, max=2082, avg=59.24, stdev=18.59
    clat percentiles (usec):
     |  1.00th=[   19],  5.00th=[   43], 10.00th=[   44], 20.00th=[   44],
     | 30.00th=[   46], 40.00th=[   47], 50.00th=[   64], 60.00th=[   65],
     | 70.00th=[   65], 80.00th=[   67], 90.00th=[   78], 95.00th=[   78],
     | 99.00th=[   79], 99.50th=[   80], 99.90th=[   85], 99.95th=[  103],
     | 99.99th=[  215]
   bw (  KiB/s): min=65808, max=67547, per=100.00%, avg=67032.86, stdev=384.55, samples=59
   iops        : min=16452, max=16886, avg=16757.83, stdev=96.14, samples=59
  lat (usec)   : 10=0.01%, 20=2.14%, 50=46.73%, 100=51.07%, 250=0.04%
  lat (usec)   : 500=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%
  cpu          : usr=0.00%, sys=3.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=502397,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=65.4MiB/s (68.6MB/s), 65.4MiB/s-65.4MiB/s (68.6MB/s-68.6MB/s), io=1962MiB (2058MB), run=30001-30001msec

To debug this, the server dual-boots Ubuntu and Windows Server, both fresh vanilla installations. If I boot into Windows the RAID is fast; in Ubuntu it’s slow.
I tried changing a bunch of settings in the BIOS.
I tried installing SQL in a VM: a search takes 50 seconds in Linux and 5 seconds in Windows, so the fio results must be accurate. CrystalDiskMark and ATTO show similar results.

I’m going crazy here, any help will be greatly appreciated.

Looks like the PCIe connection is 4.0 x8, so it’s not the issue.
I can’t help but think a setting in the BIOS will fix everything, but I’m all out of ideas.

sudo lspci -vvv -s 2b:
2b:00.0 RAID bus controller: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
Subsystem: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 72
Region 0: Memory at 7ffff00000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at 7fffe00000 (64-bit, prefetchable) [size=1M]
Region 4: Memory at fc100000 (32-bit, non-prefetchable) [size=1M]
Region 5: I/O ports at f000 [size=256]
Expansion ROM at fc000000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s (ok), Width x8 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, TPHComp-, ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [148 v1] Power Budgeting <?>
Capabilities: [158 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [168 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Capabilities: [188 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [1b0 v1] Lane Margining at the Receiver <?>
Capabilities: [248 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [348 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [380 v1] Data Link Feature <?>
Kernel driver in use: megaraid_sas
Kernel modules: megaraid_sas
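One detail that stands out in the output above: LnkCtl shows ASPM L1 Enabled, and LnkCap lists an L1 exit latency of up to 64 µs, the same order of magnitude as the extra per-IO latency seen under Linux. Windows may simply not be putting the link into L1. A sketch for testing with ASPM disabled (standard sysfs knob; needs root):

```shell
# Show the current ASPM policy ([brackets] mark the active one).
cat /sys/module/pcie_aspm/parameters/policy

# Switch to "performance", which disables ASPM and keeps links out of L1.
sudo sh -c 'echo performance > /sys/module/pcie_aspm/parameters/policy'

# To make it permanent (or if the policy file is read-only because the
# BIOS owns ASPM), boot with pcie_aspm=off on the kernel command line:
# add it to GRUB_CMDLINE_LINUX in /etc/default/grub, then run update-grub.
```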

Sorry this isn’t super helpful, but that’s a very new, high-end card. Have you tried reaching out to the manufacturer or OEM?

Looking at their support page, they seem to want folks to run their proprietary driver instead of whatever the kernel provides, and Ubuntu isn’t on the list of supported operating systems.

For the sake of comparing apples to apples, what results do you see running CentOS 7 and their proprietary driver? You know Windows is going to run the proprietary driver so…

Thanks for the suggestions. I’ll try installing CentOS next.
Ubuntu is supported, and I did build the supplied kernel drivers, but it did not result in better performance.
I just tried moving the MegaRAID controller card to a Ryzen 1700X (PCIe 3.0) system, and suddenly I’m getting much better results on a freshly installed Ubuntu (even without the updated megaraid driver).

sudo fio --filename=/mnt/test/testfile --direct=1 --rw=randread --bs=4k --size=16g --ioengine=posixaio --iodepth=1 --runtime=30 --numjobs=1 --time_based --name=iops
iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=47.5MiB/s][r=12.2k IOPS][eta 00m:00s]
iops: (groupid=0, jobs=1): err= 0: pid=1332: Wed Feb 16 16:11:06 2022
  read: IOPS=12.3k, BW=48.1MiB/s (50.4MB/s)(1442MiB/30001msec)
    slat (nsec): min=861, max=88546, avg=2275.06, stdev=369.55
    clat (usec): min=70, max=345, avg=78.14, stdev= 5.10
     lat (usec): min=71, max=347, avg=80.41, stdev= 5.20
    clat percentiles (usec):
     |  1.00th=[   72],  5.00th=[   73], 10.00th=[   74], 20.00th=[   75],
     | 30.00th=[   75], 40.00th=[   76], 50.00th=[   77], 60.00th=[   77],
     | 70.00th=[   83], 80.00th=[   85], 90.00th=[   86], 95.00th=[   87],
     | 99.00th=[   88], 99.50th=[   89], 99.90th=[   91], 99.95th=[   93],
     | 99.99th=[  104]
   bw (  KiB/s): min=48184, max=51400, per=100.00%, avg=49277.97, stdev=1077.21, samples=59
   iops        : min=12046, max=12850, avg=12319.49, stdev=269.30, samples=59
  lat (usec)   : 100=99.98%, 250=0.02%, 500=0.01%
  cpu          : usr=4.62%, sys=4.83%, ctx=369303, majf=0, minf=25
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=369277,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=48.1MiB/s (50.4MB/s), 48.1MiB/s-48.1MiB/s (50.4MB/s-50.4MB/s), io=1442MiB (1513MB), run=30001-30001msec

Disk stats (read/write):
  sda: ios=367942/5, merge=0/0, ticks=24006/0, in_queue=24006, util=99.75%

Unfortunately my other B550 ITX motherboard would not boot with the RAID controller and no graphics card, so I’m not able to pin it down to the motherboard or Linux just yet.
I’m beginning to suspect the ASRock Rack X570D4U-2L2T, as the BIOS / IPMI seems very half-baked.
Any suggestions will be greatly appreciated

I get “Kernel panic - fatal exception” trying to install CentOS, just after the installer boots. I can see others having issues with CentOS and the X570 chipset. I installed Proxmox instead, but it’s slow as well :frowning:

Sorry I don’t have more suggestions for you. The latest LSI tech I’ve worked with is my 9207 HBA, maybe ten years older than the card you have :joy:

All I can think of is that maybe the Windows numbers aren’t accurate, since on Windows it’s harder to prevent the kernel and OS from tampering with things to “optimize” them?

@hhh23 what kernel are you running in Ubuntu? New hardware will want newer kernels.

Is it officially supported by ESXi? Also, what version of ESXi are you using?

I tried ESXi 6.7 and 7.0 U3, Proxmox 7.1, and Ubuntu Server 20.04.3 LTS and 21.10, all with equally slow results :frowning:

Installing Windows on the current system, and moving the RAID controller card to my old Ryzen 1700X/B350 system and running Ubuntu there, both gave me good results.
Compatibility with X570, the specific ASRock Rack X570D4U-2L2T motherboard, or AMD Ryzen Zen 3 might be the issue. Unfortunately I can’t force PCIe 3.0 in the BIOS to test that out.

Yeah, the MegaRAID card is on the VMware HCL, and I installed the recommended VIB driver package on ESXi 7.0 U3c. The rest of the “server” is not officially VMware supported.

That’s a bummer, especially about ESXi, but yeah, it’s probably a motherboard issue since the card is on the HCL. I expect the Linux situation will improve as hardware support matures, though.

Yeah, and the odd thing is that the MegaRAID card is also slow in a Windows VM under ESXi with the card in passthrough mode. I don’t know whether that makes it any clearer where the problem is.

I had similar problems with slow reads and fixed them with kernel 5.17. Even the kmod from Broadcom didn’t help under Linux. I haven’t investigated further what the difference in 5.17 is that fixes the problem.
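If you want to try a newer kernel on Ubuntu 20.04 without much effort, the HWE stack is the usual route (the exact version you get depends on the point release; 5.17 specifically may need a mainline build from kernel.ubuntu.com):

```shell
# Install the hardware-enablement (HWE) kernel stack on Ubuntu 20.04.
sudo apt update
sudo apt install --install-recommends linux-generic-hwe-20.04
sudo reboot
# After the reboot, confirm which kernel is running:
uname -r
```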

Did anyone get a resolution on this without moving to 5.17? I’m having a similar problem on another controller with the same chip: kernel 4.18.0-372 with 8× Dell DC NVMe PE8010 RI SSDs in RAID10 behind this thing.

[root@asdf ~]# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=iotest --filename=/home/io_tmp --bs=4k --iodepth=64 --size=5G --readwrite=randwrite | grep iops
   iops        : min=78746, max=111404, avg=89159.45, stdev=10326.61, samples=29
[root@asdf ~]# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=iotest --filename=/home/io_tmp --bs=4k --iodepth=64 --size=5G --readwrite=randread | grep iops
   iops        : min=207128, max=208231, avg=207799.25, stdev=348.77, samples=12
[root@asdf ~]#

This seems to be off by about an order of magnitude.
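To show where it falls over, a quick queue-depth sweep on the same file is useful (filename and size taken from the commands above; one job throughout):

```shell
# Sweep iodepth to see whether IOPS scale with queue depth or hit a
# per-IO latency wall early (same file and size as the runs above).
for qd in 1 4 16 64 256; do
  echo "== iodepth=$qd =="
  fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
      --name=qd$qd --filename=/home/io_tmp --bs=4k --iodepth=$qd \
      --size=5G --readwrite=randread | grep 'read: IOPS'
done
```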

Do you have “nested paging” enabled?

When loading the ramdisk / kernel, it takes a good 15-20 seconds without nested paging enabled.

Less than four seconds, and it possibly boots faster, with “nested paging” on. Just my two cents from using VBox.

We’re not in a VM. This is bare metal. Does simply having this feature enabled affect a host?

This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.