MegaRAID 9560-16i: super slow random 4K reads in Linux / ESXi, fast in Windows

hhh23 · February 15, 2022, 1:39pm

Hi,
I built a new server, initially for ESXi, but its slow random 4K read performance is driving me crazy. The hardware is:
AMD Ryzen 5900X
ASRock Rack X570D4U-2L2T
Kingston 2x 32 GB ECC (KSM32ED8/32ME), on the memory QVL
Broadcom MegaRAID 9560-16i PCIe 4.0 RAID controller
ICY DOCK ToughArmor MB720MK-B V2
Samsung PM9A3 (2 disks in RAID1 mode)

All traffic through the RAID card is slow in Ubuntu and ESXi, but fast in Windows Server 2022.
All drivers (motherboard, RAID card and OS) are up to date. Traffic to the NVMe boot drives is fine in Windows as well as in Ubuntu / ESXi.
I’m mostly interested in read speed because of database performance.

fio under Ubuntu

sudo fio --filename=/dev/sda --direct=1 --rw=randread --bs=4k --size=4g --ioengine=posixaio --iodepth=1 --runtime=30 --time_based --group_reporting --name=iops-test-job --readonly
iops-test-job: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=22.1MiB/s][r=5651 IOPS][eta 00m:00s]
iops-test-job: (groupid=0, jobs=1): err= 0: pid=1675: Tue Feb 15 13:41:27 2022
  read: IOPS=5646, BW=22.1MiB/s (23.1MB/s)(662MiB/30001msec)
    slat (nsec): min=410, max=90470, avg=1348.97, stdev=288.25
    clat (usec): min=103, max=721, avg=175.30, stdev=22.66
     lat (usec): min=105, max=722, avg=176.64, stdev=22.66
    clat percentiles (usec):
     |  1.00th=[  165],  5.00th=[  167], 10.00th=[  169], 20.00th=[  169],
     | 30.00th=[  172], 40.00th=[  172], 50.00th=[  174], 60.00th=[  174],
     | 70.00th=[  174], 80.00th=[  176], 90.00th=[  178], 95.00th=[  180],
     | 99.00th=[  343], 99.50th=[  351], 99.90th=[  359], 99.95th=[  359],
     | 99.99th=[  363]
   bw (  KiB/s): min=22488, max=22728, per=99.99%, avg=22585.29, stdev=60.58, samples=59
   iops        : min= 5622, max= 5682, avg=5646.27, stdev=15.16, samples=59
  lat (usec)   : 250=98.31%, 500=1.69%, 750=0.01%
  cpu          : usr=1.68%, sys=1.98%, ctx=169433, majf=0, minf=43
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=169413,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=22.1MiB/s (23.1MB/s), 22.1MiB/s-22.1MiB/s (23.1MB/s-23.1MB/s), io=662MiB (694MB), run=30001-30001msec

Disk stats (read/write):
  sda: ios=168826/0, merge=0/0, ticks=27860/0, in_queue=0, util=99.73%

fio under Windows

C:\Users\Administrator>fio.exe --name=baseline --rw=randread --direct=1 --size=4g --iodepth=1 --bs=4k --ioengine=windowsaio --filename=\\.\PhysicalDrive1 --time_based --runtime=30
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
baseline: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=windowsaio, iodepth=1
fio-3.27
Starting 1 thread
Jobs: 1 (f=1): [r(1)][100.0%][r=65.1MiB/s][r=16.7k IOPS][eta 00m:00s]
baseline: (groupid=0, jobs=1): err= 0: pid=4672: Tue Feb 15 03:43:32 2022
  read: IOPS=16.7k, BW=65.4MiB/s (68.6MB/s)(1962MiB/30001msec)
    slat (nsec): min=2234, max=278125, avg=2507.26, stdev=832.63
    clat (usec): min=4, max=2076, avg=56.73, stdev=18.54
     lat (usec): min=19, max=2082, avg=59.24, stdev=18.59
    clat percentiles (usec):
     |  1.00th=[   19],  5.00th=[   43], 10.00th=[   44], 20.00th=[   44],
     | 30.00th=[   46], 40.00th=[   47], 50.00th=[   64], 60.00th=[   65],
     | 70.00th=[   65], 80.00th=[   67], 90.00th=[   78], 95.00th=[   78],
     | 99.00th=[   79], 99.50th=[   80], 99.90th=[   85], 99.95th=[  103],
     | 99.99th=[  215]
   bw (  KiB/s): min=65808, max=67547, per=100.00%, avg=67032.86, stdev=384.55, samples=59
   iops        : min=16452, max=16886, avg=16757.83, stdev=96.14, samples=59
  lat (usec)   : 10=0.01%, 20=2.14%, 50=46.73%, 100=51.07%, 250=0.04%
  lat (usec)   : 500=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%
  cpu          : usr=0.00%, sys=3.33%, ctx=0, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=502397,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=65.4MiB/s (68.6MB/s), 65.4MiB/s-65.4MiB/s (68.6MB/s-68.6MB/s), io=1962MiB (2058MB), run=30001-30001msec
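
Both runs use iodepth=1, so fio keeps a single request in flight and IOPS is roughly the reciprocal of mean per-request latency. A small Python sketch (the helper name qd1_iops is hypothetical) cross-checks the reported averages against the mean `lat` values from the two outputs above:

```python
# At iodepth=1 fio keeps exactly one request in flight, so the
# achievable IOPS is roughly 1 / (mean per-request latency).

def qd1_iops(mean_lat_us: float) -> float:
    """Predicted IOPS for one outstanding I/O with the given mean latency (µs)."""
    return 1_000_000 / mean_lat_us

# (mean "lat" in µs, fio's average IOPS) copied from the two runs above.
runs = {
    "Ubuntu": (176.64, 5646.27),
    "Windows": (59.24, 16757.83),
}

for name, (lat_us, reported) in runs.items():
    predicted = qd1_iops(lat_us)
    print(f"{name}: predicted {predicted:.0f} IOPS, reported {reported:.0f}")

print(f"Ubuntu/Windows latency ratio: {176.64 / 59.24:.2f}x")
```

The predictions land within about 1% of fio’s averages, which suggests the gap is a roughly 3x difference in per-request latency (about 177 µs vs 59 µs) rather than a bandwidth ceiling.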

To debug this, the server dual-boots Ubuntu and Windows Server, both fresh vanilla installations. If I boot into Windows the RAID is fast; in Ubuntu it’s slow.
I tried changing a bunch of settings in the BIOS.
I also tried installing VM SQL, and a search takes 50 seconds on Linux versus 5 seconds on Windows, so the fio results must be accurate. CrystalDiskMark and ATTO show similar results.

I’m going crazy here; any help will be greatly appreciated.

hhh23 · February 15, 2022, 7:41pm

It looks like the PCIe link is running at 4.0 x8, so that’s not the issue.
I can’t help thinking a BIOS setting will fix everything, but I’m all out of ideas.

sudo lspci -vvv -s 2b:
2b:00.0 RAID bus controller: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
Subsystem: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 72
Region 0: Memory at 7ffff00000 (64-bit, prefetchable) [size=1M]
Region 2: Memory at 7fffe00000 (64-bit, prefetchable) [size=1M]
Region 4: Memory at fc100000 (32-bit, non-prefetchable) [size=1M]
Region 5: I/O ports at f000 [size=256]
Expansion ROM at fc000000 [disabled] [size=1M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s (ok), Width x8 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, TPHComp-, ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [148 v1] Power Budgeting <?>
Capabilities: [158 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [168 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Capabilities: [188 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [1b0 v1] Lane Margining at the Receiver <?>
Capabilities: [248 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [348 v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [380 v1] Data Link Feature <?>
Kernel driver in use: megaraid_sas
Kernel modules: megaraid_sas
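
The lspci output shows the link trained at its full capability (LnkSta matches LnkCap: 16GT/s, x8). A rough Python sketch shows why link bandwidth cannot explain a 22 MiB/s result. It assumes only 128b/130b line encoding (used by PCIe 3.0 and later) and ignores packet overhead, so real throughput is somewhat lower; the helper names are hypothetical:

```python
import re

# PCIe 3.0 and later use 128b/130b line encoding.
ENCODING = 128 / 130

def pcie_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Approximate one-direction ceiling in GB/s (ignores TLP/DLLP overhead)."""
    return gt_per_s * ENCODING * lanes / 8

def parse_lnksta(line: str) -> tuple[float, int]:
    """Extract link speed (GT/s) and width from an lspci 'LnkSta:' line."""
    m = re.search(r"Speed ([\d.]+)GT/s.*?Width x(\d+)", line)
    if m is None:
        raise ValueError(f"not a LnkSta line: {line!r}")
    return float(m.group(1)), int(m.group(2))

speed, width = parse_lnksta("LnkSta: Speed 16GT/s (ok), Width x8 (ok)")
link = pcie_bandwidth_gbps(speed, width)
measured = 23.1e6 / 1e9  # 23.1 MB/s from the Ubuntu fio run, in GB/s
print(f"PCIe {speed:g} GT/s x{width}: ~{link:.2f} GB/s ceiling")
print(f"the fio run used {measured / link:.4%} of it")
```

Even a PCIe 3.0 x8 link (about 7.88 GB/s by the same formula) is far above what a 4K QD1 test moves, so the link width and generation are not the bottleneck here.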

gordonthree · February 16, 2022, 3:28am

Sorry this isn’t super helpful, but that’s a very new, high-end card. Have you tried reaching out to the manufacturer or OEM?

Looking at their support page, they seem to want folks to run their proprietary driver instead of whatever the kernel provides, and Ubuntu isn’t on the list of supported operating systems.

For an apples-to-apples comparison, what results do you see running CentOS 7 with their proprietary driver? You know Windows is going to be running the proprietary driver, so…

hhh23 · February 16, 2022, 5:33pm

Thanks for the suggestions. I’ll try installing CentOS next.
Ubuntu is supported, and I did build the supplied kernel driver, but it did not improve performance.
I just tried moving the MegaRAID controller card to a 1700X system (PCIe 3.0), and suddenly I’m getting much better results on a freshly installed Ubuntu (even without the updated megaraid driver).

sudo fio --filename=/mnt/test/testfile --direct=1 --rw=randread --bs=4k --size=16g --ioengine=posixaio --iodepth=1 --runtime=30 --numjobs=1 --time_based --name=iops
iops: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=47.5MiB/s][r=12.2k IOPS][eta 00m:00s]
iops: (groupid=0, jobs=1): err= 0: pid=1332: Wed Feb 16 16:11:06 2022
  read: IOPS=12.3k, BW=48.1MiB/s (50.4MB/s)(1442MiB/30001msec)
    slat (nsec): min=861, max=88546, avg=2275.06, stdev=369.55
    clat (usec): min=70, max=345, avg=78.14, stdev= 5.10
     lat (usec): min=71, max=347, avg=80.41, stdev= 5.20
    clat percentiles (usec):
     |  1.00th=[   72],  5.00th=[   73], 10.00th=[   74], 20.00th=[   75],
     | 30.00th=[   75], 40.00th=[   76], 50.00th=[   77], 60.00th=[   77],
     | 70.00th=[   83], 80.00th=[   85], 90.00th=[   86], 95.00th=[   87],
     | 99.00th=[   88], 99.50th=[   89], 99.90th=[   91], 99.95th=[   93],
     | 99.99th=[  104]
   bw (  KiB/s): min=48184, max=51400, per=100.00%, avg=49277.97, stdev=1077.21, samples=59
   iops        : min=12046, max=12850, avg=12319.49, stdev=269.30, samples=59
  lat (usec)   : 100=99.98%, 250=0.02%, 500=0.01%
  cpu          : usr=4.62%, sys=4.83%, ctx=369303, majf=0, minf=25
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=369277,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=48.1MiB/s (50.4MB/s), 48.1MiB/s-48.1MiB/s (50.4MB/s-50.4MB/s), io=1442MiB (1513MB), run=30001-30001msec

Disk stats (read/write):
  sda: ios=367942/5, merge=0/0, ticks=24006/0, in_queue=24006, util=99.75%
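
Comparing runs across boards and operating systems is easier if fio writes machine-readable results (fio supports `--output-format=json`, with `--output=FILE` to save it). The Python sketch below assumes fio’s JSON layout of a top-level `jobs` list whose entries carry `read.iops` and `read.lat_ns.mean`; the sample strings are trimmed stand-ins whose values are transcribed from the text output of the X570 and 1700X Ubuntu runs in this thread:

```python
import json

def summarize(fio_json: str) -> tuple[float, float]:
    """Return (read IOPS, mean read latency in µs) for the first job in fio JSON output."""
    read = json.loads(fio_json)["jobs"][0]["read"]
    return read["iops"], read["lat_ns"]["mean"] / 1000

# Trimmed-down stand-ins for `fio ... --output-format=json` results,
# with values transcribed from the text output in this thread.
x570 = '{"jobs": [{"read": {"iops": 5646, "lat_ns": {"mean": 176640}}}]}'
b350 = '{"jobs": [{"read": {"iops": 12300, "lat_ns": {"mean": 80410}}}]}'

for label, text in (("X570 / PCIe 4.0", x570), ("1700X B350 / PCIe 3.0", b350)):
    iops, lat_us = summarize(text)
    print(f"{label}: {iops:.0f} IOPS, {lat_us:.1f} us mean latency")

speedup = summarize(b350)[0] / summarize(x570)[0]
print(f"speedup on the older board: {speedup:.2f}x")
```

With real runs, the string literals would be replaced by the contents of each saved JSON file; the roughly 2.2x gap tracks the drop in mean latency from about 177 µs to about 80 µs.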

Unfortunately my other motherboard, a B550 ITX, would not boot with the RAID controller and no graphics card, so I can’t yet pin it down to the motherboard or to Linux.
I’m beginning to suspect the ASRock Rack X570D4U-2L2T, as its BIOS / IPMI seems very half-baked.
Any suggestions will be greatly appreciated.

hhh23 · February 16, 2022, 9:10pm

I get “Kernel panic - fatal exception” when trying to install CentOS, just after the installer boots. I can see others having issues with CentOS and the X570 chipset. I installed Proxmox, but it’s slow as well.

gordonthree · February 16, 2022, 9:13pm

Sorry I don’t have more suggestions for you. The newest LSI tech I’ve worked with is my 9207 HBA, maybe ten years older than your card.

All I can think of is that maybe the Windows numbers aren’t accurate, since on Windows it’s harder to keep the kernel and OS from tampering with things to “optimize” them?

oO.o · February 16, 2022, 10:54pm

@hhh23 what kernel are you running in Ubuntu? New hardware will want newer kernels.

Is it officially supported by ESXi? Also, what version of ESXi are you using?

hhh23 · February 16, 2022, 11:24pm

I tried ESXi 6.7 and 7.0U3, Proxmox 7.1, and Ubuntu Server 20.04.3 LTS and 21.10, all with equally slow results.

Installing Windows on the current system, and moving the RAID controller card to my old Ryzen 1700X/B350 system running Ubuntu, both gave me good results.
So compatibility with X570, with this specific motherboard (ASRock Rack X570D4U-2L2T), or with AMD Zen 3 might be the issue. Unfortunately I can’t force PCIe 3.0 in the BIOS to test that.

hhh23 · February 16, 2022, 11:42pm

Yeah, the MegaRAID card is on the VMware HCL, and I installed the recommended VIB driver package on ESXi 7.0 U3c. The rest of the “server” is not officially VMware supported.

oO.o · February 17, 2022, 12:07am

That’s a bummer, especially about ESXi, but since the card is on the HCL it’s probably a motherboard issue. I expect the Linux situation will improve as hardware support matures, though.

hhh23 · February 17, 2022, 8:13am

Yeah, and the odd thing is that the MegaRAID card is also slow in a Windows VM under ESXi in passthrough mode. I don’t really know if that makes it any clearer where the problem is.

D_G · March 21, 2022, 9:23pm

I had similar problems with slow reads and could fix them with kernel 5.17. Even the kmod from Broadcom didn’t help under Linux. I didn’t investigate further what changed in 5.17 to fix the problem.
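
Since this report ties the fix to the kernel version, a quick check of whether a Linux host is on 5.17 or newer may help when comparing notes. A minimal Python sketch; the 5.17 threshold comes only from this single report, and the helper names are hypothetical:

```python
import platform
import re

# Threshold taken from one forum report, not from any official changelog.
FIX_RELEASE = (5, 17)

def kernel_version(release: str) -> tuple[int, int]:
    """Parse the major.minor prefix of a release string such as '5.15.0-25-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))

def at_least_fix_release(release: str) -> bool:
    """True if the release's major.minor is >= FIX_RELEASE."""
    return kernel_version(release) >= FIX_RELEASE

if __name__ == "__main__" and platform.system() == "Linux":
    rel = platform.release()  # same string as `uname -r`
    verdict = "is 5.17 or newer" if at_least_fix_release(rel) else "is older than 5.17"
    print(f"{rel} {verdict}")
```

For example, the 4.18.0-372 kernel mentioned below would be reported as older. Distribution kernels such as RHEL’s backport driver changes, so the version number alone may not tell the whole story.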

Nick.Klein · September 5, 2022, 4:32pm

Did anyone get a resolution on this without moving to 5.17? I’m having a similar problem on another controller with the same chip: kernel 4.18.0-372 with 8x Dell DC NVMe PE8010 RI SSDs in RAID10 behind it.

[root@asdf ~]# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=iotest --filename=/home/io_tmp --bs=4k --iodepth=64 --size=5G --readwrite=randwrite | grep iops
   iops        : min=78746, max=111404, avg=89159.45, stdev=10326.61, samples=29
[root@asdf ~]# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=iotest --filename=/home/io_tmp --bs=4k --iodepth=64 --size=5G --readwrite=randread | grep iops
   iops        : min=207128, max=208231, avg=207799.25, stdev=348.77, samples=12
[root@asdf ~]#

This seems to be off by about an order of magnitude.
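
To put “an order of magnitude” in concrete terms, Little’s law relates queue depth, IOPS and latency: at a fixed depth, mean latency per I/O is depth / IOPS. A Python sketch, assuming the queue stayed full at iodepth=64 (the grep hides fio’s IO depths line, so this is unverified) and noting that the test file sits on a filesystem under /home, so filesystem overhead is included:

```python
# Little's law: outstanding I/Os = IOPS x mean latency, so at a fixed
# queue depth the mean latency per I/O is depth / IOPS.
BLOCK = 4096  # bytes, from --bs=4k
DEPTH = 64    # from --iodepth=64 (assumes the queue stayed full)

def implied_latency_us(iops: float, depth: int = DEPTH) -> float:
    """Mean per-I/O latency in µs implied by Little's law."""
    return depth / iops * 1e6

def throughput_mib_s(iops: float, block: int = BLOCK) -> float:
    """Throughput in MiB/s for the given IOPS and block size."""
    return iops * block / 2**20

# Average IOPS from the two fio runs above.
for label, iops in (("randwrite", 89159.45), ("randread", 207799.25)):
    print(f"{label}: {throughput_mib_s(iops):.0f} MiB/s, "
          f"~{implied_latency_us(iops):.0f} us per I/O at QD{DEPTH}")
```

The random-read average works out to roughly 812 MiB/s at about 308 µs per request, which is the figure to compare against the expected aggregate of eight NVMe drives.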

E-waste · September 6, 2022, 9:48pm

Do you have “nested paging” enabled?

Without nested paging, loading the ramdisk / kernel takes a good 15-20 seconds.

With it, it takes less than four seconds and possibly boots faster. Just my two cents from using VBox.

Nick.Klein · September 9, 2022, 8:15pm

We’re not in a VM; this is bare metal. Does simply having this feature enabled affect the host?

system · June 10, 2023, 2:15pm

This topic was automatically closed 273 days after the last reply. New replies are no longer allowed.