Proxmox ZFS on Linux performance subpar on Xeon D-1518

Hi,

This is my first time posting on the forums, but that is obvious from my post count :). I’ve been watching the levelonetechs YouTube channel for a while, decided to have a look around these forums, and noticed a lot of skilled and enthusiastic people here. Normally I am not big on forums and the like, but since my co-workers and friends aren’t really able to help me troubleshoot non-Windows issues, I’ve decided to branch out a bit. I came here hoping to find answers to my questions and hopefully, in the future, to provide answers to other people’s questions as well.

So let’s start with my first problem; it has to do with a new Proxmox host I’ve built.
My storage systems for the past few years have all been based around ZFS, after I once experienced the joy of silent data corruption due to a flaky RAID card. Since FreeNAS was lacking some features a few months ago, I went with Proxmox.

The new Proxmox ZFS host is showing underwhelming ZFS performance numbers.
Hopefully someone here has a similar platform and can share some numbers.

The specs:

  • Supermicro X10SDV-4C-7TP4F-O (Xeon D-1518)
  • 32 GB ECC RAM
  • 250 GB Samsung 960 EVO M.2 NVMe boot drive
  • 6x 3 TB Seagate NAS drives in raidz2
  • 2x 8 TB WD Red in a mirror
  • 2x 256 GB SSD stripe

I moved the Seagate raidz2 pool over from my old AMD Phenom II X2 555 BE based system, which ran with 16 GB of RAM. Performance numbers for this pool are nearly the same, with the newer system being slightly faster. It is the system responsiveness that shows huge differences: whenever I load the pool, the CPU load settles somewhere between 2.0 and 3.0, and this mainly consists of iowait.

What bothers me is that the striped SSD lab pool (no redundancy or integrity requirements, since it is for test use only) shows underwhelming performance. This pool consists of two healthy(!) Crucial SSDs: an M4 256 GB and an MX100 256 GB. Sequential writes settle in between 100 and 200 MB/s, while the individual drives perform well in excess of 100 MB/s for dd runs of 2x the RAM size. The drives should be faster than that, so something feels off to me. I am not expecting miracles, the D-1518 is a low-power chip, but based on numbers seen on STH I had expected it to easily keep up with the ancient Phenom II X2 555.

I don’t think the bottleneck is purely CPU-based, since the raidz2 pool performance is okay and that pool should be slightly more taxing for the system. If lower load averages should be the norm, can someone point me to a good guide or write-up that can help me troubleshoot these issues? At the moment I’m grasping at straws and there are quite a few variables possibly at play here. What surprised me most is that when benching a pool of two striped SSDs the CPU load is 4.0 system, though the shell is still responsive.
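
For reference, watching where that load comes from during a run can be done with something along these lines; iostat is from the sysstat package, and nothing here is specific to my pools:

```bash
# CPU breakdown (user/system/iowait) and per-device utilization,
# refreshed every 2 seconds while a benchmark is running.
iostat -x 2

# ZFS's own per-pool and per-vdev I/O statistics, also every 2 seconds.
zpool iostat -v 2
```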

Basically I am wondering about the following:

  • What can I expect CPU-wise from a Xeon D-1518 based system?
    I expected the performance to easily keep up with my old AMD platform. I hope to easily saturate a 2 Gbps trunk (aggregate) with my lab pool without redundancy.
  • What is a reliable way to benchmark zpools?
    dd is easily fooled by compression, but I am unsure whether the CPU loads generated and reported by bonnie++ are reliable. I seem to recall bonnie++ not reporting properly on some multi-core systems because it could lose track of a thread. (A possible fio-based alternative is sketched right after this list.)
  • Could the system be suffering from a scheduling issue caused by hyper-threading?
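
On the benchmarking question: something like fio can write incompressible data, which sidesteps the compression problem dd has. A minimal sketch, assuming a pool mounted at /labpool (the path and job parameters are just placeholders):

```bash
# Sequential 1 MiB writes to a file on the pool; refill_buffers keeps the
# data incompressible so compression cannot inflate the result, and
# end_fsync waits for the data to be flushed before reporting a rate.
fio --name=seqwrite --directory=/labpool \
    --rw=write --bs=1M --size=16G --numjobs=1 \
    --refill_buffers --end_fsync=1
```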

What I already did:

  • Tested the NVMe boot drive, which is running ext4 with LVM and shows performance numbers beyond 1 GB/s.
  • Used raw devices instead of partition labels to let ZFS handle alignment.
  • Tested pools with the default ashift and with ashift=9, ashift=12 and ashift=13; there are differences but nothing decisive.
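
For reference, a rough sketch of the kind of commands involved in those pool tests; the pool name and disk ID below are placeholders:

```bash
# Recreate the test pool on the whole disk (by-id path rather than a
# partition label) and force 4K alignment; omit -o ashift=... to let
# ZFS pick the default.
zpool destroy labpool
zpool create -o ashift=12 labpool \
    /dev/disk/by-id/ata-Crucial_CT256MX100SSD1_XXXXXXXXXXXX

# Compression off so dd runs against /dev/zero give honest numbers.
zfs set compression=off labpool
```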

What I plan to do this week:

  • Update the firmware of the board and re-verify firmware and health for the SSDs.
  • Verify the power settings in the board’s EFI.
  • Test with a FreeNAS USB stick to verify performance on a FreeBSD-based system.

Some numbers:
Testing methodology: create the pool with the given ashift and compression off, then dd runs of 16, 32, 64, 32, 64, 16, 64, 32 and 16 GB.
So nine runs, each size three times, to get a baseline.
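
A sketch of what a single 16 GB run looks like under that methodology, assuming the pool is mounted at /labpool:

```bash
# One 16 GiB sequential-write run; compression is already off on the
# pool, so /dev/zero is not inflated. conv=fdatasync makes dd flush the
# data before reporting a rate.
dd if=/dev/zero of=/labpool/ddtest.bin bs=1M count=16384 conv=fdatasync
rm /labpool/ddtest.bin
```

The 32 GB and 64 GB runs simply bump the count to 32768 and 65536.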

Drive + type     ashift   16 GB       32 GB       64 GB
mx100 partlbl    0        309 MB/s    198 MB/s    309 MB/s
                          167 MB/s    203 MB/s    309 MB/s
                          164 MB/s    164 MB/s    309 MB/s
mx100 raw        0        244 MB/s    193 MB/s    309 MB/s
                          293 MB/s    206 MB/s    309 MB/s
                          154 MB/s    192 MB/s    309 MB/s
mx100 raw        13       280 MB/s    182 MB/s    309 MB/s
                          208 MB/s    221 MB/s    309 MB/s
                          180 MB/s    184 MB/s    309 MB/s
mx100 + M4       13       525 MB/s    447 MB/s    392 MB/s
                          476 MB/s    415 MB/s    385 MB/s
                          419 MB/s    418 MB/s    397 MB/s
Seagate raidz2   0        361 MB/s    311 MB/s    317 MB/s
                          349 MB/s    329 MB/s    307 MB/s
                          354 MB/s    321 MB/s    300 MB/s

(ashift 0 = pool default; three rows per configuration = three runs per size.)

Okay, so this is all new hardware? What’s the model of SSD and how full is the pool?

That’s not abnormal.

How old are these devices? The most likely issue here is that ZFS on Linux does not support TRIM. I cite my sources.


Okay, so this is all new hardware? What’s the model of SSD and how full is the pool?

The SSDs have been used before in one of my systems, but the wear levels are low. I did a diskpart clean all before putting them in the Proxmox machine.
The motherboard with onboard CPU, the RAM and the 750 W power supply are new, as are the 8 TB WD Reds.

2x 256 GB SSD stripe

That’s not abnormal.

Hmm, okay, I will see how it affects other tasks running on the server. In my experience a high CPU load due to iowait can cause issues because the CPU cannot process other tasks.

How old are these devices? The most likely issue here is that ZFS on Linux does not support TRIM.

The M4 256 GB is somewhat older. I was aware of the ZoL TRIM situation; that is why I planned to use these SSDs for my lab setup, which would allow me to blkdiscard the drives after a test project is done. You might be onto something here; I will attach the drives to a Windows system and force a TRIM just to be sure.
I might go back to FreeNAS entirely then, since it can run everything I need through jails or their Rancher implementation.
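
For the record, the blkdiscard step I have in mind is basically the following; the device path is a placeholder, and this wipes the whole drive:

```bash
# Once the test pool is destroyed, discard every block on the SSD so it
# starts the next project from a fully trimmed state.
zpool destroy labpool
blkdiscard /dev/disk/by-id/ata-M4-CT256M4SSD2_XXXXXXXXXXXX
```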

That’s definitely an option. I’ve been eagerly awaiting TRIM for ZoL for a while. I remember some activity a couple months ago, but nothing to really note.


I meant to get back to this topic earlier last week but got sidetracked by work projects.

Anyway, I installed FreeNAS on a separate SSD and the results seem positive so far!
CPU-load-wise, the system’s 1-minute average peaks at about 3.0 but generally sits around 2.5 while running IOzone benchmarks with a 64 GB test size.
The transfer rates were noticeably better, though, so the 5-minute average CPU load decreased while performance increased by up to 30 percent.
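
For reference, a 64 GB IOzone run along these lines produces that kind of result set (the test-file path and exact flags here are placeholders rather than my literal command line):

```bash
# 64 GiB test file with 4 KiB records; tests 0-7 cover sequential
# write/read, random, backward, record rewrite, strided and the
# fwrite/fread variants. -e includes fsync in the timings; IOzone
# reports throughput in KB/s.
iozone -e -s 64g -r 4k \
    -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 6 -i 7 \
    -f /mnt/tank/iozone.tmp
```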

I thought the Samsung 840 Pro was flaky since I could not get it to reach transfer rates beyond 40 MB/s when testing with sizes beyond 16 GB.
Ashift 9, 12 or 13 did not make a real difference.
That drive was pulled from an ESXi host and only formatted once, so I think @SgtAwesomesauce was right that there was a TRIM-related issue at play here.

During my FreeNAS testing the Crucial M4 locked up; it was apparently running older firmware.
In the end I updated the Crucial drives and cleaned the Samsung drive with Magician, and performance was within the expected range for all drives after that.
This morning I tested the drives again in Proxmox; they were performing better there as well, but still noticeably slower than expected based on my results in FreeNAS.
Now, my results aren’t really comparing apples to apples since I tested with different tools, but they do confirm my suspicion that my platform was not performing optimally when running Proxmox with ZoL.

The platform was certainly usable and I was quite content with Proxmox overall; however, FreeNAS is a better fit for this specific use case.
Based on my results I’ve decided to switch over to FreeNAS, since the machine’s main purpose will be providing network storage and that is exactly what FreeNAS excels at.
Thanks again for the tips about TRIM.


IOzone numbers below; compression was disabled for these tests.
Stripe results were disappointing; however, these drives have different ashift preferences, so the striped pool is really suboptimal and I’ve decided to go with three separate drives.

All throughput values in KB/s.

Test              840 Pro 256   MX100 256   M4 256      Stripe
Testsize (KB)     67108864      67108864    67108864    67108864
Reclen            4096          4096        4096        4096
Write             519352        3007143     3051365     2961509
Rewrite           505182        2973986     2995198     2925956
Read              519735        2942427     3004068     3001526
Reread            532485        2936720     3006039     2994909
Random read       476783        2962444     3014230     3003309
Random write      502764        2837508     2897519     2814456
Backward read     454734        2897279     3062556     3082805
Record rewrite    4990451       4959075     5091667     5260797
Strided read      449335        2994153     3036237     3035047
File write        512369        2959855     2982958     2959317
File rewrite      498782        2976795     3004751     2955301
File read         517777        1872902     1883835     1886297
File reread       532190        1864537     1874962     1884497

Awesome report!

Glad you were able to nail down the issue with such accuracy. :smiley:
