[SOLVED] Going for optimum performance, but the people behind the software I am running start out hesitant

Perhaps this forum is more appropriate. I started out on the Proxmox forum, as that is what is currently running on the hardware.

In a nutshell: how do I go for maximum performance with the hardware below?
(original post @ https://forum.proxmox.com/threads/going-for-max-speed-with-proxmox-7-how-to-do-it.93477)

I’d like to set up Proxmox 7 to be as fast as it can possibly be with the hardware I have and the hardware I am considering getting:

EDIT: (this will be for non-critical, non-server-related workloads)
EDIT 2:
I would like to have a dedicated VM to pass GPUs to, so that I can donate compute to Folding@home or other distributed science projects.
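
From what I have read so far, the rough shape of GPU passthrough on Proxmox would be something like the sketch below. The VM ID and PCI address are placeholders, and it assumes booting via GRUB and a q35 VM; the Proxmox PCI(e) passthrough docs cover the details.

Code:

# enable the IOMMU on the kernel command line (AMD), then rebuild the GRUB config
# /etc/default/grub: GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
update-grub

# find the PCI address of the GPU to hand over
lspci -nn | grep -i nvidia

# attach it to VM 100 (placeholder VM ID and address; q35 machine type assumed)
qm set 100 -hostpci0 0000:0b:00,pcie=1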

The already-haves:

  • AMD Ryzen Threadripper 3990X (CPU, 64 cores / 128 threads)
  • Gigabyte TRX40 AORUS XTREME (rev 1.0) (motherboard with fully updated BIOS)
  • AORUS Gen4 AIC adaptor (1 PCIe card for 4 x NVMe SSDs)
  • 1 x EVGA RTX 2080 Ti (GPU)
  • 3 x Gigabyte RTX 2060 (GPU)
  • 2 x Samsung 980 Pro 1TB (NVMe SSD)
  • 4 x WD_BLACK SN850 1TB (NVMe SSD)
  • 4 x 32GB DDR4-2600MHz ECC (memory)

Considering buying:

  • 8 x Kingston Technology 32GB DDR4-3200MHz ECC CL22 DIMM 2Rx8 Micron E

I truly do not care how long it takes for Proxmox to boot. All I am after is that, once it is up and running, I can fire up a ton of VMs in nanoseconds.
And the performance of those VMs should be second to none.

ZFS is not a requirement, as I will be making constant backups of the VMs.
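
(By backups I just mean plain vzdump jobs, roughly like the example below; the VM ID and storage name are only placeholders.)

Code:

# snapshot-mode backup of VM 100 to a storage called "backups"
vzdump 100 --mode snapshot --compress zstd --storage backups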

Any tips one could share?

If you’re looking for speed, why not drop ECC and go for higher-clocked memory?

Everything I see on that thread is solid advice.

1 Like

In the other thread the advice led me to believe, again, that ZFS is a good way to go. I am considering creating 3 extra mirrors and striping those with the current root mirror.
On second thought, I like redundancy better than no redundancy, so I am leaning towards ZFS again.
One additional thought on ECC: science computation workloads like BOINC might benefit from it. But I am not sure about that, as it is mostly the GPU that does the number crunching.

Current zpool status:

zpool status
  pool: rpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:36:04 with 0 errors on Sun Jul 11 01:00:05 2021
config:

    NAME                                 STATE     READ WRITE CKSUM
    rpool                                ONLINE       0     0     0
      mirror-0                           ONLINE       0     0     0
        nvme-eui.002538b9015063be-part3  ONLINE       0     0     0
        nvme-eui.002538b9015063e1-part3  ONLINE       0     0     0

errors: No known data errors

What would be more optimal in terms of random reads/writes?
Simply adding 2, 4, or 6 new NVMe drives to mirror-0, or creating extra sets of mirrors, each containing 2 NVMe drives?
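
To make the two options concrete, I believe the commands would look roughly like this (the NEWDISK names are placeholders):

Code:

# option A: attach a new disk to mirror-0, making it a wider mirror
# (more redundancy, reads scale, sync writes do not)
zpool attach rpool nvme-eui.002538b9015063be-part3 /dev/disk/by-id/nvme-NEWDISK1

# option B: add an extra 2-disk mirror vdev that gets striped with mirror-0 (RAID 10 style)
zpool add rpool mirror /dev/disk/by-id/nvme-NEWDISK1 /dev/disk/by-id/nvme-NEWDISK2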

Also, will it matter for either scenario if the NVMe drives are not the same brand and model? They are the same marketed size (1TB), though.

I ended up placing 2 NVMe drives in the 2 leftover M.2 slots on the motherboard.
I also reconfigured the PCIe bifurcation, with slot one set to 4x4x4x4 (as otherwise I will not see all 4 disks on the AIC adaptor).
Added a Gigabyte GPU in slot 2.
An EVGA GPU in slot 3 (and configured the BIOS to use that one as the primary display).
Added a Gigabyte GPU in slot 4.

Boom: out-of-resources (d4) BIOS LED code error. LOL, and I thought this mobo was EXTREME :wink: Or it is stupidity on my part, which is always an option :frowning:

So I removed the GPU from slot 2 and now there are no more resource issues.
Now I can't boot, as it hangs on
"EFI stub: loaded initrd from command line option"

Any ideas how to proceed?

Not sure about the EFI part, but I would double-check that all the power sockets on the motherboard are populated, as in any 8-pin "optional" connectors are attached?

I am sure everything is connected properly. I have run this kind of setup before; the only differences now are that the 2 leftover M.2 sockets are also populated and that I moved the PCIe cards (GPUs, AIC) around a bit to allow for better airflow.

Anyway, I got impatient and reinstalled the hypervisor (this time it is version 7 of Proxmox).
It went happily through the installation, and as installation target I selected the first 2 Samsung SSDs in ZFS RAID 1, just like last time. (I did see the other 4 WD Black SSDs, but in the other thread it came to my attention that it is better to separate VM storage from OS storage.)

And now it hangs again :wink:

cannot import 'rpool': more than one matching pool
Import by numeric ID instead
Error: 1

Manually import the pool and exit.
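
(For reference, I think the manual import it asks for would go roughly like this at the emergency shell; the numeric ID is a placeholder for whatever zpool import lists.)

Code:

# list importable pools together with their numeric IDs
zpool import

# import the intended pool by its ID without mounting, then continue booting
zpool import -N 1234567890
exit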

I forgot that I had already used some of the WD Black SSDs once to try to install Proxmox and ran into the same issue. So how do I clean them out and reinstall Proxmox 7?

Securely deleting NVMe SSDs:

apt-get install nvme-cli

nvme format --ses=1 /dev/nvme0nX
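
If it helps, a rough sequence might be the one below; the device path is only an example, so double-check with nvme list before erasing anything.

Code:

# confirm which namespaces belong to the WD Blacks
nvme list

# --ses=1 performs a user-data erase; repeat per drive (example path only)
nvme format --ses=1 /dev/nvme2n1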

1 Like

By using an Ubuntu live USB, for example?

Or

could I create a Proxmox install using all the disks, and then later remove the WD Blacks and create a new RAID 10 pool for the VMs?

Right?

Exactly!

Depends on your setup. You can remove devices from a mirror; you can't remove devices from a stripe.
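
A rough sketch of that removal, assuming the disk you want to pull sits in a mirror vdev (the device name is a placeholder):

Code:

# detach one disk from its mirror; the pool keeps running on the remaining disk(s)
zpool detach rpool nvme-eui.EXAMPLE-part3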

Modern memory encounters errors very infrequently. Unless you’re extremely sensitive to errors that might happen once a year, if that, you should be fine.

Interesting results:

root@pve:~# zpool status

  pool: rpool
 state: ONLINE
config:

    NAME                                                 STATE     READ WRITE CKSUM
    rpool                                                ONLINE       0     0     0
      mirror-0                                           ONLINE       0     0     0
        nvme-eui.002538b9015063be-part3                  ONLINE       0     0     0
        nvme-eui.002538b9015063e1-part3                  ONLINE       0     0     0
      mirror-1                                           ONLINE       0     0     0
        nvme-eui.e8238fa6bf530001001b448b492a6acc-part3  ONLINE       0     0     0
        nvme-eui.e8238fa6bf530001001b448b49df8c91-part3  ONLINE       0     0     0
      mirror-2                                           ONLINE       0     0     0
        nvme-eui.e8238fa6bf530001001b448b49df8f93-part3  ONLINE       0     0     0
        nvme-eui.e8238fa6bf530001001b448b49df19b5-part3  ONLINE       0     0     0

errors: No known data errors
root@pve:~# pveperf
CPU BOGOMIPS:      742465.28
REGEX/SECOND:      3994828
HD SIZE:           2697.00 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     1198.36

Previously:
FSYNCS/SECOND:     281.94

A big increase for sure.
Is it worth trying a 6-way mirror to see what happens, or can one already guess it won't be faster?

1 Like

You mean 6 single-drive vdevs instead of a 3-way mirror? Try it.

Worst case you restore from backup from the other machine. You still get checksums and snapshots.

Results for the 6-way mirror are in:

zpool status
  pool: rpool
 state: ONLINE
config:

    NAME                                                 STATE     READ WRITE CKSUM
    rpool                                                ONLINE       0     0     0
      mirror-0                                           ONLINE       0     0     0
        nvme-eui.002538b9015063be-part3                  ONLINE       0     0     0
        nvme-eui.002538b9015063e1-part3                  ONLINE       0     0     0
        nvme-eui.e8238fa6bf530001001b448b492a6acc-part3  ONLINE       0     0     0
        nvme-eui.e8238fa6bf530001001b448b49df8c91-part3  ONLINE       0     0     0
        nvme-eui.e8238fa6bf530001001b448b49df8f93-part3  ONLINE       0     0     0
        nvme-eui.e8238fa6bf530001001b448b49df19b5-part3  ONLINE       0     0     0

errors: No known data errors
root@pve-trx:~# pveperf
CPU BOGOMIPS:      742467.84
REGEX/SECOND:      3958451
HD SIZE:           899.00 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     523.78

Could the abysmal speed be due to the fact that 2 of the disks are in the slower M.2 motherboard slots, with the rest being bogged down to match that speed?
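
One way I could test that theory, I suppose, is to watch per-device stats while pveperf runs, something like:

Code:

# per-device ops/bandwidth every second; a slow M.2 slot should stand out
zpool iostat -v rpool 1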

No, that's just one big mirror. I meant no mirror at all:

zpool create rpool /dev/nvme*part3 ... and whatever other flags, with no mirror keyword. Basically a 6-way RAID 0.
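
To spell out the layouts being discussed (pool and disk names below are placeholders, not something to run against a live rpool):

Code:

# what was just benchmarked: one 6-way mirror, every write goes to all six drives
zpool create tank mirror d1 d2 d3 d4 d5 d6

# striped mirrors (RAID 10): three 2-way mirrors, writes spread across the vdevs
zpool create tank mirror d1 d2 mirror d3 d4 mirror d5 d6

# the suggestion above: a plain 6-way stripe (RAID 0), fastest but no redundancy
zpool create tank d1 d2 d3 d4 d5 d6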

Currently I've got this:

Code:

pveperf
CPU BOGOMIPS:      742462.72
REGEX/SECOND:      4060522
HD SIZE:           899.00 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     532.78

The OS is running on the 2 Samsung 980 Pro SSDs in the slow M.2 sockets on the mobo. Not sure why I am getting double the speed now compared to before.

Code:

zpool status
  pool: rpool
 state: ONLINE
config:

        NAME                                 STATE     READ WRITE CKSUM
        rpool                                ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            nvme-eui.002538b9015063be-part3  ONLINE       0     0     0
            nvme-eui.002538b9015063e1-part3  ONLINE       0     0     0

errors: No known data errors

  pool: vmpool
 state: ONLINE
config:

        NAME                                      STATE     READ WRITE CKSUM
        vmpool                                    ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            nvme-WDS100T1X0E-00AFY0_204540802523  ONLINE       0     0     0
            nvme-WDS100T1X0E-00AFY0_204540802590  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            nvme-WDS100T1X0E-00AFY0_20465F800961  ONLINE       0     0     0
            nvme-WDS100T1X0E-00AFY0_204540802025  ONLINE       0     0     0

!! WOW !!

Code:

pveperf /vmpool
CPU BOGOMIPS:      742462.72
REGEX/SECOND:      4013371
HD SIZE:           1725.80 GB (vmpool)
FSYNCS/SECOND:     3501.27

3501.27 sweeeeeeeeeeeeeeeeeeeeeeet

Thanks all for the input. I am ecstatic at the moment.

1 Like

@diversity I know you’ve marked this as solved, and I followed your post on the Proxmox forums as well, but for fun, I just ran pveperf myself.

CPU BOGOMIPS:      166435.52
REGEX/SECOND:      1778499
HD SIZE:           1352.74 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND:     4848.87

While my CPUs aren’t as exciting, I got that FSYNC number on 2 SATA enterprise SSDs, this model: Samsung MZ7LM1T9HMJP

ZFS mirror

Just something to consider if you want to look outside of nvme exclusively. I myself am considering passing through a nvme m.2 into one specific machine just for fun to test the performance, but these enterprise grade 2.5" SSDs have been surprisingly performant.

Edit: Running pveperf specifically on /dev/rpool instead of just pveperf:

pveperf /dev/rpool
CPU BOGOMIPS:      166435.52
REGEX/SECOND:      1954940
HD SIZE:           62.90 GB (udev)
FSYNCS/SECOND:     53898.56

!! syco !!
Please tell me more about the board you have.
Also, please share your zpool status.