Gigabyte Server Activity Corner -- Proxmox, Docker and Config Notes!


Introduction

Our goal is to set up a reasonable platform for development and/or homelab virtualization and experiments. Proxmox is the subject of this how-to, but there are other guides and videos that I’ve done for things like VMware ESXi.

Proxmox is a competent virtualization platform, based on Debian, with good paid support offered and a reasonable free-ish version of the platform you can download and learn. It uses LXC containers for lightweight “virtualization” and, of course, full-fat VMs. It has good support for easily configuring SR-IOV, PCIe Passthrough, the ZFS file system and mixing storage.

There are many guides and resources for setting up Proxmox out there. This one isn’t the best one, but I stumbled over enough “hmm, that’s weird, no one mentioned that” moments while following some of those guides that I thought it wise to add my own notes here.

Our Lab Setup

For the video, and this guide, I’m using the GIGABYTE G242-Z11 server. This is a great 2U server platform for GPU compute or even mixed workloads because it offers NVMe, 3.5" SATA and lots of internal PCIe.

As configured, it is an Epyc 7402P 24-core with 128 GB of RAM (8 x 16 GB DIMMs).

We have a 3x 4 TB NVMe array combined with a 4x 8 TB WD Red array, both of which are configured with the ZFS file system.

Our overall goal for this system:

  • Run some development paravirtual GPU workloads for ongoing SR-IOV experiments
  • Test system for Looking Glass experiments
  • Test bed for unlocked Nvidia GRID functionality
  • Test bed for GVT-g (eventually? no hardware currently)
  • Test bed for dev/programming/CI-CD tests for future videos
  • Test bed for git automation/web hooks. Gogs (a self-hosted Git service written in Go) is pretty swanky
  • Go microservices host
  • Video on Portainer sandbox

With the ZFS file system, we can take snapshots and expose them as if they were native Windows shadow copy snapshots. This enables great cross-platform flexibility. There can be some gotchas with the “native” .zfs snapshot folder support as well.

I’ve read about issues with ZFS performance and Docker with the overlay file system. So far, knock on wood, performance here is pretty good from simply enabling Docker’s ZFS storage driver.

Docker?! But why!

Proxmox doesn’t do Docker out of the box. I think. You have some general options:

  • Docker in a full-fat VM (nested virtualization, essentially)
  • Docker in an LXC Container
  • Docker on the host

For this guide, Docker means Docker-CE – the community edition of Docker. Docker isn’t the only containerization/orchestration system in town, either, and competitors are gaining rapidly in both technical competency and ease-of-use.

I am not sure that I would recommend running Docker on the Proxmox host. This really opens up a lot of cross-discipline security issues. In a production environment, be sure you understand the full implications and nuances of mixing solutions here.

I strongly recommend against making your Proxmox, Docker, or Portainer admin interfaces accessible from the public internet, even in a homelab/development scenario.

**For this guide, we are installing Docker on the host. This is the least secure option, but it gives the best performance.**

Proxmox Configuration Tweaks

The first egregious thing I encountered was that boost frequencies on the cpu were not enabled. Whisky-Tapdancing-Tango-Foxtrot, system builders?

uname -a
Linux pm9 5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100) x86_64 GNU/Linux
# cat /sys/devices/system/cpu/cpufreq/boost
0
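
If you want that check in a form you can reuse, here is a minimal sketch. The function name boost_state and the CPU_SYSFS override are my own inventions for illustration (the override only exists so the function can be pointed at a test directory). It also looks at the intel_pstate no_turbo file, which uses inverted logic, for systems where the acpi-cpufreq boost file is absent.

```shell
#!/bin/sh
# Report whether CPU boost is on.
# CPU_SYSFS is a hypothetical override for testing; the default is the real sysfs path.
boost_state() (
    root="${CPU_SYSFS:-/sys/devices/system/cpu}"
    if [ -r "$root/cpufreq/boost" ]; then
        # acpi-cpufreq style: 1 = boost enabled, 0 = disabled
        case "$(cat "$root/cpufreq/boost")" in
            1) echo "enabled" ;;
            0) echo "disabled" ;;
            *) echo "unknown" ;;
        esac
    elif [ -r "$root/intel_pstate/no_turbo" ]; then
        # intel_pstate uses inverted logic: 0 = turbo allowed, 1 = turbo off
        case "$(cat "$root/intel_pstate/no_turbo")" in
            0) echo "enabled" ;;
            1) echo "disabled" ;;
            *) echo "unknown" ;;
        esac
    else
        # Neither control file exists (different driver, VM, or older kernel).
        echo "unknown"
    fi
)

boost_state
```

On the system above this would print "disabled", since the boost file contains 0.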

This is nothing specific to Proxmox. Many developers working on distros simply do not properly understand that boost frequencies are a part of every modern processor, are supported, and should be enabled by default. Yet, here we see, they aren’t. That’s one of the reasons I like installing the cpufreq GNOME extension so you can see if the distro has some hilariously silly default setup that’s killing your performance (looking at you, irqbalance).


# Well, let's install some utilities to help us...
apt install linux-cpupower cpufrequtils 

There is also the question of the governor. Most people will prefer to run the ondemand governor. It’s reasonable and works well. The problem with it, I have found, is that sometimes if your workload is bursty the CPUs will sleep at inopportune times and slow things down. If you can afford the extra electricity cost, the performance governor is nice.

cpupower -c all frequency-set -g performance

Output of cpupower -c all frequency-info on my system (one block of output like this for each core):

analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.50 GHz - 2.80 GHz
  available frequency steps:  2.80 GHz, 2.40 GHz, 1.50 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 1.50 GHz and 2.80 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: 2.80 GHz (asserted by call to hardware)
  boost state support:
    Supported: yes
    Active: yes
    Boost States: 0
    Total States: 3
    Pstate-P0:  2800MHz
    Pstate-P1:  2400MHz
    Pstate-P2:  1500MHz

Make sure that the governor is “performance” or “ondemand” and that “boost state support” shows “Active: yes” in your output, like the above. Active was “no” on my system. What a sad panda that makes me!
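
With 24 cores, scrolling through one block per core gets old fast. As a rough sketch, the active governor of each core can also be read straight from sysfs and tallied. The function name governor_summary and the CPU_SYSFS override are hypothetical, added here only so the snippet can be tested against a fake directory tree.

```shell
#!/bin/sh
# Count how many cores use each cpufreq governor.
# CPU_SYSFS is a hypothetical override for testing; the default is the real sysfs tree.
governor_summary() (
    root="${CPU_SYSFS:-/sys/devices/system/cpu}"
    # Each core exposes its active governor in cpuN/cpufreq/scaling_governor.
    # The cpu[0-9]* glob skips sibling directories such as cpufreq and cpuidle.
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c
)

governor_summary
```

If every core is on the same governor you get a single line with the core count next to the governor name; more than one line means something set the cores up inconsistently.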

Next, we need to make this persist across a reboot. We’ll make a custom systemd service.

First, because we installed cpufrequtils, it should have its own systemd service now:

# systemctl status cpufrequtils

cpufrequtils.service - LSB: set CPUFreq kernel parameters
   Loaded: loaded (/etc/init.d/cpufrequtils; generated)
   Active: active (exited) since Sat 2021-01-16 16:22:30 EST; 41min ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0 (limit: 7372)
   Memory: 0B
   CGroup: /system.slice/cpufrequtils.service

Jan 16 16:22:30 pm9 systemd[1]: Starting LSB: set CPUFreq kernel parameters...
Jan 16 16:22:30 pm9 cpufrequtils[38101]: CPUFreq Utilities: Setting ondemand CPUFreq governor...CPU0...CPU1...CPU2...CPU3...CPU4...CPU5...CP
Jan 16 16:22:30 pm9 systemd[1]: Started LSB: set CPUFreq kernel parameters.


That looks reasonable! Ofc it’s set to ondemand – let’s change to performance :slight_smile:

Edit the init script: vi /etc/init.d/cpufrequtils
(This is a bit anachronistic because init.d is what came before systemd. It’s not really the init system anymore, and yet the systemd service calls this script!)

Zip down to the governor line and change ondemand to performance if that’s your preference. Ondemand is “fine”; I just want the extra performance. Mostly you don’t really need to do this, unless you specifically know you’re on one of those edge cases where ondemand does weird stuff.
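
If you would rather not edit the init script itself, the Debian cpufrequtils init scripts I have seen also read overrides from /etc/default/cpufrequtils when that file exists. Treat that as an assumption and check your copy of /etc/init.d/cpufrequtils for a line that sources it before relying on this. The function name write_cpufreq_default is hypothetical; the governor list is taken from the cpupower output above.

```shell
#!/bin/sh
# Write a cpufrequtils defaults file that selects a governor.
# Arguments: governor name, then target file (defaults to the Debian location).
write_cpufreq_default() (
    governor="$1"
    target="${2:-/etc/default/cpufrequtils}"
    # Only accept governors that the cpupower output listed as available.
    case "$governor" in
        performance|ondemand|conservative|powersave|schedutil|userspace) ;;
        *) echo "unknown governor: $governor" >&2; exit 1 ;;
    esac
    printf 'ENABLE="true"\nGOVERNOR="%s"\n' "$governor" > "$target"
)

# Example (as root):
#   write_cpufreq_default performance
#   systemctl restart cpufrequtils
```

The upside of a defaults file is that a package upgrade replacing the init script should not silently put you back on ondemand.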

You can use the Phoronix Test Suite to do performance testing before/after, too, to confirm perf uplift.

I would hope you’re wondering about something from the above cpufreq output:
available frequency steps: 2.80 GHz, 2.40 GHz, 1.50 GHz

Even though boost shows as active, it’s still saying it tops out at 2.8 GHz, not 3.35 GHz. What gives? It’s just how things are shown with this tool. If you run a load in another terminal and re-check the frequency, you’ll see higher frequencies on at least some of the cores.

# cat /proc/cpuinfo 
...
cpu MHz         : 3322.548  
...

That’s a nice bump over the previous cap of 2.8 GHz! And it’s important to understand: this is not an overclock. This is literally how it was designed to work 24/7.
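
Rather than eyeballing one "cpu MHz" line per core, here is a small sketch that prints only the highest frequency currently reported. The function name max_cpu_mhz is made up for this example, and the optional file argument exists only so it can be run against a saved copy of cpuinfo.

```shell
#!/bin/sh
# Print the highest "cpu MHz" value currently reported across all cores,
# rounded to the nearest MHz.
# Optional argument: a cpuinfo-format file (defaults to /proc/cpuinfo).
max_cpu_mhz() {
    awk -F: '/^cpu MHz/ { if ($2 + 0 > max) max = $2 + 0 }
             END { if (max) printf "%.0f\n", max }' "${1:-/proc/cpuinfo}"
}

if [ -r /proc/cpuinfo ]; then
    max_cpu_mhz
fi
```

Run it while a load is active in another terminal. Anything above the 2800 MHz top P-state means boost is really happening.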

This is a lot of words. I’m sorry for that. The governor is set, but not the boost. If you use the command from earlier in this guide to check boost after a reboot, the boost is no longer boosting.

I don’t know of a more elegant way to make that stick other than creating a custom systemd service. I’m so, so sorry for that.

Creating a systemd service to enable turbo boost

Create a script to enable turbo. (This is not strictly necessary since we’re just running one command. However, you’ll thank me if you end up using this service to hold other tweaks that disappear on reboot and don’t have a more elegant spot where they can live.)

Create a file at /usr/local/bin/enable-turbo.sh with these contents and chmod +x /usr/local/bin/enable-turbo.sh to make it executable.

#!/bin/sh
echo 1 >  /sys/devices/system/cpu/cpufreq/boost
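
If the same script has to work on machines where the boost file is missing (Intel systems using intel_pstate expose no_turbo instead), a slightly more defensive variant could look like the sketch below. The function name enable_boost, the CPU_SYSFS override and the --apply flag are all my own additions for illustration; the two-line script above is all this particular server needs.

```shell
#!/bin/sh
# Enable CPU boost on whichever interface this kernel exposes.
# CPU_SYSFS is a hypothetical override used for testing; the default is real sysfs.
enable_boost() (
    root="${CPU_SYSFS:-/sys/devices/system/cpu}"
    if [ -w "$root/cpufreq/boost" ]; then
        echo 1 > "$root/cpufreq/boost"            # acpi-cpufreq: 1 = boost on
    elif [ -w "$root/intel_pstate/no_turbo" ]; then
        echo 0 > "$root/intel_pstate/no_turbo"    # intel_pstate: 0 = turbo allowed
    else
        echo "no writable boost control found under $root" >&2
        exit 1
    fi
)

# Only act when explicitly asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--apply" ]; then
    enable_boost
fi
```

If you use this variant, the service has to call the script with the flag, i.e. /usr/local/bin/enable-turbo.sh --apply. A nonzero exit will then show up in systemctl status when no control file is found.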

Create a file at /etc/systemd/system/enable-turbo.service with these contents:

[Unit]
Description=Enable CPU Turbo Boost
After=network.target
StartLimitIntervalSec=0

[Service]
Type=oneshot
ExecStart=/usr/local/bin/enable-turbo.sh


[Install]
WantedBy=multi-user.target

Reload systemd, enable the service, and reboot:

systemctl daemon-reload

systemctl enable enable-turbo

No errors with that, hopefully?! :smiley:

After rebooting and reconnecting to the Proxmox console, you can issue cat /sys/devices/system/cpu/cpufreq/boost to verify boost is working. It should be 1 for enabled, or 0 for disabled.

Man, all those words to get a reasonable out-of-box default. Truly, I am sorry. But it’s fixed forever and should survive system upgrades for many years which should be some consolation.

On to Docker!

Let’s Install Docker on Proxmox, with ZFS, and good performance!

We are installing docker-ce on the host, and Portainer to help manage containers.

This is for dev only. I think. See, you really have to have a deep understanding of all the systems involved to understand what sort of risks you’re opening up yourself to by going off script. LXC containers, by default (imho) have better host isolation than Docker, for example.

And Docker inside a Virtual Machine or LXC container can’t really get at some of the ZFS features it would be nice to be able to access from inside the container.

So, we run it on the host. For the types of things that I do, the nicest feature is this:

ZFS Caching: ZFS caches disk blocks in a memory structure called the adaptive replacement cache (ARC). The Single Copy ARC feature of ZFS allows a single cached copy of a block to be shared by multiple clones of a file system. With this feature, multiple running containers can share a single copy of a cached block. This feature makes ZFS a good option for PaaS and other high-density use cases.

So on my setup I have the default rpool that Proxmox gives you, and I have added ssdpool at /ssdpool.

I am adding my docker storage on the SSDs for now.

zfs create -o mountpoint=/var/lib/docker ssdpool/docker-root
zfs create -o mountpoint=/var/lib/docker/volumes ssdpool/docker-volumes 

zfs list will show you the mounted ZFS stuff around your filesystem. It’s nice.

We’ll want to disable auto-snapshotting on the root but enable it on the volumes:

zfs set com.sun:auto-snapshot=false ssdpool/docker-root 
zfs set com.sun:auto-snapshot=true ssdpool/docker-volumes
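
Here are the same four commands gathered into one script, as a sketch, with a dry-run switch so you can see what would run before touching a pool. The run wrapper, the DRY_RUN variable and the function name are my own additions. The final zfs get simply reads the property back so you can confirm it stuck.

```shell
#!/bin/sh
# Print (DRY_RUN=1, the default here) or execute the dataset setup commands.
# Dataset names match the guide; pass your own pool name as the argument.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_docker_datasets() {
    pool="$1"
    run zfs create -o mountpoint=/var/lib/docker "$pool/docker-root"
    run zfs create -o mountpoint=/var/lib/docker/volumes "$pool/docker-volumes"
    run zfs set com.sun:auto-snapshot=false "$pool/docker-root"
    run zfs set com.sun:auto-snapshot=true "$pool/docker-volumes"
    # Read the property back to confirm what was applied.
    run zfs get -H -o name,value com.sun:auto-snapshot \
        "$pool/docker-root" "$pool/docker-volumes"
}

setup_docker_datasets ssdpool
```

With DRY_RUN left at 1 nothing is changed and each command is printed with a leading "+". Set DRY_RUN=0 as root to apply it for real.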

And finally, according to Docker’s ZFS storage driver doc, we need to set the storage driver to zfs. You can also set a quota.

Edit or create /etc/docker/daemon.json:

{   
     "storage-driver": "zfs"
} 
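
If you are scripting the setup, a sketch like the following writes that file only when it does not exist yet, so an existing daemon.json with other settings is never clobbered. The function name write_daemon_json is hypothetical, and the optional path argument is there for testing.

```shell
#!/bin/sh
# Write a minimal daemon.json selecting the zfs storage driver,
# refusing to overwrite a file that already exists.
# Optional argument: target path (defaults to /etc/docker/daemon.json).
write_daemon_json() (
    target="${1:-/etc/docker/daemon.json}"
    if [ -e "$target" ]; then
        echo "$target already exists; merge the storage-driver key by hand" >&2
        exit 1
    fi
    mkdir -p "$(dirname "$target")"
    printf '{\n  "storage-driver": "zfs"\n}\n' > "$target"
)

# Example (as root): write_daemon_json
```

Once Docker is installed and running, docker info --format '{{.Driver}}' should print zfs if the setting was picked up.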

Actually Set Up Docker

Now we can install Docker CE. Follow their docs, or use this cheat sheet:

apt install apt-transport-https ca-certificates curl gnupg-agent software-properties-common
curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add -

Create /etc/apt/sources.list.d/docker.list with this line:

deb [arch=amd64] https://download.docker.com/linux/debian buster stable
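
The release name in that line (buster) is tied to the Debian version underneath this particular Proxmox install. As a sketch, it can be derived from the VERSION_CODENAME field in /etc/os-release instead of being typed by hand. This assumes your os-release file has that field and that Docker publishes packages for that codename. The function name docker_apt_line is made up.

```shell
#!/bin/sh
# Build the Docker apt source line from an os-release file.
# Optional argument: path to an os-release file (defaults to /etc/os-release).
docker_apt_line() (
    osrel="${1:-/etc/os-release}"
    # os-release is a file of shell-style assignments; source it in a
    # command substitution so its variables do not leak into this shell.
    codename="$(. "$osrel" && echo "${VERSION_CODENAME:-}")"
    if [ -z "$codename" ]; then
        echo "VERSION_CODENAME not found in $osrel" >&2
        exit 1
    fi
    echo "deb [arch=amd64] https://download.docker.com/linux/debian $codename stable"
)

# Example (as root):
#   docker_apt_line > /etc/apt/sources.list.d/docker.list
```

Note that the URL path is hard-coded to the Debian repository, so this is only meant for Debian-based hosts like Proxmox.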

Finally, install and test:

apt update
apt install docker-ce docker-ce-cli containerd.io

docker run hello-world 

Portainer, The Container Porter. That’s not what it means. I’m just making stuff up.

zfs create ssdpool/docker-volumes/portainer_data
docker volume create portainer_data
docker run -d -p 8000:8000 -p 9000:9000 --name=portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce
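
Given the earlier warning about not exposing admin interfaces, one option is to publish the Portainer UI on a single address instead of on every interface. The sketch below only prints the command so you can inspect it first. The function name portainer_cmd and the 127.0.0.1 default are my assumptions, and it leaves out port 8000, which the command above publishes as well.

```shell
#!/bin/sh
# Print a Portainer "docker run" command with the UI bound to one address.
# Argument: address to bind to (defaults to 127.0.0.1, i.e. local-only).
portainer_cmd() {
    bind="${1:-127.0.0.1}"
    echo docker run -d \
        -p "$bind:9000:9000" \
        --name=portainer --restart=always \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v portainer_data:/data \
        portainer/portainer-ce
}

portainer_cmd
```

With the loopback binding you would reach the UI through an SSH tunnel, for example ssh -L 9000:127.0.0.1:9000 to the host. Passing a management-LAN address instead is a middle ground.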

Don’t forget to go to http://yourip.example.com:9000 and set a password (!!!)

And you’re up and running with Portainer, and ZFS, and Docker! On the host. With a decent GUI for managing the containers.

TODO

Other Proxmox Quirks

As I mentioned in the video, this is the staging platform for other experiments like VDI with Nvidia and the S7150. When you’re running a Windows 10 VM the performance can be weird and slow. Might I suggest huge pages?
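
As a starting point for those experiments, here is a tiny sketch of the arithmetic: how many huge pages it takes to back a guest of a given size. The function name and the 16 GiB example guest are hypothetical. 2 MiB is the usual default huge page size on x86_64, but check the Hugepagesize line in /proc/meminfo on your own system.

```shell
#!/bin/sh
# How many huge pages are needed to back a VM of a given size?
# Arguments: VM memory in MiB, huge page size in MiB (defaults to 2).
hugepages_needed() {
    vm_mib="$1"
    page_mib="${2:-2}"
    # Round up so the VM's memory fits entirely in huge pages.
    echo $(( (vm_mib + page_mib - 1) / page_mib ))
}

hugepages_needed 16384      # 16 GiB guest with 2 MiB pages: prints 8192
```

You can compare the result against what the host currently has reserved with grep Huge /proc/meminfo.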

There are some other threads on our forum that dive a little more into the setup of that. Do some experiments; your mileage may vary.

Is one such thread – worth an honorable mention!

You may want to revisit that one.

Wow. Thanks for Mentioning my Thread here. Hopefully someone can help with that! Great Tutorial BTW :slight_smile:

Double check your turbo. It can also be slower than you’d think with ZFS as the backing store. If your disk array is really fast then, out of the box, you’ve got a 2x-3x multiplier on that speed putting pressure on memory bandwidth. E.g. 2 GB/sec of disk throughput, which is nothing, means ZFS uses 5-6 GB/sec of memory bandwidth.

Generally I try to pass through NVMe storage to the VM for best performance. It seems like Windows is tuned with certain assumptions about storage in mind.

Wendell

It’s not that fast, as it’s “only” a ZFS mirror on 2 M.2 SATA drives, Crucial MX500 to be exact. Turbo seems to work, at least according to the output from i7z. In Proxmox I ticked the “SSD” option for storage and enabled Discard, like on all other VMs. If I pass through a SATA SSD to the VM, it’s painfully slow at around 40 MB/s. But I mean the real problem is RAM bandwidth, not storage performance, at least not if I use the RAW image on the ZFS mirror.

I just saw the video and thought “Cool! I should check that!” My systems don’t have /sys/devices/system/cpu/cpufreq/boost

Is it because they’re Xeons?

Old xeons? Kernel version?

One is an E3-1230 v5 and the other is an E3-1230 V2; Dell PowerEdge t130 and t110ii respectively. uname says:

Linux t130 5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100) x86_64 GNU/Linux

and

Linux t110ii.bogus.domain 5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100) x86_64 GNU/Linux

By the way, I have no complaints about performance; just trying to follow along so to speak.

Should have turbo. Do a bit of googling to find out if turbo is on. The i7z package may be useful?

i7z reports “TURBO ENABLED” on both systems, and looking at the attached screenshot makes me believe that. The report says max freq is 3507 without turbo, and real freqs are 3634 (for example). Don’t know why there’s a discrepancy in /sys/.

@wendell What AICs are you using to mount your U.2 drive?

I’ve got a couple of the StarTech ones and they’ve been good but (edit: very) occasionally flaky. My experience is that they wouldn’t be recognized during boot and then would not show up until a reboot, after which everything would be fine. Thoughts?

Wendell,
I loved the review on the Gigabyte G242. I prefer that form factor. I went to scope out prices and wow. Pricey for this one. Is there a similar form factor that you know of in a 2U that could do a good job housing a TrueNAS box? It’s hard for me to choose hardware as I do less of that and more software troubleshooting and DB dev and admin work. Plus I come from an Apple path, not MS, and am actually looking to move more towards a Linux system for myself. So basically I am hardware challenged. I watched that vid you did where you picked up some retired enterprise drive bay and thought “Wow, what a great idea, too bad I don’t know how to find half of that stuff on eBay.” I have no idea what I am looking for.

@wendell

Shouldn’t this be:
systemctl enable enable-turbo
?

Boost and Performance seems to be enabled by default on my 2700.

lscpu

root:~# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       43 bits physical, 48 bits virtual
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               8
Model name:          AMD Ryzen 7 2700 Eight-Core Processor
Stepping:            2
CPU MHz:             2399.280
CPU max MHz:         3200.0000
CPU min MHz:         1550.0000
BogoMIPS:            6387.26
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca

boost

root:~# cat /sys/devices/system/cpu/cpufreq/boost 
1

cpupower

root:~# cpupower -c 0 frequency-info
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1.55 GHz - 3.20 GHz
  available frequency steps:  3.20 GHz, 2.80 GHz, 1.55 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 1.55 GHz and 3.20 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: 3.20 GHz (asserted by call to hardware)
  boost state support:
    Supported: yes
    Active: yes
    Boost States: 0
    Total States: 3
    Pstate-P0:  3200MHz
    Pstate-P1:  2800MHz
    Pstate-P2:  1550MHz

IIRC For Intel you’ll want:

cat /sys/devices/system/cpu/intel_pstate/no_turbo

Should be 0 for Turbo Boost enabled (double negative - 0 = no_turbo off)

Thank you, that was helpful, since I’ve got an Intel server with a Xeon E-2236.

I watched the YouTube video. What do you mean when you say that Apple users have unrealistic expectations?

It has been my experience that when safari or some other apple product misbehaves as a result of poor choices by apple that it is anyone but apple’s fault.

Recently, for example, I had a customer throw a tantrum because when rotating their iPhone, and back, it would add extra margin to the web page. This was, of course, a bug in the browser. It added extra margin for the notch, then forgot to take it away.
That actually had an issue on the browser bug tracker. I gave the explanation of “yeah, bugs happen. Doesn’t make sense to spend engineering time trying to work around something that will ultimately be fixed without us doing anything in this ecosystem anyway” but they still wanted to be mad about something. I was fortunate I could dig up an issue tracker entry for that, as that isn’t always the case and sometimes workarounds are warranted.

They never understand that Safari is kind of becoming the IE6 of web browsers. Browsers have bugs.

Also, Apple’s app development policies and their arbitrariness. “When will my app publish?” “Idk, whenever Apple blesses it. Could be a day. Could be a month! Who knows! But expect to have to move the icons 5 pixels or change the shade of gray from #ccc to #bbb or some other arbitrary thing.” That Apple could be so random or arbitrary seems unbelievable to them…

What about huge pages?

Can you provide some more information on how you used this server to automate mobile device testing? Could you go into more detail on the software side of things?