pfSense VM vs. bare metal on old hardware

My single Haswell Xeon E3 Debian 11 system is getting bullied by having to host all my Docker stuff as well as pfSense in a VM (forbidden router, PCIe passthrough of the 1GbE NICs). I can't tell for sure what the bottleneck is, but the load average on the host spikes to 4-8 whenever a container on that box pushes or pulls ~100 Mbit/s or more between WAN and the ZFS pool. It doesn't appear to be disk I/O: the disk_wait and total_wait latencies reported by zpool iostat -vly 10 1 are only ~25 ms and ~100 ms on the slowest HDD while this is happening. Load average inside pfSense only hits ~1, so it doesn't seem to be a simple matter of pfSense being busy.
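
For what it's worth, this is roughly how I've been trying to narrow down where the load comes from, in case my methodology is off (vmstat ships with procps; mpstat and pidstat are in the sysstat package):

    # Split the load into CPU time vs. I/O wait vs. interrupt handling
    vmstat 1               # watch us/sy/wa/st while a container saturates WAN
    mpstat -P ALL 1        # per-core view; high %irq/%soft points at NIC interrupt load
    pidstat -w 5           # context switches per task (cswch/s, nvcswch/s)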

I have a Skylake Xeon E3 box I want to move some services over to, to relieve the Haswell box. I'm deciding between these options for the Skylake box:

  • pfSense on bare metal
  • Debian 12 with pfSense as a VM, like I have on the Haswell box
    • This would let me use the spare PCIe slots and drive bays for other things. My Haswell server has no free slots or bays and no iGPU, so things like GPU acceleration aren't possible right now.

I have symmetric gigabit WAN and want to be able to sustain high utilization even with WireGuard traffic terminating on pfSense, with pfBlockerNG and potentially some basic IDS/IPS packages running. However, I don't have a good sense of:

  1. How much headroom pfSense will leave on the Skylake box, given the usage above.
  2. How severe virtualization overhead is on these older CPUs with the default exploit mitigations enabled.
    • Pre-Spectre, my understanding was that the overhead is usually negligible, but Phoronix testing shows certain workloads (e.g. context-switching synthetics, which could be relevant here) are up to ~40% slower on Skylake. I wonder if this contributes to the host load average spiking when pfSense is only a little busy. (A quick way to measure this on my own hardware is sketched after this list.)
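
For #2, the only concrete check I know of is benchmarking with the mitigations toggled off. The sysfs path below is standard on recent kernels and mitigations=off is a documented kernel parameter; obviously I'd only flip it temporarily on an isolated box to measure the gap:

    # What the host kernel currently mitigates
    grep . /sys/devices/system/cpu/vulnerabilities/*

    # To benchmark without mitigations (Debian/GRUB; revert afterwards),
    # in /etc/default/grub set:
    #   GRUB_CMDLINE_LINUX_DEFAULT="quiet mitigations=off"
    # then:
    update-grub && reboot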

Does anyone have insight, particularly into #2, that could steer me toward bare metal vs. a VM with extra services on the host?
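
For reference, by "high utilization" I mean something like this holding close to line rate through the tunnel (iperf3 at both ends; 10.0.0.2 is just a placeholder for the far WireGuard peer):

    # Far end of the WireGuard tunnel:
    iperf3 -s

    # Near end, through the tunnel:
    iperf3 -c 10.0.0.2 -t 30 -P 4      # 4 parallel streams for 30 s
    iperf3 -c 10.0.0.2 -t 30 -P 4 -R   # reverse direction (download)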