VFIO success, benched against baremetal

Hello Level1Techs,

I’m happy to report successful GPU passthrough with some benchmark scores. While I already had some VM experience, I only heard about GPU passthrough 2 years ago and tried it for the first time last week on my new rig. It was an amazing experience and I’m satisfied with the results as well as the learning process :smiley:

Most relevant HW

Gigabyte Z390 Aorus Elite
Intel Core i9 9900K
Zotac GeForce RTX 2080 SUPER Twin Fan
MSI GTX 1050 TI

My intention was to put an actual number on performance loss due to virtualization and learn as much as I can. The i9 9900K has 16 threads and the VM uses 14 of them (no isolcpus). I also ran some 14t Win10 baremetal benchmarks to have two points of comparison, and evaluated benchmark perf with LookingGlass.

“Dual boot” use case loss (ie. 14 threads VM vs 16 threads baremetal)
LG : ~6%
HDMI : no diff on avg, barely more than 4% diff on 3DMark Firestrike

Needless to say, I’m impressed.

LookingGlass vs 16t "dual boot"

6622/7104 = 0.932 superposition
3431/3655 = 0.939 heaven
5567/5657 = 0.984 valley
10227/10858 = 0.942 timespy
21903/24041 = 0.911 firestrike
0.9416 avg

HDMI vs 16t "dual boot"

7019/7104 = 0.988 superposition
3651/3655 = 0.999 heaven
5917/5657 = 1.046 valley oO
10986/10858 = 1.011 timespy oO
23031/24041 = 0.958 firestrike
1.0004 avg oO

LookingGlass vs 14t "fair"

6622/6976 = 0.949 superposition
3431/3607 = 0.951 heaven
5567/5556 = 1.002 valley oO
10227/9252 = 1.105 timespy oO
21903/22589 = 0.970 firestrike
0.995 avg

HDMI vs 14t "fair" (is it ?)

7019/6976 = 1.006 superposition
3651/3607 = 1.012 heaven
5917/5556 = 1.065 valley
10986/9252 = 1.187 timespy
23031/22589 = 1.019 firestrike
1.057 avg oO
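For anyone who wants to redo the arithmetic, here is a minimal sketch that recomputes the ratios for one of the tables above. It assumes the "avg" line is a plain arithmetic mean of the five per-benchmark ratios; the names `LG_VS_16T`, `ratios` and `mean_ratio` are made up for this example, and the other three tables can be checked by swapping in their scores.

```python
# Scores copied from the "LookingGlass vs 16t" table:
# (benchmark, VM score, baremetal score).
LG_VS_16T = [
    ("superposition", 6622, 7104),
    ("heaven", 3431, 3655),
    ("valley", 5567, 5657),
    ("timespy", 10227, 10858),
    ("firestrike", 21903, 24041),
]


def ratios(rows):
    """Return {benchmark: vm_score / baremetal_score}."""
    return {name: vm / bare for name, vm, bare in rows}


def mean_ratio(rows):
    """Arithmetic mean of the per-benchmark ratios (assumed to be the 'avg' line)."""
    values = list(ratios(rows).values())
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, ratio in ratios(LG_VS_16T).items():
        print(f"{name:14s} {ratio:.3f}")
    print(f"{'avg':14s} {mean_ratio(LG_VS_16T):.4f}")
```

A ratio below 1 means the VM scored lower than baremetal, above 1 means it scored higher.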

A long and messy post with more details and some scripts there :

I wrote the reddit post after some basic gaming tests and benches and was kindly reminded to have a look at latency there. Worked it out so that Latencymon now stays in the green with spikes around 400µs “pretty much whatever I throw at it”. With the notable exception of evdev passthrough which often (but not always) throws a ~90ms spike when I switch but that’s not an issue for me. I can trigger it forcefully by hitting both Ctrl keys like a madman :laughing:

Added the link for you

Do you mean threads?

You had me shook. I thought the 9900k was really a 16 core cpu. I was like “bro. I’m living under a rock over here with my 7200u” :joy::joy::joy:

Whoops, yes indeed I do mean 16 threads :stuck_out_tongue:

And now that there’s a link in it I can’t edit it :laughing:
EDIT: fixed, still learning :slight_smile:

You should be able to now.

Very neat!

I’ve been playing and fiddling with VFIO for about 2 weeks now. My wife was about to leave. :laughing:

The last setup I had was on Xubuntu 19.04, but with terrible latency spikes and really bad performance overall. Now I have settled on Pop! OS 19.04, which gives me the best performance and responsiveness so far. :blush:

But I am still tweaking the rig.

At the moment I utilize all 12 threads of my Ryzen 5 2600X for the guest, and about 2/3rds of my memory. I should compare 12 threads CPU setup performance to 10 threads I guess, and leave some juice for the host. :smile:


///EDIT

Oh, btw., these hyperv settings gave me a great performance boost on the guest.

<hyperv>
  <relaxed state='on'/>
  <vapic state='on'/>
  <spinlocks state='on' retries='8191'/>
  <vpindex state='on'/>

  <synic state='on'/>
  <stimer state='on'/>

  <vendor_id state='on' value='whatever'/>
</hyperv>

It seems synic and stimer both depend on vpindex which isn’t supported on host side for me.
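A quick way to spot that kind of mismatch before booting the guest is to scan the `<hyperv>` block. This is only a sketch: the dependency map below encodes just what is said in this thread (synic and stimer needing vpindex), it is not a complete list of Hyper-V enlightenment requirements, and `ASSUMED_DEPS`, `enabled_features`, `missing_deps` and `EXAMPLE` are names invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumption taken from this thread only: synic and stimer need vpindex.
# Not an exhaustive or authoritative dependency list.
ASSUMED_DEPS = {"synic": ["vpindex"], "stimer": ["vpindex"]}


def enabled_features(hyperv_xml):
    """Return the names of child elements that have state='on'."""
    root = ET.fromstring(hyperv_xml.strip())
    return {child.tag for child in root if child.get("state") == "on"}


def missing_deps(hyperv_xml, deps=ASSUMED_DEPS):
    """Map each enabled feature to the dependencies that are not enabled."""
    on = enabled_features(hyperv_xml)
    problems = {}
    for feature, needed in deps.items():
        if feature in on:
            lacking = [dep for dep in needed if dep not in on]
            if lacking:
                problems[feature] = lacking
    return problems


# Hypothetical config where vpindex had to be turned off on the host side.
EXAMPLE = """
<hyperv>
  <relaxed state='on'/>
  <vapic state='on'/>
  <spinlocks state='on' retries='8191'/>
  <vpindex state='off'/>
  <synic state='on'/>
  <stimer state='on'/>
</hyperv>
"""

if __name__ == "__main__":
    print(missing_deps(EXAMPLE))
```

An empty result means every enabled feature has the dependencies listed in the map. It says nothing about whether the host actually supports them.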

And besides vendor_id, all other values were default or from the Arch Wiki.

Is LatencyMon on the guest OK with 12 threads? I mean, if there’s no defect it’s not really an issue I guess.

LatencyMon is floating along at about 20 - 200 µs at idle.

Sometimes there are spikes to 3000 - 8000 µs under load, but just once in a while.

I tried a few benchmarks now. But the cost for cutting down 2 threads is too large in my case. :smile:

Ehh, you’d be surprised :smiley: My 14t VM pretty much matches Win10 on 16t. I gained bench scores through the settings I changed to improve latency, around 900 points on Firestrike. What can you do on 6 cores that can’t be done on 5 :smiley: ? Anyway, the point of tuning latency is to improve perceived image smoothness and avoid audio crackling, more or less. If you don’t have those issues you might as well get away with it. But they might come back under heavy load.

As far as I understand it, latency is more a byproduct of the whole setup than anything else, so testing should also cover network and ‘disk’ IO, not only GPU/CPU. Some benchmarks are very light on CPU. The other tool that came with LatencyMon also produced scary numbers very fast before I addressed latency.

If you want to tackle it, it seems one of the first steps is CPU pinning. For that you need to check with lstopo how your threads are mapped. But I’m way too much of a noob on the topic to advise you properly, especially with an AMD CPU, since I did it on an i9.
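If you’d rather get the thread mapping as text than read it off the lstopo picture, the sketch below lists which logical CPUs share a physical core. It is not what I used (I used lstopo). It assumes a Linux host that exposes `thread_siblings_list` under `/sys/devices/system/cpu`, and the function names are made up for this example.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,8' or '0-3,8' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the unique groups of logical CPUs that share a physical core."""
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("threads on the same core:", group)
```

The usual advice is to pin the two vCPUs of each guest core to one sibling pair, but check that against your own latency numbers.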

I tried to sum up my latency modifications there

In your case, maybe try cpu-pm=on first since it seems AMD/Intel agnostic.

After a Rainbow 6 Siege benchmark & game through looking glass ~ 20 min


Since adding cpu-pm=on, I’ve never seen DPC that high. During testing it capped at 450. I forgot to test random access IO though.

Nice. I’ll test cpu-pm this evening. :slight_smile:

Also, that post was linked on reddit and somewhat demonstrates that some CPU pinning tends to be harmful for latency.