Linux performance of the Nvidia GTX 1650 Ti on the Zephyrus G14

I’ve got Ubuntu 20.04 installed on an Asus Zephyrus G14, which has an AMD Ryzen 5 4600HS and an Nvidia GTX 1650 Ti.

While I have somewhat randomly messed around with the drivers and configs a couple of times (usually once every few weeks when I find time), the latest changes I made are based loosely on

Up until now, I’ve basically ignored the Nvidia GPU and run off of Renoir entirely. IIRC, I found a setting that forced the primary GPU to amdgpu, but the Nvidia GPU was still siphoning off about 15-20 watts in the background (a rough estimate using powertop, borne out by the system exhaust being pretty warm even at idle). I remember that the same approach also made it possible to set Nvidia as the primary, so that the whole desktop ran off of the Nvidia GPU. However, switching GPUs required a full restart. I do not remember why I did not keep Nvidia as the default; it had to do with either continuous power drain or a resolution issue.

Using this new page that I came across, I have now been able to enable the use of the discrete GPU per application.

Specifically, the following items may be of interest:

  • Kernel: 5.8.8-050808-generic
  • Nvidia driver: xserver-xorg-video-nvidia-450/focal,now 450.66-0ubuntu0.20.04.1
  • I have not set modeset=0
  • I use the following prime-run alias to run an application on the Nvidia GPU:
alias prime-run="__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia __VK_LAYER_NV_optimus=NVIDIA_only"
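
An alias like this only takes effect in an interactive shell and relies on the assignments landing directly in front of the command. A possible alternative is a small shell function that passes the same three variables through env. This is a minimal sketch, not what was used for the results below; the name prime_run is hypothetical, and only the variables come from the alias.

```shell
# Hypothetical helper: run a command with the PRIME render offload variables set.
# The three variables are the ones from the alias; only the wrapper is new.
prime_run() {
    # env also accepts further NAME=VALUE pairs before the command, so
    # "prime_run vblank_mode=0 glxgears" keeps working like the alias does.
    env __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        __VK_LAYER_NV_optimus=NVIDIA_only \
        "$@"
}
```

With this in ~/.bashrc, prime_run glxinfo should behave like the aliased form, and the function can also be called from scripts.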

Strange Performance Results

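So it turns out that the Renoir outperforms the 1650 Ti here: about 2.6x the frame rate in glxgears and about 1.6x the overall glmark2 score. The Nvidia GPU only comes out ahead in the glmark2 terrain and refract scenes. This is very strange to me.

With AMD Renoir: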

$ glxinfo | grep vendor
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI 
OpenGL vendor string: X.Org

$ vblank_mode=0 __GL_SYNC_TO_VBLANK=0 glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
62346 frames in 5.0 seconds = 12469.200 FPS
64965 frames in 5.0 seconds = 12992.678 FPS
66598 frames in 5.0 seconds = 13319.348 FPS
64976 frames in 5.0 seconds = 12995.117 FPS

$ glmark2
    glmark2 2014.03+git20150611.fa71af2d
    OpenGL Information
    GL_VENDOR:     X.Org
    GL_RENDERER:   AMD RENOIR (DRM 3.38.0, 5.8.8-050808-generic, LLVM 10.0.0)
    GL_VERSION:    4.6 (Compatibility Profile) Mesa 20.0.8
[build] use-vbo=false: FPS: 3838 FrameTime: 0.261 ms
[build] use-vbo=true: FPS: 4617 FrameTime: 0.217 ms
[texture] texture-filter=nearest: FPS: 3717 FrameTime: 0.269 ms
[texture] texture-filter=linear: FPS: 3823 FrameTime: 0.262 ms
[texture] texture-filter=mipmap: FPS: 3635 FrameTime: 0.275 ms
[shading] shading=gouraud: FPS: 3391 FrameTime: 0.295 ms
[shading] shading=blinn-phong-inf: FPS: 3208 FrameTime: 0.312 ms
[shading] shading=phong: FPS: 3177 FrameTime: 0.315 ms
[shading] shading=cel: FPS: 3111 FrameTime: 0.321 ms
[bump] bump-render=high-poly: FPS: 2380 FrameTime: 0.420 ms
[bump] bump-render=normals: FPS: 4684 FrameTime: 0.213 ms
[bump] bump-render=height: FPS: 4761 FrameTime: 0.210 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 3529 FrameTime: 0.283 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 2156 FrameTime: 0.464 ms
[pulsar] light=false:quads=5:texture=false: FPS: 3820 FrameTime: 0.262 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 2152 FrameTime: 0.465 ms
[desktop] effect=shadow:windows=4: FPS: 3378 FrameTime: 0.296 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 971 FrameTime: 1.030 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 1184 FrameTime: 0.845 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1013 FrameTime: 0.987 ms
[ideas] speed=duration: FPS: 3204 FrameTime: 0.312 ms
[jellyfish] <default>: FPS: 2544 FrameTime: 0.393 ms
[terrain] <default>: FPS: 263 FrameTime: 3.802 ms
[shadow] <default>: FPS: 2742 FrameTime: 0.365 ms
[refract] <default>: FPS: 416 FrameTime: 2.404 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 3810 FrameTime: 0.262 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 3564 FrameTime: 0.281 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 3819 FrameTime: 0.262 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 3634 FrameTime: 0.275 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 3624 FrameTime: 0.276 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 3583 FrameTime: 0.279 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 3636 FrameTime: 0.275 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 3586 FrameTime: 0.279 ms
                                  glmark2 Score: 3059 

And with Nvidia:

$ prime-run glxinfo | grep vendor
server glx vendor string: NVIDIA Corporation
client glx vendor string: NVIDIA Corporation 
OpenGL vendor string: NVIDIA Corporation

$ prime-run vblank_mode=0 __GL_SYNC_TO_VBLANK=0 glxgears
26757 frames in 5.0 seconds = 5351.372 FPS
23598 frames in 5.0 seconds = 4719.587 FPS 
24467 frames in 5.0 seconds = 4893.132 FPS
25864 frames in 5.0 seconds = 5172.789 FPS

$ prime-run glmark2                           
    glmark2 2014.03+git20150611.fa71af2d
    OpenGL Information
    GL_VENDOR:     NVIDIA Corporation
    GL_RENDERER:   GeForce GTX 1650 Ti/PCIe/SSE2
    GL_VERSION:    4.6.0 NVIDIA 450.66
[build] use-vbo=false: FPS: 1734 FrameTime: 0.577 ms
[build] use-vbo=true: FPS: 2191 FrameTime: 0.456 ms
[texture] texture-filter=nearest: FPS: 2146 FrameTime: 0.466 ms
[texture] texture-filter=linear: FPS: 2030 FrameTime: 0.493 ms
[texture] texture-filter=mipmap: FPS: 2100 FrameTime: 0.476 ms
[shading] shading=gouraud: FPS: 2084 FrameTime: 0.480 ms
[shading] shading=blinn-phong-inf: FPS: 2162 FrameTime: 0.463 ms
[shading] shading=phong: FPS: 2093 FrameTime: 0.478 ms
[shading] shading=cel: FPS: 2055 FrameTime: 0.487 ms
[bump] bump-render=high-poly: FPS: 2099 FrameTime: 0.476 ms
[bump] bump-render=normals: FPS: 2189 FrameTime: 0.457 ms
[bump] bump-render=height: FPS: 2194 FrameTime: 0.456 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 2058 FrameTime: 0.486 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 1931 FrameTime: 0.518 ms
[pulsar] light=false:quads=5:texture=false: FPS: 2142 FrameTime: 0.467 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 1526 FrameTime: 0.655 ms
[desktop] effect=shadow:windows=4: FPS: 1686 FrameTime: 0.593 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 941 FrameTime: 1.063 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 1012 FrameTime: 0.988 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 946 FrameTime: 1.057 ms
[ideas] speed=duration: FPS: 1813 FrameTime: 0.552 ms
[jellyfish] <default>: FPS: 1828 FrameTime: 0.547 ms
[terrain] <default>: FPS: 757 FrameTime: 1.321 ms
[shadow] <default>: FPS: 1800 FrameTime: 0.556 ms
[refract] <default>: FPS: 1362 FrameTime: 0.734 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 2089 FrameTime: 0.479 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 2129 FrameTime: 0.470 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 2143 FrameTime: 0.467 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 2102 FrameTime: 0.476 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 2157 FrameTime: 0.464 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2125 FrameTime: 0.471 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2135 FrameTime: 0.468 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2167 FrameTime: 0.461 ms
                                  glmark2 Score: 1876 
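
The overall score hides which scenes differ, so a per-scene ratio may be easier to read than the two raw logs. The sketch below assumes each run was saved to a file (amd.log and nvidia.log are hypothetical names) and that scene lines keep the "[scene] options: FPS: N FrameTime: ..." shape shown above; the function name glmark2_compare is made up for illustration.

```shell
# Hypothetical helper: print second-log FPS divided by first-log FPS per scene.
# Usage: glmark2_compare amd.log nvidia.log
glmark2_compare() {
    awk -F': FPS: ' '
        /^\[.*: FPS: / {
            fps = $2 + 0                      # numeric prefix of "3838 FrameTime: ..."
            if (FNR == NR) {                  # still reading the first log
                base[$1] = fps
                next
            }
            if (($1 in base) && base[$1] > 0)
                printf "%-8.2f %s\n", fps / base[$1], $1
        }
    ' "$1" "$2"
}
```

A ratio below 1 means the second log (here the Nvidia run) was slower for that scene. On the numbers above, terrain (757 vs 263) and refract (1362 vs 416) are the only scenes that come out above 1.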

I have checked in nvidia-settings that the GPU switches to Performance Level 3 in PowerMizer while glmark2 is running, so the GPU is being used and isn’t stuck at some low clock. It generally sits at about 1890 MHz while glmark2 runs in the background, against a maximum of 2100 MHz for that performance level.
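
The same check can probably be done from a terminal with nvidia-smi, which ships with the proprietary driver. This is a sketch rather than what was actually run: the helper name clock_pct is hypothetical, and it only formats "current, max" clock pairs such as those nvidia-smi prints for the clocks.gr and clocks.max.gr query fields.

```shell
# Hypothetical helper: read "current, max" clock pairs in MHz (one per line)
# and print the current clock as a percentage of the maximum.
clock_pct() {
    awk -F', *' '$2 + 0 > 0 {
        printf "%d MHz of %d MHz (%.0f%%)\n", $1, $2, 100 * $1 / $2
    }'
}

# Usage, while the benchmark is running in another terminal:
#   nvidia-smi --query-gpu=clocks.gr,clocks.max.gr \
#              --format=csv,noheader,nounits | clock_pct
```

Lines where the maximum clock is missing or not numeric are skipped instead of causing a division by zero.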

Any insight into why this might be happening or how to fix it would be appreciated.

I would like to rerun the test with Nvidia set as the default GPU, but can’t quite remember how or where I configured that. If I find it, I will provide the results for that configuration as well. That said, I would still not expect this sort of GPU offload to explain this kind of performance discrepancy.

Side notes:

  • A simple hello-world CUDA program compiles and runs fine with or without prime-run.
  • A simple hello-world OpenCL program compiles and runs fine on the Nvidia GPU with or without prime-run. I’m pretty sure it can’t see the Renoir, because I gave up trying to find and install the AMD SDK.
  • Upgrading from Linux 5.8.1 to Linux 5.8.8 seems to have reduced the idle power consumption of the GPU from ~15 watts to ~7 watts.
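
For a quick cross-check of the powertop estimates, the battery discharge rate can often be read straight from sysfs while on battery. The sketch below is an assumption-heavy illustration: the helper name uw_to_w is hypothetical, the BAT0 path may differ per machine, some batteries expose current_now and voltage_now instead of power_now, and the reading covers the whole system rather than the GPU alone.

```shell
# Hypothetical helper: convert a sysfs power reading in microwatts to watts.
# Usage: uw_to_w /sys/class/power_supply/BAT0/power_now   (path is an assumption)
uw_to_w() {
    awk '{ printf "%.2f W\n", $1 / 1000000 }' "$1"
}
```

Comparing the idle reading before and after a kernel or driver change gives a rough difference that can be set against the powertop numbers.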