Rather than binding a second (more powerful) GPU to vfio-pci in early boot and using it exclusively for VFIO passthrough, I’d like to make use of it directly in the host OS.
Eventually the plan is to rebind the GPU between the host and guest as required, but for the time being I’ll settle for getting it working as a second display/output in Xorg, preferably as an extended screen spanning all monitors.
I assume this should be straightforward enough, but despite trying various configuration options I can’t seem to get it working. I’m hoping someone can spot where I’m going wrong.
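For context, the eventual host/guest rebinding I have in mind is the usual sysfs unbind/bind dance, roughly as sketched below. This is only a sketch, not something I’ve actually wired up yet (the bus address is the 1660 Ti’s on my system):

# Sketch only: move the 1660 Ti from the nvidia driver to vfio-pci at runtime.
# 0000:0f:00.0 is the GPU function on my system; its HDMI audio function
# would need the same treatment.
GPU=0000:0f:00.0
sudo modprobe vfio-pci
echo "$GPU"   | sudo tee /sys/bus/pci/devices/$GPU/driver/unbind
echo vfio-pci | sudo tee /sys/bus/pci/devices/$GPU/driver_override
echo "$GPU"   | sudo tee /sys/bus/pci/drivers/vfio-pci/bind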
System details:
- Nvidia GT 710 (primary PCIe slot), monitor #1 attached (1920x1200)
- Nvidia GTX 1660 Ti (secondary PCIe slot), monitor #2 attached (1920x1080)
- Nvidia proprietary driver 440.59
- Ubuntu 19.10
- GNOME 3.34.2
Attempt 1 - default configuration
Removing the early binding to vfio-pci and allowing Xorg to automatically configure both GPUs results in a single screen on the GT 710. As far as GNOME is concerned, the second GPU/monitor doesn’t exist.
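(For reference, the early binding I removed was the usual modprobe approach, roughly as below; the vendor:device IDs are placeholders rather than my exact values:)

# /etc/modprobe.d/vfio.conf (removed for these tests)
# placeholder IDs standing in for the 1660 Ti and its HDMI audio function
options vfio-pci ids=xxxx:xxxx,xxxx:xxxx
softdep nvidia pre: vfio-pci

plus an update-initramfs -u and a reboot for the change (and its removal) to take effect. With the binding gone, xrandr only sees a single provider: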
$ xrandr --listproviders
Providers: number : 1
Provider 0: id: 0x278 cap: 0x1, Source Output crtcs: 4 outputs: 3 associated providers: 0 name:NVIDIA-0
Amongst other things, Xorg.0.log reports automatic configuration of a screen on the GT 710, an error creating a GPU screen (more on this later), and correct detection of the 1660 Ti:
(==) NVIDIA(0): No modes were requested; the default mode "nvidia-auto-select"
(==) NVIDIA(0): will be used as the requested mode.
(==) NVIDIA(0):
(II) NVIDIA(0): Validated MetaModes:
(II) NVIDIA(0): "CRT-0:nvidia-auto-select"
(II) NVIDIA(0): Virtual screen size determined to be 1920 x 1200
(--) NVIDIA(0): DPI set to (93, 92); computed from "UseEdidDpi" X config
(--) NVIDIA(0): option
[...]
(==) NVIDIA(G0): Depth 24, (==) framebuffer bpp 32
(==) NVIDIA(G0): RGB weight 888
(==) NVIDIA(G0): Default visual is TrueColor
(==) NVIDIA(G0): Using gamma correction (1.0, 1.0, 1.0)
(II) Applying OutputClass "nvidia" options to /dev/dri/card1
(**) NVIDIA(G0): Option "AllowEmptyInitialConfiguration"
(**) NVIDIA(G0): Enabling 2D acceleration
(EE) NVIDIA(G0): GPU screens are not yet supported by the NVIDIA driver
(EE) NVIDIA(G0): Failing initialization of X screen
[...]
(II) NVIDIA(1): NVIDIA GPU GeForce GTX 1660 Ti (TU116-A) at PCI:15:0:0
(II) NVIDIA(1): (GPU-1)
(--) NVIDIA(1): Memory: 6291456 kBytes
(--) NVIDIA(1): VideoBIOS: 90.16.20.40.60
(II) NVIDIA(1): Detected PCI Express Link width: 16X
nvidia-smi reports the presence of both cards:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 710 Off | 00000000:0E:00.0 N/A | N/A |
| N/A 44C P8 N/A / N/A | 252MiB / 1992MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 166... Off | 00000000:0F:00.0 Off | N/A |
| 0% 42C P8 9W / 160W | 1MiB / 5944MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
Attempt 2 - xorg.conf file
Use NVIDIA X Server Settings to produce an xorg.conf file.
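(I believe much the same file can be generated from the command line with nvidia-xconfig, although I actually used the GUI and the flag below is from memory:)

$ sudo nvidia-xconfig --enable-all-gpus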
`xorg.conf`
# nvidia-settings: X configuration file generated by nvidia-settings
# nvidia-settings: version 440.44
Section "ServerLayout"
Identifier "Layout0"
Screen 0 "Screen0" 1920 0
Screen 1 "Screen1" LeftOf "Screen0"
InputDevice "Keyboard0" "CoreKeyboard"
InputDevice "Mouse0" "CorePointer"
Option "Xinerama" "0"
EndSection
Section "Files"
EndSection
Section "Module"
Load "dbe"
Load "extmod"
Load "type1"
Load "freetype"
Load "glx"
EndSection
Section "InputDevice"
# generated from default
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/psaux"
Option "Emulate3Buttons" "no"
Option "ZAxisMapping" "4 5"
EndSection
Section "InputDevice"
# generated from default
Identifier "Keyboard0"
Driver "kbd"
EndSection
Section "Monitor"
# HorizSync source: edid, VertRefresh source: edid
Identifier "Monitor0"
VendorName "Unknown"
ModelName "DELL 2405FPW"
HorizSync 30.0 - 81.0
VertRefresh 56.0 - 76.0
Option "DPMS"
EndSection
Section "Monitor"
# HorizSync source: unknown, VertRefresh source: unknown
Identifier "Monitor1"
VendorName "Unknown"
ModelName "LG Electronics LG TV"
HorizSync 30.0 - 83.0
VertRefresh 58.0 - 62.0
Option "DPMS"
EndSection
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GT 710"
BusID "PCI:14:0:0"
EndSection
Section "Device"
Identifier "Device1"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "GeForce GTX 1660 Ti"
BusID "PCI:15:0:0"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Device0"
Monitor "Monitor0"
DefaultDepth 24
Option "Stereo" "0"
Option "nvidiaXineramaInfoOrder" "CRT-0"
Option "metamodes" "nvidia-auto-select +0+0"
Option "SLI" "Off"
Option "MultiGPU" "Off"
Option "BaseMosaic" "off"
SubSection "Display"
Depth 24
EndSubSection
EndSection
Section "Screen"
Identifier "Screen1"
Device "Device1"
Monitor "Monitor1"
DefaultDepth 24
Option "Stereo" "0"
Option "metamodes" "nvidia-auto-select +0+0 {AllowGSYNC=Off}"
Option "SLI" "Off"
Option "MultiGPU" "Off"
Option "BaseMosaic" "off"
SubSection "Display"
Depth 24
EndSubSection
EndSection
Observations:
- xrandr still shows the same single provider. I can move the mouse cursor onto the second monitor, but the desktop does not extend, nor is it possible to drag application windows across.
- However, it is possible to run certain applications on the second monitor with the DISPLAY environment variable, e.g. DISPLAY=:1.1 glxgears (see the check after this list).
- The Xorg.{0,1}.log files contain no obvious errors.
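A rough way of checking which GPU a given X screen is rendering on, assuming glxinfo (from the mesa-utils package) is installed; this is only a sanity check:

$ DISPLAY=:1.0 glxinfo | grep "OpenGL renderer"
$ DISPLAY=:1.1 glxinfo | grep "OpenGL renderer"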
Attempt 3 - Xinerama
Using NVIDIA X Server Settings, enable the “Enable Xinerama” option, which manifests as Option "Xinerama" "1" in the xorg.conf file. After restarting GDM (systemctl restart display-manager.service), the monitor that used to work now shows a black screen with a blinking cursor.
Observations
- No outright errors in Xorg.0.log, and confirmation that Xinerama is enabled:
(**) Option "Xinerama" "1"
(==) Automatically adding devices
(==) Automatically enabling devices
(==) Automatically adding GPU devices
(==) Automatically binding GPU devices
(**) Xinerama: enabled
- But the Nvidia driver complains about it:
(WW) NVIDIA: The Composite and Xinerama extensions are both enabled, which
(WW) NVIDIA: is an unsupported configuration. The driver will continue
(WW) NVIDIA: to load, but may behave strangely.
(WW) NVIDIA: Xinerama is enabled, so RandR has likely been disabled by the
(WW) NVIDIA: X server.
Nvidia’s documentation suggests that the Composite extension can be disabled with nvidia-xconfig --no-composite, which adds this to xorg.conf:
Section "Extensions"
Option "COMPOSITE" "Disable"
EndSection
Attempt 4 - Disable Xinerama, disable composite
Results in this lovely screen
Oh no! Something has gone wrong.
A problem has occurred and the system can’t recover. Please contact a system administrator
Attempt 5 - Enable Xinerama, disable composite
Results in a black screen with a blinking cursor, the same as Attempt 3.
Attempt 6 - PRIME offload
If I can’t extend the desktop across multiple GPUs/displays, I’d at least be happy with offloading rendering to the more powerful GPU. If Nvidia’s documentation is to be believed, this should be possible with its PRIME Render Offload feature. The main caveat is that Xorg requires a set of patches to be applied, something that, fortunately, has already happened in Ubuntu.
Create a minimal xorg.conf:
Section "ServerLayout"
Identifier "layout"
Option "AllowNVIDIAGPUScreens"
EndSection
After restarting the display server, Xorg.0.log confirms creation of the GPU screen (the same “G0” seen in the logs earlier):
(==) NVIDIA(G0): Depth 24, (==) framebuffer bpp 32
(==) NVIDIA(G0): RGB weight 888
(==) NVIDIA(G0): Default visual is TrueColor
(==) NVIDIA(G0): Using gamma correction (1.0, 1.0, 1.0)
(II) Applying OutputClass "nvidia" options to /dev/dri/card1
(**) NVIDIA(G0): Option "AllowEmptyInitialConfiguration"
(**) NVIDIA(G0): Enabling 2D acceleration
(II) NVIDIA: The X server supports PRIME Render Offload.
(--) NVIDIA(0): Valid display device(s) on GPU-1 at PCI:15:0:0
(--) NVIDIA(0): DFP-0
(--) NVIDIA(0): DFP-1
(--) NVIDIA(0): DFP-2
(--) NVIDIA(0): DFP-3
(--) NVIDIA(0): DFP-4 (boot)
(--) NVIDIA(0): DFP-5
(--) NVIDIA(0): DFP-6
(II) NVIDIA(G0): NVIDIA GPU GeForce GTX 1660 Ti (TU116-A) at PCI:15:0:0
(II) NVIDIA(G0): (GPU-1)
(--) NVIDIA(G0): Memory: 6291456 kBytes
(--) NVIDIA(G0): VideoBIOS: 90.16.20.40.60
(II) NVIDIA(G0): Detected PCI Express Link width: 16X
xrandr displays two providers:
Providers: number : 2
Provider 0: id: 0x3f1 cap: 0x1, Source Output crtcs: 4 outputs: 3 associated providers: 0 name:NVIDIA-0
Provider 1: id: 0x198 cap: 0x0 crtcs: 0 outputs: 0 associated providers: 0 name:NVIDIA-G0
Running vkcube (from the vulkan-tools package) with the environment variables described in the documentation works, but I can’t tell if it’s actually making use of the second GPU:
$ __NV_PRIME_RENDER_OFFLOAD=1 vkcube
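The closest I’ve come to verifying this is watching per-GPU utilisation while vkcube runs. It’s a crude check, and I’m assuming these nvidia-smi query flags behave the same across driver versions:

# GPU 1 (the 1660 Ti) should show non-zero utilisation if offload is working
$ nvidia-smi --query-gpu=index,name,utilization.gpu --format=csv --loop=1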
The second test from the documentation fails:
$ __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo
name of display: :1
XIO: fatal IO error 17 (File exists) on X server ":1"
after 47 requests (47 known processed) with 0 events remaining.
XIO: fatal IO error 17 (File exists) on X server ":1"
after 47 requests (47 known processed) with 0 events remaining.
As a final test, Shadow of the Tomb Raider (launched with and without the environment variables) crashes with the error message:
Vulkan device has no suitable graphics queue families
After more searching around, it seems as though others attempting this kind of configuration have discovered that, ironically, it doesn’t work with two Nvidia cards.
Any help finding a solution is greatly appreciated.