PCIE GPU display [lack of] output issue on new Tyan S8050 / Gen 4 Epyc (Bergamo) build

Just received all initial parts to start testing out my new virtualization server, and have everything up and running with 2 DIMMs…

Specs (so far):

  • Tyan S8050GM4NE-2T
  • Epyc 9734 (with Dynatron J12 cooler)
  • Two 16 GB DDR5 4800 MT/s ECC RDIMMs - Samsung (from Tyan’s AVL for this board) (just for initial testing… 64 GB DIMMs are on the way…)
  • Random NVMe and SATA drives for testing
  • 1600W PSU

Everything is working fine… IPMI, remote control/KVM, and all the networking (IPMI, 1 Gb and 10 Gb)…

So far, I’ve:

  • run Ubuntu 23.04 from an existing install on an M.2 NVMe
  • installed Proxmox (from an ISO on a USB stick), blacklisted the amdgpu driver, and passed the GPU through to Windows 10 and Debian 12 VMs…
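For passthrough like this, it's worth confirming that the GPU and its HDMI/DP audio function sit in their own IOMMU group. Here is a minimal Python sketch, assuming the standard sysfs layout `/sys/kernel/iommu_groups/<N>/devices/<address>`. The helper names `iommu_groups` and `group_of` are hypothetical and not part of any tool.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # No groups exposed: IOMMU disabled in firmware or on the kernel command line.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(dev.name for dev in devices_dir.iterdir())
    return groups


def group_of(address, groups):
    """Return the group number holding a PCI address such as '0000:c3:00.0', or None."""
    for number, members in groups.items():
        if address in members:
            return number
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    for number in sorted(groups):
        print(number, " ".join(groups[number]))
```

Ideally `0000:c3:00.0` and `0000:c3:00.1` share a group with no unrelated devices.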

Everything seems good, apart from being unable to get output through the discrete GPUs (and I haven’t tried any other type of PCIE cards yet, either).

I get the same behavior with all the GPUs I’ve tried (see list below).
On Ubuntu 23.04, kernel 6.2, bare metal:

  • can view Ubuntu encryption and splash screen through GPU
  • display goes black before the display manager (gdm3) loads (but the desktop is still visible via BMC VGA and KVM)
  • display settings show two displays - #1 is the GPU, but settings are broken, #2 is the ASpeed VGA… (see image at bottom of post)
  • amdgpu driver seems to load fine (looking at dmesg and lspci)
  • lspci lists / recognises the card
  • Vulkaninfo recognises the card

On the Windows VM, Windows loads fine and I get an image via the framebuffer (through DisplayPort), but the AMD software drivers will not install or recognize the card, so I’m stuck at 800x600 with no hardware acceleration.

GPUs I’ve tried so far:

  • AMD RDNA2 - Radeon RX 6900 XT
  • AMD RDNA - 5500 Pro
  • AMD GCN - RX 550 4GB
  • Intel Arc A700 16 GB

I’ve gone through the BIOS [quickly] and can’t see anything obviously problematic (all slots are set to auto for PCIE generation, and x16), but will go back through the Tyan manual and all the BIOS options in detail tomorrow. I’ll also try an NVMe drive with Fedora 38 and kernel 6.5 RCx (once I remove it from the computer I’m typing this on…).

I can see - via lscpu - that the 9734 has 4 NUMA nodes… could that be an issue?
I only have two DIMMs, in slots A and G per the AMD and Tyan guidelines - could that be an issue?

I’m able to get everything above working correctly with my current Proxmox setups, but all my previous experience is with Ryzen (Zen 1 through Zen 4), and I’m new to Tyan motherboards and Epyc CPUs, so maybe I’m missing something obvious?

Has anyone seen anything similar, and know how to address it?

Details (from Ubuntu 23.04):

lspci output:

c3:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c0)
        Subsystem: Tul Corporation / PowerColor Red Devil AMD Radeon RX 6900 XT [148c:2408]
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
c3:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel

lscpu:

NUMA:
  NUMA node(s):          4
  NUMA node0 CPU(s):     0-27,112-139
  NUMA node1 CPU(s):     28-55,140-167
  NUMA node2 CPU(s):     56-83,168-195
  NUMA node3 CPU(s):     84-111,196-223
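For reference, expanding the cpulist strings above shows each node holds 56 logical CPUs (28 cores plus their SMT siblings), 224 in total, which matches the 9734's 112 cores / 224 threads. On EPYC the node count normally follows the BIOS NUMA-nodes-per-socket (NPS) setting; that is general background, not something confirmed for this board in the thread. A small Python sketch (helper names are hypothetical):

```python
def parse_cpulist(spec):
    """Expand a Linux cpulist such as '0-27,112-139' into a list of CPU ids."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_layout(lscpu_text):
    """Map NUMA node number -> CPU ids from the 'NUMA nodeN CPU(s):' lines of lscpu."""
    layout = {}
    for line in lscpu_text.splitlines():
        line = line.strip()
        if line.startswith("NUMA node") and "CPU(s):" in line:
            label, spec = line.split(":", 1)
            node = int(label.split()[1].removeprefix("node"))  # "node0" -> 0
            layout[node] = parse_cpulist(spec)
    return layout


LSCPU_NUMA = """\
NUMA node(s):          4
NUMA node0 CPU(s):     0-27,112-139
NUMA node1 CPU(s):     28-55,140-167
NUMA node2 CPU(s):     56-83,168-195
NUMA node3 CPU(s):     84-111,196-223
"""

if __name__ == "__main__":
    for node, cpus in sorted(numa_layout(LSCPU_NUMA).items()):
        print(f"node{node}: {len(cpus)} logical CPUs")
```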

dmesg | grep amdgpu output:

[    8.154141] [drm] amdgpu kernel modesetting enabled.
[    8.154232] amdgpu: CRAT table not found
[    8.154236] amdgpu: Virtual CRAT table created for CPU
[    8.154271] amdgpu: Topology: Add CPU node
[    8.154426] amdgpu 0000:c3:00.0: enabling device (0000 -> 0003)
[    8.190987] amdgpu 0000:c3:00.0: amdgpu: Fetched VBIOS from ROM BAR
[    8.190990] amdgpu: ATOM BIOS: 113-D41201-XT
[    8.191009] amdgpu 0000:c3:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[    8.191053] amdgpu 0000:c3:00.0: amdgpu: MEM ECC is not presented.
[    8.191054] amdgpu 0000:c3:00.0: amdgpu: SRAM ECC is not presented.
[    8.191080] amdgpu 0000:c3:00.0: BAR 2: releasing [mem 0x10bef0000000-0x10bef01fffff 64bit pref]
[    8.191082] amdgpu 0000:c3:00.0: BAR 0: releasing [mem 0x10bee0000000-0x10beefffffff 64bit pref]
[    8.191110] amdgpu 0000:c3:00.0: BAR 0: assigned [mem 0x10400000000-0x107ffffffff 64bit pref]
[    8.191119] amdgpu 0000:c3:00.0: BAR 2: assigned [mem 0x10200000000-0x102001fffff 64bit pref]
[    8.191178] amdgpu 0000:c3:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[    8.191180] amdgpu 0000:c3:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    8.191181] amdgpu 0000:c3:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[    8.191245] [drm] amdgpu: 16368M of VRAM memory ready
[    8.191246] [drm] amdgpu: 15922M of GTT memory ready.
[    8.191682] amdgpu 0000:c3:00.0: amdgpu: PSP runtime database doesn't exist
[    8.191685] amdgpu 0000:c3:00.0: amdgpu: PSP runtime database doesn't exist
[   10.047100] amdgpu 0000:c3:00.0: amdgpu: STB initialized to 2048 entries
[   10.048081] amdgpu 0000:c3:00.0: amdgpu: Will use PSP to load VCN firmware
[   10.248869] amdgpu 0000:c3:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[   10.248894] amdgpu 0000:c3:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5600 (58.86.0)
[   10.248897] amdgpu 0000:c3:00.0: amdgpu: SMU driver if version not matched
[   10.248927] amdgpu 0000:c3:00.0: amdgpu: use vbios provided pptable
[   10.323042] amdgpu 0000:c3:00.0: amdgpu: SMU is initialized successfully!
[   10.403229] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[   10.403423] amdgpu: sdma_bitmap: ffff
[   10.456241] amdgpu: HMM registered 16368MB device memory
[   10.456597] amdgpu: Virtual CRAT table created for GPU
[   10.456831] amdgpu: Topology: Add dGPU node [0x73bf:0x1002]
[   10.456834] kfd kfd: amdgpu: added device 1002:73bf
[   10.456860] amdgpu 0000:c3:00.0: amdgpu: SE 4, SH per SE 2, CU per SH 10, active_cu_number 80
[   10.456935] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[   10.456936] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[   10.456937] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[   10.456938] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[   10.456939] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[   10.456940] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[   10.456940] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[   10.456941] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[   10.456942] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[   10.456943] amdgpu 0000:c3:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[   10.456943] amdgpu 0000:c3:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[   10.456944] amdgpu 0000:c3:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[   10.456945] amdgpu 0000:c3:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[   10.456946] amdgpu 0000:c3:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[   10.456946] amdgpu 0000:c3:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[   10.456947] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[   10.456948] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[   10.456949] amdgpu 0000:c3:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[   10.456950] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[   10.456950] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[   10.456951] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[   10.458107] amdgpu 0000:c3:00.0: amdgpu: Using BACO for runtime pm
[   10.458599] [drm] Initialized amdgpu 3.49.0 20150101 for 0000:c3:00.0 on minor 1
[   10.467307] amdgpu 0000:c3:00.0: [drm] fb1: amdgpudrmfb frame buffer device
[   30.646281] snd_hda_intel 0000:c3:00.1: bound 0000:c3:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
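Reading the BAR lines above as sizes: BAR 0 is released at 256 MiB and reassigned at 16 GiB, which matches the card's 16 GB of VRAM and suggests the kernel resized the BAR (i.e. Resizable BAR was in effect). That is my interpretation of the log, not something established in the thread. The Python sketch below does the arithmetic; its regex assumes the exact `BAR n: releasing|assigned [mem start-end ...]` wording shown here, and the helper names are hypothetical.

```python
import re

# Matches lines like "BAR 0: assigned [mem 0x10400000000-0x107ffffffff 64bit pref]".
BAR_LINE = re.compile(r"BAR (\d+): (releasing|assigned) \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)")


def bar_events(dmesg_text):
    """Return (bar, action, size_in_bytes) for each BAR release/assignment line."""
    events = []
    for match in BAR_LINE.finditer(dmesg_text):
        bar, action, start, end = match.groups()
        events.append((int(bar), action, int(end, 16) - int(start, 16) + 1))
    return events


def human(size):
    """Format a byte count using binary units (KiB, MiB, GiB, ...)."""
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:g} {unit}"
        size /= 1024
    return f"{size:g} TiB"


DMESG = """\
[    8.191080] amdgpu 0000:c3:00.0: BAR 2: releasing [mem 0x10bef0000000-0x10bef01fffff 64bit pref]
[    8.191082] amdgpu 0000:c3:00.0: BAR 0: releasing [mem 0x10bee0000000-0x10beefffffff 64bit pref]
[    8.191110] amdgpu 0000:c3:00.0: BAR 0: assigned [mem 0x10400000000-0x107ffffffff 64bit pref]
[    8.191119] amdgpu 0000:c3:00.0: BAR 2: assigned [mem 0x10200000000-0x102001fffff 64bit pref]
"""

if __name__ == "__main__":
    for bar, action, size in bar_events(DMESG):
        print(f"BAR {bar} {action}: {human(size)}")
```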

[Screenshot] Display settings: #2 is the ASpeed VGA…

[Screenshot] Unable to turn on the display or change any settings (it errors out).

Hi

I guess all the video output is going to the aspeed video driver/VGA connector.

On my system (also has aspeed/IPMI) I had to configure X to use my dedicated GPU:

rgysi@zephir:~$ ls /etc/X11/xorg.conf.d/
rtx-a5000.conf
rgysi@zephir:~$ cat /etc/X11/xorg.conf.d/rtx-a5000.conf
Section "Device"
    Identifier "RTXA5000"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BusID "PCI:4:00:0"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "RTXA5000"
    Monitor "Monitor0"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
rgysi@zephir:~$
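One catch when adapting this: lspci prints bus, device and function in hexadecimal, while the xorg.conf BusID expects decimal. For bus 04 the two happen to agree, but the 6900 XT at `c3:00.0` would presumably need `BusID "PCI:195:0:0"` (and `Driver "amdgpu"` rather than `"nvidia"`). Below is a minimal Python conversion sketch. The function name is hypothetical, and the `PCI:bus@domain:device:function` form for non-zero domains follows the xorg.conf man page as I understand it.

```python
def xorg_busid(lspci_address):
    """Convert an lspci address ('c3:00.0' or '0000:c3:00.0') into an xorg.conf BusID.

    lspci prints domain/bus/device/function in hexadecimal; Xorg's BusID wants decimal.
    """
    parts = lspci_address.split(":")
    if len(parts) == 3:
        domain_hex, bus_hex, slot = parts
    elif len(parts) == 2:
        domain_hex = "0"
        bus_hex, slot = parts
    else:
        raise ValueError(f"unexpected PCI address: {lspci_address!r}")
    dev_hex, func_hex = slot.split(".")
    domain, bus = int(domain_hex, 16), int(bus_hex, 16)
    dev, func = int(dev_hex, 16), int(func_hex, 16)
    if domain:
        # Non-zero PCI domains use the PCI:bus@domain:device:function form.
        return f"PCI:{bus}@{domain}:{dev}:{func}"
    return f"PCI:{bus}:{dev}:{func}"


if __name__ == "__main__":
    print(xorg_busid("c3:00.0"))
```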

Just found this link, which is an interesting read, but I haven’t tried it out yet: https://wiki.archlinux.org/title/PRIME

Thanks for the reply. I haven’t had a chance to take a look at this in detail or try it yet, but will do later today.

(I tried Fedora 38 (kernel 6.5-rc2) earlier today, booting unchanged from my Ryzen 7950X desktop's SSD, and it made no difference; i.e. same issue.
I also tried setting the slots to PCIE4 in the BIOS, but that made no difference either.)

Hopefully what you suggested will make a difference, at least on bare-metal Linux…
The thing I find strange is that I see the same behavior inside VMs (hosted on Proxmox 7.4), both Windows and Linux guests, where the only device I pass through is the GPU, so the ASpeed VGA shouldn’t have an impact there.

Anyway, as well as trying what you’ve suggested, I’m going to try a few other things:

  • load it with other PCIE cards (WiFi, NVME, and SATA adapters) and see if they work correctly.
  • upgrade Proxmox to V8.x, to see if the 6.2 kernel (versus 5.15) makes any difference.
  • try installing Windows 11 on bare-metal (to NVMe, from an image on a USB) to see if that works.

Main reason I bought this was to learn anyway and to be the basis of my future homelab server, but I have no deadlines (it’s just for me), so this is all part of the fun (or at least that’s what I’m telling myself :slight_smile: ).

@wendell I saw your video last night - AMD’s core vision (really good, BTW) - and that you were running Fedora 38 on Bergamo with video out… was that BMC VGA, or did you have a GPU installed?
Any thoughts on what could be causing my issues, especially inside VMs? (I’m totally open to the possibility, and wouldn’t be surprised, that it’s down to my lack of experience and I’m missing something obvious.)

lspci -vvvnn

Disable Resizable BAR in BIOS
Enable PCIe AER
Enable PCIe ASPM

Maybe worth enabling x2APIC
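To check those items from Linux before and after changing BIOS options, the relevant bits can be pulled out of `lspci -vvv`. Here is a rough Python sketch, assuming lspci's usual wording (`Advanced Error Reporting`, `Physical Resizable BAR`, `LnkCtl: ASPM ...;`). Run lspci as root, since extended capabilities are otherwise reported as access denied. Capability presence only shows what the device supports, not whether the BIOS enabled it. The function and file names are hypothetical.

```python
import re
import sys


def pcie_features(lspci_vvv_text):
    """Summarise AER, Resizable BAR and ASPM from one device's `lspci -vvv` output."""
    # Capability presence only says the device supports a feature, not that firmware enabled it.
    has_aer = "Advanced Error Reporting" in lspci_vvv_text
    has_rebar = "Resizable BAR" in lspci_vvv_text
    link = re.search(r"LnkCtl:\s+ASPM ([^;]+);", lspci_vvv_text)
    return {
        "aer_capability": has_aer,
        "resizable_bar_capability": has_rebar,
        "aspm": link.group(1).strip() if link else None,  # e.g. "Disabled", "L1 Enabled"
    }


if __name__ == "__main__":
    # Usage (hypothetical file name): sudo lspci -vvv -s c3:00.0 | python3 pcie_features.py
    print(pcie_features(sys.stdin.read()))
```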


Thanks for the help :+1: .

  • Don’t have option (or can’t find it) for resize bar in BIOS (or in the Tyan motherboard/BIOS manual)
  • “NBIO->Enable AER Cap” is set to Auto, and “NBIO->ACS Enable” to “Disabled”
  • Update: I set “NBIO->Enable AER Cap” to “Enabled” (default was “Auto”), and left “NBIO->ACS Enable” set to “Disabled”
  • Can’t find aspm setting in BIOS (or in the manual)
  • I enabled x2apic

tldr; everything I wanted to work is functional :slight_smile:
with the exception of GPU pass-through - for accelerated display output - in Linux VMs

===========================================================
I spent a few hours last night and this morning testing (after installing the motherboard into a case, as I’d been testing it on top of the motherboard box until then…).

I wasn’t very scientific in my approach, as I simultaneously changed way too many variables at times, but here’s details with everything now functioning correctly. :slight_smile:

I had to switch to different GPUs (for compatibility with the 4U case I’m using), so everything below is with a Radeon VII and/or Radeon Pro W5500.

Linux on Bare-metal with discrete GPU (SOLVED):

  • this fix was simply down to setting “Primary Display” to “external” in BIOS (instead of the default, “Onboard”). I should have figured that out myself sooner :flushed:
  • Status: Ubuntu 23.04 runs with full desktop acceleration on both GPUs (tested separately), with functioning OpenGL, Vulkan, etc. (after setting Primary Display to external in BIOS)
  • I’m never going to use it in this scenario, but it was my baseline for confirming GPU functionality…

Windows 10 guest VM - host Proxmox 7 and 8 [SOLVED]:

  • enabled x2apic in BIOS
  • ensured I had all Proxmox settings correct per the documentation (including iommu=pt in the PVE kernel boot arguments)
  • VM:
    - q35 / UEFI (OVMF)
    - PCI Device “All functions” enabled, “ROM-Bar” enabled, “Primary GPU” and “PCI-Express” enabled.
  • tested Proxmox 7.4.x and 8.0.x
  • Status: works perfectly… able to run Steam and 3DMark with all the latest benchmarks on a fully updated Windows 10 with up-to-date AMD Adrenalin drivers.
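The GUI checkboxes above map onto a `hostpciN:` line in the VM's config (`/etc/pve/qemu-server/<vmid>.conf`). As far as I know, "All functions" drops the `.0` function suffix, "PCI-Express" adds `pcie=1` (q35 only), "Primary GPU" adds `x-vga=1`, and "ROM-Bar" is `rombar`, which defaults to on. The Python sketch below (hypothetical helper) builds that line for the two VM setups in this post. The address `0000:c3:00.0` is borrowed from the earlier lspci output, since the Radeon VII/W5500 addresses aren't given.

```python
def hostpci_line(address, *, all_functions=True, rombar=True, primary_gpu=False,
                 pcie=False, machine="i440fx"):
    """Build a Proxmox `hostpciN:` value mirroring the GUI passthrough checkboxes."""
    if pcie and "q35" not in machine:
        raise ValueError("pcie=1 (PCI-Express) requires the q35 machine type")
    # "All functions" passes the whole device (0000:c3:00), so the GPU's
    # HDMI/DP audio function (.1) goes to the VM along with the GPU (.0).
    options = [address.rsplit(".", 1)[0] if all_functions else address]
    if pcie:
        options.append("pcie=1")
    if primary_gpu:
        options.append("x-vga=1")
    if not rombar:
        options.append("rombar=0")  # rombar defaults to 1, so it is only written when off
    return ",".join(options)


if __name__ == "__main__":
    # Windows 10 guest: q35 / OVMF, Primary GPU + PCI-Express enabled.
    print("hostpci0:", hostpci_line("0000:c3:00.0", primary_gpu=True, pcie=True, machine="q35"))
    # Ubuntu 22.04 ROCm guest: i440fx / SeaBIOS, Primary GPU + PCI-Express off.
    print("hostpci0:", hostpci_line("0000:c3:00.0"))
```

With these inputs the two lines come out as `0000:c3:00,pcie=1,x-vga=1` and `0000:c3:00`.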

ROCm in Ubuntu 22.04 LTS (headless) VM - (SOLVED)

  • installed ROCm and the amdgpu-dkms driver
  • tested with Proxmox 7.4.x and 8.0.x
  • seems to work perfectly with both GPUs
  • VM:
    - i440fx / SeaBIOS
    - PCI Device “All functions” enabled, “ROM-Bar” enabled, “Primary GPU” and “PCI-Express” NOT enabled.
    e.g.:
$ rocm-smi
========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
0    33.0c           3.0W    0Mhz  100Mhz  24.71%  auto  105.0W    1%   0%
====================================================================================
=============================== End of ROCm SMI Log ================================

$ rocminfo
ROCk module is loaded
Failed to parse CPUID
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
<snip>
*******
Agent 2
*******
  Name:                    gfx1012
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Pro W5500
  Vendor Name:             AMD

$ /opt/rocm/opencl/bin/clinfo
Failed to parse CPUID
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.1 AMD-APP (3581.0)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_amd_event_callback


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               1
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    AMD Radeon Pro W5500
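To script the same check across VMs, the GPU agents can be pulled out of rocminfo's text output. Here is a minimal Python sketch, assuming the `Agent N` headers and `Key: value` lines shown above (the helper name is hypothetical). It keeps agents whose `Name` starts with `gfx`, i.e. GPUs.

```python
import re


def gpu_agents(rocminfo_text):
    """Return (Name, Marketing Name) for each GPU agent in `rocminfo` text output."""
    agents = []
    current = None  # fields of the agent being read; None until the first "Agent N" header
    for raw in rocminfo_text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"Agent \d+", line):
            current = {}
            agents.append(current)
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            # Keep the first value: later nested sections (pools, ISA) reuse keys like "Name".
            current.setdefault(key.strip(), value.strip())
    return [(a.get("Name", ""), a.get("Marketing Name", ""))
            for a in agents if a.get("Name", "").startswith("gfx")]


ROCMINFO_EXCERPT = """\
*******
Agent 2
*******
  Name:                    gfx1012
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Pro W5500
  Vendor Name:             AMD
"""

if __name__ == "__main__":
    print(gpu_agents(ROCMINFO_EXCERPT))
```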

GPU display-out (with acceleration) - Ubuntu 22.04 LTS VM - (SOLVED)

  • was fixed by enabling “PCIe AER Cap” in BIOS
  • Status: full Vulkan and OpenGL acceleration
