I’ve just received all the initial parts to start testing my new virtualization server, and I have everything up and running with two DIMMs…
Specs (so far):
- Tyan S8050GM4NE-2T
- EPYC 9734 (with Dynatron J12 cooler)
- Two Samsung 16 GB DDR5-4800 ECC RDIMMs from Tyan’s AVL for this board (just for initial testing; 64 GB DIMMs are on the way…)
- Random NVMe and SATA drives for testing
- 1600W PSU
Everything is working fine: IPMI, remote control/KVM, and all the networking (IPMI, 1 Gb and 10 Gb).
So far, I’ve:
- run Ubuntu 23.04 from an existing install on an M.2 NVMe
- installed Proxmox (from an ISO on a USB stick), blacklisted the amdgpu driver, and passed the GPU through to Windows 10 and Debian 12 VMs
Everything seems good, except that I can’t get display output from any of the discrete GPUs (and I haven’t tried any other type of PCIe card yet, either).
I get the same behavior with all the GPUs I’ve tried (see the list below).
On Ubuntu 23.04, kernel 6.2, bare metal:
- the disk-encryption prompt and Ubuntu splash screen display through the GPU
- the display goes black before the display manager (gdm3) loads (though it’s still visible via the BMC VGA output and KVM)
- display settings show two displays: #1 is the GPU, but its settings are broken; #2 is the ASPEED VGA (see the image at the bottom of the post)
- the amdgpu driver seems to load fine (judging by dmesg and lspci)
- lspci lists and recognises the card
- vulkaninfo recognises the card
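To see what the kernel itself thinks is plugged in (independent of what the desktop’s display settings show), a quick check is to read the DRM connector status files in sysfs. This is a minimal sketch assuming the usual `/sys/class/drm/card*-*/status` layout; connector names like `card1-DP-1` will differ per system.

```python
from pathlib import Path


def drm_connectors(root="/sys/class/drm"):
    """Return {connector_name: status} for each DRM connector under root.

    Assumes connector entries are named like 'card1-DP-1' and each has a
    'status' file containing e.g. 'connected' or 'disconnected'.
    """
    result = {}
    for conn in sorted(Path(root).glob("card*-*")):
        status_file = conn / "status"
        if status_file.is_file():
            result[conn.name] = status_file.read_text().strip()
    return result


if __name__ == "__main__":
    # Print one line per connector, e.g. "card1-DP-1: connected"
    for name, status in drm_connectors().items():
        print(f"{name}: {status}")
```

If the GPU’s DisplayPort/HDMI connector shows `connected` here while the screen stays black, the problem is more likely on the desktop/compositor side than in link detection.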
In the Windows VM, Windows loads fine and I get an image via the framebuffer (through DisplayPort), but the AMD software drivers won’t install or recognize the card, so I’m stuck at 800x600 with no hardware acceleration.
GPUs I’ve tried so far:
- AMD RDNA2 - Radeon RX 6900 XT
- AMD RDNA - 5500 Pro
- AMD GCN - RX 550 4GB
- Intel Arc A700 16 GB
I’ve quickly gone through the BIOS and can’t see anything obviously problematic (all slots are set to Auto for PCIe generation, and to x16), but I’ll go back through the Tyan manual and all the BIOS options in detail tomorrow. I’ll also try an NVMe drive with Fedora 38 and kernel 6.5 RCx (once I remove it from the computer I’m typing this on…).
I can see via lscpu that the 9734 has 4 NUMA nodes. Could that be an issue?
I only have two DIMMs, in slots A and G per the AMD and Tyan guidelines. Could that be an issue?
I’m able to get everything above working correctly with my current Proxmox setups, but all my previous experience is with Ryzen (Zen 1 through Zen 4), and I’m new to Tyan motherboards and EPYC CPUs, so maybe I’m missing something obvious?
Has anyone seen anything similar, and does anyone know how to address it?
Details (from Ubuntu 23.04):
lspci output:
```
c3:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c0)
	Subsystem: Tul Corporation / PowerColor Red Devil AMD Radeon RX 6900 XT [148c:2408]
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
c3:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
```
lscpu:
```
NUMA:
  NUMA node(s): 4
  NUMA node0 CPU(s): 0-27,112-139
  NUMA node1 CPU(s): 28-55,140-167
  NUMA node2 CPU(s): 56-83,168-195
  NUMA node3 CPU(s): 84-111,196-223
```
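The node CPU lists use the standard Linux CPU-list syntax (comma-separated ranges). A small sketch expanding them shows each node getting 56 logical CPUs, 224 in total, which matches the 9734’s 112 cores / 224 threads. The assumption that the second range on each line (e.g. 112-139) holds the SMT siblings of the first is based on the usual Linux numbering, not something lscpu states here.

```python
def parse_cpu_list(spec):
    """Expand a Linux CPU list such as '0-27,112-139' into a sorted list of ints."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def numa_cpu_counts(lscpu_text):
    """Return {'NUMA nodeN': cpu_count} from the NUMA section of lscpu output."""
    counts = {}
    for line in lscpu_text.splitlines():
        line = line.strip()
        # Skip the "NUMA node(s): 4" summary line; it has no "CPU(s):" field
        if line.startswith("NUMA node") and "CPU(s):" in line:
            name, spec = line.split("CPU(s):")
            counts[name.strip()] = len(parse_cpu_list(spec.strip()))
    return counts
```

Which node the GPU itself hangs off is reported by the kernel in `/sys/bus/pci/devices/0000:c3:00.0/numa_node`.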
dmesg | grep amdgpu output:
```
[ 8.154141] [drm] amdgpu kernel modesetting enabled.
[ 8.154232] amdgpu: CRAT table not found
[ 8.154236] amdgpu: Virtual CRAT table created for CPU
[ 8.154271] amdgpu: Topology: Add CPU node
[ 8.154426] amdgpu 0000:c3:00.0: enabling device (0000 -> 0003)
[ 8.190987] amdgpu 0000:c3:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 8.190990] amdgpu: ATOM BIOS: 113-D41201-XT
[ 8.191009] amdgpu 0000:c3:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 8.191053] amdgpu 0000:c3:00.0: amdgpu: MEM ECC is not presented.
[ 8.191054] amdgpu 0000:c3:00.0: amdgpu: SRAM ECC is not presented.
[ 8.191080] amdgpu 0000:c3:00.0: BAR 2: releasing [mem 0x10bef0000000-0x10bef01fffff 64bit pref]
[ 8.191082] amdgpu 0000:c3:00.0: BAR 0: releasing [mem 0x10bee0000000-0x10beefffffff 64bit pref]
[ 8.191110] amdgpu 0000:c3:00.0: BAR 0: assigned [mem 0x10400000000-0x107ffffffff 64bit pref]
[ 8.191119] amdgpu 0000:c3:00.0: BAR 2: assigned [mem 0x10200000000-0x102001fffff 64bit pref]
[ 8.191178] amdgpu 0000:c3:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 8.191180] amdgpu 0000:c3:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 8.191181] amdgpu 0000:c3:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 8.191245] [drm] amdgpu: 16368M of VRAM memory ready
[ 8.191246] [drm] amdgpu: 15922M of GTT memory ready.
[ 8.191682] amdgpu 0000:c3:00.0: amdgpu: PSP runtime database doesn't exist
[ 8.191685] amdgpu 0000:c3:00.0: amdgpu: PSP runtime database doesn't exist
[ 10.047100] amdgpu 0000:c3:00.0: amdgpu: STB initialized to 2048 entries
[ 10.048081] amdgpu 0000:c3:00.0: amdgpu: Will use PSP to load VCN firmware
[ 10.248869] amdgpu 0000:c3:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 10.248894] amdgpu 0000:c3:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5600 (58.86.0)
[ 10.248897] amdgpu 0000:c3:00.0: amdgpu: SMU driver if version not matched
[ 10.248927] amdgpu 0000:c3:00.0: amdgpu: use vbios provided pptable
[ 10.323042] amdgpu 0000:c3:00.0: amdgpu: SMU is initialized successfully!
[ 10.403229] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 10.403423] amdgpu: sdma_bitmap: ffff
[ 10.456241] amdgpu: HMM registered 16368MB device memory
[ 10.456597] amdgpu: Virtual CRAT table created for GPU
[ 10.456831] amdgpu: Topology: Add dGPU node [0x73bf:0x1002]
[ 10.456834] kfd kfd: amdgpu: added device 1002:73bf
[ 10.456860] amdgpu 0000:c3:00.0: amdgpu: SE 4, SH per SE 2, CU per SH 10, active_cu_number 80
[ 10.456935] amdgpu 0000:c3:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 10.456936] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 10.456937] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 10.456938] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 10.456939] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 10.456940] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 10.456940] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 10.456941] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 10.456942] amdgpu 0000:c3:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 10.456943] amdgpu 0000:c3:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 10.456943] amdgpu 0000:c3:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 10.456944] amdgpu 0000:c3:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 10.456945] amdgpu 0000:c3:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
[ 10.456946] amdgpu 0000:c3:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
[ 10.456946] amdgpu 0000:c3:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
[ 10.456947] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
[ 10.456948] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
[ 10.456949] amdgpu 0000:c3:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 1
[ 10.456950] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 1
[ 10.456950] amdgpu 0000:c3:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 1
[ 10.456951] amdgpu 0000:c3:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 1
[ 10.458107] amdgpu 0000:c3:00.0: amdgpu: Using BACO for runtime pm
[ 10.458599] [drm] Initialized amdgpu 3.49.0 20150101 for 0000:c3:00.0 on minor 1
[ 10.467307] amdgpu 0000:c3:00.0: [drm] fb1: amdgpudrmfb frame buffer device
[ 30.646281] snd_hda_intel 0000:c3:00.1: bound 0000:c3:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
```
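If I’m reading the BAR lines right, the driver released BAR 0 (256 MiB) and reassigned it at 16 GiB, roughly the full 16368M of VRAM, which suggests Resizable BAR is working on this board. A minimal sketch that computes the sizes from those `releasing`/`assigned` lines (sizes are end minus start plus one, in bytes):

```python
import re

# Matches e.g. "BAR 0: assigned [mem 0x10400000000-0x107ffffffff 64bit pref]"
BAR_RE = re.compile(
    r"BAR (?P<bar>\d+): (?P<action>releasing|assigned) "
    r"\[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
)


def bar_events(dmesg_text):
    """Return (bar_number, action, size_in_bytes) for each BAR line found."""
    events = []
    for m in BAR_RE.finditer(dmesg_text):
        size = int(m["end"], 16) - int(m["start"], 16) + 1
        events.append((int(m["bar"]), m["action"], size))
    return events


def human(size):
    """Format a byte count as MiB or GiB."""
    return f"{size // 2**30} GiB" if size >= 2**30 else f"{size // 2**20} MiB"
```

Run against the four BAR lines above, this gives BAR 2 at 2 MiB both before and after, and BAR 0 going from 256 MiB to 16 GiB.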
Display settings (#2 is the ASPEED VGA):
Unable to turn on the display or change any settings (it errors out):