Installing Radeon 7nm on Arch Linux... 3-11-19--Experiment Continues

joe2gaan · February 8, 2019, 7:30pm

I will see you guys on the other side. Lol

update…
Got it running.
https://browser.geekbench.com/v4/compute/3636585

This was the geekbench score after setting the P state.

Update.

I have gone to 2000 MHz on the core and 1150 MHz on the HBM2.

https://browser.geekbench.com/v4/compute/3637218

This was with about 40 Chromium tabs open and a while loop running to monitor the GPU BTW…
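The monitoring loop itself is not shown here. A minimal sketch of what such a loop could look like in Python, assuming the amdgpu sysfs attribute gpu_busy_percent (it appears in the device directory listing further down the thread) and the PCI address 0000:0d:00.0 from the lspci output. The function names are hypothetical, not anything the driver provides:

```python
import time
from pathlib import Path

# Assumed path: PCI address taken from the lspci output in this thread.
BUSY_FILE = Path("/sys/bus/pci/devices/0000:0d:00.0/gpu_busy_percent")


def parse_busy(text):
    """Convert the raw sysfs text (e.g. '37\n') into an integer percentage."""
    return int(text.strip())


def read_gpu_busy(path=BUSY_FILE):
    """Read the current GPU load in percent from the sysfs attribute."""
    return parse_busy(Path(path).read_text())


def monitor(read=read_gpu_busy, samples=10, interval=1.0):
    """Call `read` `samples` times, sleeping `interval` seconds after each call."""
    readings = []
    for _ in range(samples):
        readings.append(read())
        time.sleep(interval)
    return readings
```

Calling monitor() with its defaults would take ten readings one second apart. The reader is passed in as a parameter so the loop can be tried out on a machine without the card.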

2-21-19–Update:

I ordered a second Radeon 7 today for experimenting with passthrough, flashing, etc…

The platform we will be working with is the Asus Z10PE-D8 workstation motherboard,
with 2 E5-2673v3 Xeons and 32GB of Samsung ECC memory to start.

I am open to any tests the Level 1 community would like to try with the Radeon 7 and this setup. Thank you for your time.

3-11-19–update
The Experiment continues…

Oh, just doing OpenCL things…

Goalkeeper · February 8, 2019, 7:37pm

Good luck!

Enwyn · February 9, 2019, 5:28pm

Wait wait wait! Does this not mean you are in a position to find out if this thing has SR-IOV?
Or do we need to see what happens on the driver front?

joe2gaan · February 9, 2019, 7:32pm

Sure. Let’s test it out.

joe2gaan · February 9, 2019, 7:44pm

What’s the best way to test this? I have done GPU passthrough on a VM before, but that is the extent of my experience with that sort of thing.

Enwyn · February 9, 2019, 7:57pm

I have not had my hands on an SR-IOV capable card, so I do not have that experience either. However, I was thinking more of trying to pull info from the PCI bus:

could you try this in a terminal?
lspci | grep AMD
or:
lspci | grep Radeon

I’ll ask in the Radeon VII discussion thread for some advice as well.

kiljacken · February 9, 2019, 7:59pm

A plain simple sudo lspci -vvv would be very interesting as a starter

joe2gaan · February 9, 2019, 8:25pm

Here it goes for the Vega 20:

0d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20 (rev c1) (prog-if 00 [VGA controller])
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 20
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 146
NUMA node: 0
Region 0: Memory at a0000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at b0000000 (64-bit, prefetchable) [size=2M]
Region 4: I/O ports at 4000 [size=256]
Region 5: Memory at b1c00000 (32-bit, non-prefetchable) [size=512K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [64] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF Not Supported
AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn+
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00000 Data: 0000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [200 v1] Resizable BAR <?>
Capabilities: [270 v1] Secondary PCI Express <?>
Capabilities: [2a0 v1] Access Control Services
ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
Capabilities: [2b0 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [2c0 v1] Page Request Interface (PRI)
PRICtl: Enable- Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00000100, Page Request Allocation: 00000000
Capabilities: [2d0 v1] Process Address Space ID (PASID)
PASIDCap: Exec+ Priv+, Max PASID Width: 10
PASIDCtl: Enable- Exec- Priv-
Capabilities: [320 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Kernel driver in use: amdgpu
Kernel modules: amdgpu

kiljacken · February 9, 2019, 8:48pm

Compared to my Vega 64, the changes are minor: https://i.imgur.com/txYFDpA.png

Two things stand out to me though:

The card is still NoSoftRst+, which means that the card itself does not support standard PCI soft reset functionality. As far as I know, this could indicate that the reset bug is still a thing. Setting up a VM with GPU passthrough should show real fast whether that is true.
The card has ATSCtl enabled, and from a cursory reading about PCI this could indicate SR-IOV support. I am not confident in this, however; I know very little about SR-IOV.
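
The two flags above can also be pulled out of a saved lspci -vvv dump without reading it by eye. A minimal sketch in Python, assuming the `name+` / `name-` convention lspci uses for boolean fields. The helper flag_state is hypothetical, and the sample lines are copied from the dump earlier in the thread:

```python
import re

# Three lines copied from the Vega 20 lspci -vvv output in this thread.
SAMPLE = """\
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
ATSCtl: Enable+, Smallest Translation Unit: 00
"""


def flag_state(lspci_text, name):
    """Return True for 'name+', False for 'name-', None if the flag is absent.

    Only the first occurrence is reported, so a generic name such as
    'Enable' should be given with its prefix (e.g. 'ATSCtl: Enable').
    """
    match = re.search(re.escape(name) + r"([+-])", lspci_text)
    if match is None:
        return None
    return match.group(1) == "+"


for flag in ("NoSoftRst", "FLReset", "ATSCtl: Enable"):
    print(flag, flag_state(SAMPLE, flag))
```

This only reports what lspci printed. Whether NoSoftRst+ really implies the reset bug still has to be settled by an actual passthrough test.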

I’m currently digging around the DRM code for AMD GPUs, looking for an easy way to check if SR-IOV is enabled/available. Meanwhile, could you send a dmesg | grep amdgpu ?

joe2gaan · February 9, 2019, 8:55pm

[ 0.000000] Command line: BOOT_IMAGE=…/vmlinuz-linux root=UUID=aaf67fbc-6ece-4d92-b203-2077eb778a9f rw amdgpu.ppfeaturemask=0xffffffff initrd=…/initramfs-linux.img
[ 0.000000] Kernel command line: BOOT_IMAGE=…/vmlinuz-linux root=UUID=aaf67fbc-6ece-4d92-b203-2077eb778a9f rw amdgpu.ppfeaturemask=0xffffffff initrd=…/initramfs-linux.img
[ 4.455432] [drm] amdgpu kernel modesetting enabled.
[ 4.456191] amdgpu 0000:0d:00.0: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 4.456192] amdgpu 0000:0d:00.0: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 4.456193] amdgpu 0000:0d:00.0: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 4.456276] [drm] amdgpu: 16368M of VRAM memory ready
[ 4.456277] [drm] amdgpu: 16368M of GTT memory ready.
[ 5.173709] fbcon: amdgpudrmfb (fb0) is primary device
[ 5.369289] amdgpu 0000:0d:00.0: fb0: amdgpudrmfb frame buffer device
[ 5.410138] amdgpu 0000:0d:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[ 5.410140] amdgpu 0000:0d:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[ 5.410142] amdgpu 0000:0d:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[ 5.410143] amdgpu 0000:0d:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[ 5.410144] amdgpu 0000:0d:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[ 5.410145] amdgpu 0000:0d:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[ 5.410147] amdgpu 0000:0d:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[ 5.410148] amdgpu 0000:0d:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[ 5.410149] amdgpu 0000:0d:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[ 5.410150] amdgpu 0000:0d:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[ 5.410151] amdgpu 0000:0d:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[ 5.410153] amdgpu 0000:0d:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
[ 5.410154] amdgpu 0000:0d:00.0: ring 12(uvd<0>) uses VM inv eng 6 on hub 1
[ 5.410155] amdgpu 0000:0d:00.0: ring 13(uvd_enc0<0>) uses VM inv eng 7 on hub 1
[ 5.410157] amdgpu 0000:0d:00.0: ring 14(uvd_enc1<0>) uses VM inv eng 8 on hub 1
[ 5.410159] amdgpu 0000:0d:00.0: ring 15(uvd<1>) uses VM inv eng 9 on hub 1
[ 5.410160] amdgpu 0000:0d:00.0: ring 16(uvd_enc0<1>) uses VM inv eng 10 on hub 1
[ 5.410162] amdgpu 0000:0d:00.0: ring 17(uvd_enc1<1>) uses VM inv eng 11 on hub 1
[ 5.410163] amdgpu 0000:0d:00.0: ring 18(vce0) uses VM inv eng 12 on hub 1
[ 5.410165] amdgpu 0000:0d:00.0: ring 19(vce1) uses VM inv eng 13 on hub 1
[ 5.410167] amdgpu 0000:0d:00.0: ring 20(vce2) uses VM inv eng 14 on hub 1
[ 5.864245] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:0d:00.0 on minor 0

kiljacken · February 9, 2019, 9:01pm

And perhaps a sudo ls /sys/bus/pci/devices/0000\:0d\:00.0/?

Thanks for being awesome, that’s hopefully the last one for now

EDIT: The card doesn’t expose Capabilities: [330 v1] Single Root I/O Virtualization (SR-IOV), so it does not support SR-IOV, at least not with the current vbios. A bit disappointing, but probably what we should’ve expected.
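
The same check can be scripted against a saved lspci -vvv dump. A minimal sketch in Python, assuming the `Capabilities: [offset vN] name` line format seen in the output above. The function names are hypothetical, and the sample is a subset of the Vega 20 capability lines from this thread:

```python
import re

# A subset of the capability lines from the Vega 20 dump in this thread.
VEGA20_CAPS = """\
Capabilities: [50] Power Management version 3
Capabilities: [2a0 v1] Access Control Services
Capabilities: [2b0 v1] Address Translation Service (ATS)
Capabilities: [320 v1] Latency Tolerance Reporting
"""

CAP_LINE = re.compile(r"Capabilities: \[([0-9a-f]+)(?: v\d+)?\] (.+)")


def list_capabilities(lspci_text):
    """Return (offset, name) pairs for every 'Capabilities:' line."""
    caps = []
    for line in lspci_text.splitlines():
        match = CAP_LINE.search(line)
        if match:
            caps.append((int(match.group(1), 16), match.group(2).strip()))
    return caps


def has_sriov(lspci_text):
    """True if any listed capability is named as SR-IOV."""
    return any(
        "Single Root I/O Virtualization" in name or "SR-IOV" in name
        for _, name in list_capabilities(lspci_text)
    )


print(has_sriov(VEGA20_CAPS))
```

Running has_sriov on the full dump rather than this subset gives the same answer, since none of its capability lines names SR-IOV.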

joe2gaan · February 9, 2019, 9:27pm

[sudo] password for joe:
aer_dev_correctable boot_vga class current_link_width dma_mask_bits drm_dp_aux_dev hwmon i2c-6 iommu local_cpus msi_bus power_dpm_force_performance_level pp_dpm_pcie pp_num_states pp_table resource resource2_wc rom uevent
aer_dev_fatal broken_parity_status config d3cold_allowed driver enable i2c-10 i2c-7 iommu_group max_link_speed msi_irqs power_dpm_state pp_dpm_sclk pp_od_clk_voltage remove resource0 resource4 subsystem vbios_version
aer_dev_nonfatal cec0 consistent_dma_mask_bits device driver_override gpu_busy_percent i2c-11 i2c-8 irq max_link_width numa_node pp_cur_state pp_force_state pp_power_profile_mode rescan resource0_wc resource5 subsystem_device vendor
ari_enabled cec1 current_link_speed devspec drm graphics i2c-5 i2c-9 local_cpulist modalias power pp_dpm_mclk pp_mclk_od pp_sclk_od reset resource2 revision subsystem_vendor

No problem

joe2gaan · February 9, 2019, 9:44pm

For those who want to OC, here are the ranges:

OD_SCLK:
0: 808Mhz
1: 1801Mhz
OD_MCLK:
1: 1000Mhz
OD_VDDC_CURVE:
0: 808Mhz 696mV
1: 1304Mhz 796mV
2: 1801Mhz 1117mV
OD_RANGE:
SCLK: 808Mhz 2200Mhz
MCLK: 351Mhz 1200Mhz
VDDC_CURVE_SCLK[0]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[0]: 738mV 1218mV
VDDC_CURVE_SCLK[1]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[1]: 738mV 1218mV
VDDC_CURVE_SCLK[2]: 808Mhz 2200Mhz
VDDC_CURVE_VOLT[2]: 738mV 1218mV
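
Before writing new clocks, a target can be checked against the OD_RANGE section of the table. A minimal sketch in Python, assuming the table text has exactly the layout shown above. The function names are hypothetical, and nothing here writes to the card:

```python
import re

# OD_RANGE section copied from the table above (first two entries).
OD_TABLE = """\
OD_SCLK:
0: 808Mhz
1: 1801Mhz
OD_RANGE:
SCLK: 808Mhz 2200Mhz
MCLK: 351Mhz 1200Mhz
VDDC_CURVE_VOLT[2]: 738mV 1218mV
"""

RANGE_LINE = re.compile(r"^(\S+):\s+(\d+)(?:Mhz|MHz|mV)\s+(\d+)(?:Mhz|MHz|mV)$")


def parse_od_range(table_text):
    """Return {name: (minimum, maximum)} for the lines after 'OD_RANGE:'."""
    ranges = {}
    in_range_section = False
    for raw in table_text.splitlines():
        line = raw.strip()
        if line == "OD_RANGE:":
            in_range_section = True
            continue
        if not in_range_section:
            continue
        match = RANGE_LINE.match(line)
        if match:
            ranges[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return ranges


def within_range(ranges, name, value):
    """True if `value` lies inside the advertised range for `name`."""
    low, high = ranges[name]
    return low <= value <= high


ranges = parse_od_range(OD_TABLE)
print(within_range(ranges, "SCLK", 2000), within_range(ranges, "MCLK", 1150))
```

The 2000 MHz core and 1150 MHz HBM2 settings from the first post both fall inside these limits.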

josepr · February 9, 2019, 9:54pm

For that capability to show itself, does any setting such as SR-IOV, Above 4G Decoding, VT-d, IOMMU, VT-x, etc… have to be enabled or disabled in the motherboard BIOS?

kiljacken · February 9, 2019, 9:58pm

From what I can gather, the capabilities of a PCI device are stored in its configuration space, so they should be completely driver/setting independent.
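
To make the configuration-space point concrete, here is a minimal sketch in Python that walks the PCIe extended capability list in a raw configuration-space dump. It assumes the standard layout: the list starts at offset 0x100, and each 32-bit header holds the capability ID in bits 15:0, the version in bits 19:16, and the next offset in bits 31:20. SR-IOV uses extended capability ID 0x0010. The function names are hypothetical:

```python
import struct

SRIOV_EXT_CAP_ID = 0x0010  # PCIe extended capability ID assigned to SR-IOV


def extended_capabilities(config):
    """Yield (offset, cap_id, version) for each PCIe extended capability."""
    offset = 0x100  # extended capabilities start after the 256-byte legacy space
    seen = set()    # guards against a malformed, looping list
    while offset and offset + 4 <= len(config) and offset not in seen:
        seen.add(offset)
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):  # empty list or unreadable space
            break
        yield offset, header & 0xFFFF, (header >> 16) & 0xF
        offset = (header >> 20) & 0xFFC  # low two bits are reserved


def config_has_sriov(config):
    """True if the extended capability list contains an SR-IOV entry."""
    return any(
        cap_id == SRIOV_EXT_CAP_ID
        for _, cap_id, _ in extended_capabilities(config)
    )
```

On Linux the raw bytes would come from the device's sysfs config file (it shows up in the directory listing above), e.g. config_has_sriov(open("/sys/bus/pci/devices/0000:0d:00.0/config", "rb").read()). As far as I know, reading past the first 64 bytes of that file needs root, so an unprivileged run would simply find no extended capabilities.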

joe2gaan · February 9, 2019, 11:00pm

clinfo dump

Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (2766.4)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Host timer resolution 1ns
Platform Extensions function suffix AMD

Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name gfx906
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 2.0 AMD-APP (2766.4)
Driver Version 2766.4 (PAL,HSAIL)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Board Name (AMD) Unknown AMD GPU
Device Topology (AMD) PCI-E, 0d:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 60
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 1802MHz
Graphics IP (AMD) 9.6
Device Partition (core)
Max number of sub-devices 60
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs No
Round to nearest No
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 16978542592 (15.81GiB)
Global free memory (AMD) 16515072 (15.75GiB)
Global memory channels (AMD) 128
Global memory banks per channel (AMD) 4
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 4244635648 (3.953GiB)
Unified memory for Host and Device No
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Preferred alignment for atomics
SVM 0 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 3820172032 (3.558GiB)
Preferred total size of global vars 16978542592 (15.81GiB)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 64
Max number of read/write image args 64
Max number of pipe args 16
Max active pipe reservations 16
Max pipe packet size 4244635648 (3.953GiB)
Local memory type Local
Local memory size 65536 (64KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 4244635648 (3.953GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 262144 (256KiB)
Max size 8388608 (8MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1549747285564587256ns (Sat Feb 9 16:21:25 2019)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) Yes
Number of async queues (AMD) 4
Max real-time compute queues (AMD) 0
Max real-time compute units (AMD) 0
SPIR versions 1.2
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, …) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, …) Success [AMD]
clCreateContext(NULL, …) [default] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx906
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx906
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx906

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2

aketay · February 10, 2019, 12:07am

Oh man. Any chance you can test passthrough for the vega reset bug?

joe2gaan · February 10, 2019, 12:32am

Yeah, I think I can throw in a 1070 Ti or something like that for my host OS and pass through the Vega 20. It’s been a while since I have done passthrough. Could someone please link a guide or some documentation if possible?

Enwyn · February 10, 2019, 12:35am

I used the Arch guide in 2012:
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF

or the level1tech guide, which is not for Arch:

I’m not saying much because I have nothing to contribute, but I am super excited about this thread.

aketay · February 10, 2019, 1:28am

Yep I used the arch wiki. Can’t wait to see what happens (crossing fingers they fixed it here)