sami_m
March 24, 2024, 10:16am
1
Can you run the AMD ROCm driver stack inside a container when the host machine is not running AMD's closed drivers, only the default open-source driver plus the non-free binary (Debian 12 on an old AMD 5600 XT GPU)?
The backstory: I’m trying to get into machine learning and would like to run GPU-accelerated workloads, but I would also like to keep my system clean of the AMD driver stack, since everything works so well and is stable on the default install.
How can you give a container access to a GPU that isn't functioning on the host…?
sami_m
March 24, 2024, 2:50pm
3
I’m not very familiar with containers. I have the default (open) drivers running, and they provide /dev/kfd and /dev/dri/renderD128 on the host, which are at least needed, if I understand this correctly. But as I said, this is all new to me. Does the host itself need ROCm? Or is it enough that the driver files are accessible on the host?
Sorry, I read your post before my coffee. It should be working if you have amdgpu installed on the host.
This might help…
opened 01:19AM - 24 Jun 22 UTC
closed 10:10AM - 19 Dec 23 UTC
I've read this thread and it seems some users have managed to make it work for the 5700, which is Navi 10 too: https://github.com/RadeonOpenCompute/ROCm/issues/1306
What is the status for running ROCm for Radeon 5600XT or other Navi10 GPUs? Any tips?
I've compiled ROCm without error on my config and am trying to make PyTorch work (for anyone curious, I'm looking at running [this](https://github.com/lowfuel/progrockdiffusion)).
The problem is that the GPU does not seem to be detected or accessible.
Here is the output of `rocminfo`:
```
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 5 3600X 6-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 3600X 6-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3800
BDFID: 0
Internal Node ID: 0
Compute Unit: 12
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16307888(0xf8d6b0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16307888(0xf8d6b0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16307888(0xf8d6b0) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1010
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 5600 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 4096(0x1000) KB
Chip ID: 29471(0x731f)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1780
BDFID: 2816
Internal Node ID: 1
Compute Unit: 36
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 1280(0x500)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 6275072(0x5fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1010:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
```
When I run my script with `DEVICE = torch.device('cuda')`, it fails with an error about not finding the GPU: `RuntimeError: No CUDA GPUs are available`
And `sudo python3 -c 'import torch; print(torch.cuda.is_available())'` returns `False`.
I'm not sure which direction to investigate, if I even should, or whether it is simply impossible to make this GPU work.
I've read some people [here](https://github.com/RadeonOpenCompute/ROCm/issues/887) or [here](https://rigtorp.se/notes/rocm/) talking about managing to make the 5700 work, but I'm not sure I'll have the skill to do it with their info alone.
* python version: 3.7.13
* ROCm version: 5.1.3-1 (through ROCm-arch)
* OS: Manjaro 21.3.0
* kernel: Kernel: x86_64 Linux 5.15.48-1-MANJARO
* CPU: AMD Ryzen 5 3600X 6-Core @ 12x 3.8GHz
* GPU: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev ca)
* conda version: 4.13.0
* and the following PyTorch related packages : torch 0.10.2+cu113, torchaudio 0.10.2+cu113, torchmetrics 0.9.1, torchvision 0.11.3+cu113, pytorch-lightning 1.6.4
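One detail in the package list is worth checking: a `+cu113` suffix is the tag PyTorch uses for wheels built against CUDA 11.3, and a CUDA build will not see an AMD GPU no matter what `rocminfo` reports. Below is a minimal sketch for classifying a version string, assuming the usual `+cuXXX`, `+rocmX.Y` and `+cpu` local version tags. The helper name `wheel_backend` is made up for illustration and is not part of PyTorch.

```python
def wheel_backend(version: str) -> str:
    """Classify a PyTorch-style version string by its local build tag."""
    # Everything after '+' is the local version label, e.g. 'cu113' or 'rocm5.1.1'.
    _, _, local = version.partition("+")
    if local == "cpu":
        return "cpu"
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cu"):
        return "cuda"
    # No tag, or a tag this sketch does not recognise.
    return "unknown"


# Version strings as listed in the post above.
for name, version in [
    ("torch", "0.10.2+cu113"),
    ("torchaudio", "0.10.2+cu113"),
    ("torchvision", "0.11.3+cu113"),
]:
    print(f"{name}: {wheel_backend(version)} build")
```

If `torch` is importable, `torch.version.hip` being `None` should point to the same conclusion, since that attribute is only set on ROCm builds.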
Short answer is yes. I run it inside distrobox, which is basically rootless podman.
ROCm is not a driver stack. More like a runtime and SDK.
The driver part is actually included in every Linux kernel newer than 5.14.
Share /dev/kfd and /dev/dri/ with your container and make sure it has access to these devices. Generally the render group owns those.
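As a rough sketch of that device sharing, the snippet below looks for `/dev/kfd` and the render nodes under `/dev/dri`, reports whether the current user can open them, and prints a `podman run` line that passes each one through with `--device`. The function names and the `IMAGE` placeholder are assumptions for illustration. The sketch passes individual render nodes rather than the whole `/dev/dri` directory, and group handling inside a rootless container may need extra flags depending on your setup.

```python
import glob
import os
import shlex


def compute_nodes(dev_root="/dev"):
    """List the KFD node and any DRM render nodes that exist under dev_root."""
    kfd = os.path.join(dev_root, "kfd")
    renders = sorted(glob.glob(os.path.join(dev_root, "dri", "renderD*")))
    return ([kfd] if os.path.exists(kfd) else []) + renders


def run_command(image, nodes, runtime="podman"):
    """Build a run command that passes each node through with --device."""
    return [runtime, "run", "--rm", "-it", *(f"--device={n}" for n in nodes), image]


nodes = compute_nodes()
for node in nodes:
    # The current user needs read+write access for a rootless container to use the node.
    usable = os.access(node, os.R_OK | os.W_OK)
    print(f"{node}: {'ok' if usable else 'no access, check group membership'}")

# "IMAGE" is a placeholder; substitute whichever ROCm-enabled image you use.
print(shlex.join(run_command("IMAGE", nodes)))
```

If a node shows as not accessible, adding your user to the group that owns it (generally `render`, as noted above) and logging in again is the usual fix.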