Trying to get GPU Folding (OpenCL) running on Fedora

Hi there,

I shared this in the Folding@Home thread already, but since this seems to be more complex than initially thought I’m opening a new thread.

TL;DR: Anyone got a hot tip on getting OpenCL on Fedora to work?

For reference:

I tried digging around a little and it might be harder than expected. Essentially it seems FAH is using a hardcoded library name (including version) - I don’t know if this is even the right way to do it, but that’s not the point either. From what I can tell it tries to find libOpenCL.so, but can’t because it doesn’t exist:

[tarulia@localhost ~]$ ls -l /usr/lib | grep libOpenCL && ls -l /usr/lib64 | grep libOpenCL
lrwxrwxrwx.  1 root root       18 Jul 26  2019 libOpenCL.so.1 -> libOpenCL.so.1.0.0
-rwxr-xr-x.  1 root root   144860 Jul 26  2019 libOpenCL.so.1.0.0
lrwxrwxrwx.  1 root root        18 Jul 26  2019 libOpenCL.so.1 -> libOpenCL.so.1.0.0
-rwxr-xr-x.  1 root root    137304 Jul 26  2019 libOpenCL.so.1.0.0

And also:

[tarulia@localhost ~]$ ldconfig -p | grep libOpenCL
        libOpenCL.so.1 (libc6,x86-64) => /lib64/libOpenCL.so.1
        libOpenCL.so.1 (libc6) => /lib/libOpenCL.so.1

So yes, it really does not exist. I then tried manually symlinking the file (even though I don’t really want to mess around with system libs), but that didn’t work either.
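
A quick way to reproduce the lookup outside of FAH is to try loading both names yourself. This is only a minimal Python sketch: it assumes FAH does something equivalent to dlopen() on the unversioned name, which is a guess from the behaviour above and not confirmed from the FAH source. probe_library is a made-up helper name.

```python
import ctypes


def probe_library(names):
    """Try to load each shared library name; return {name: True/False}."""
    results = {}
    for name in names:
        try:
            # ctypes.CDLL goes through the normal dynamic loader search path,
            # the same one that `ldconfig -p` lists.
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_library(["libOpenCL.so", "libOpenCL.so.1"]).items():
        print(f"{name}: {'loadable' if ok else 'not found'}")
```

Given the ldconfig output above, the unversioned name would be expected to fail until some package or symlink provides libOpenCL.so in the loader path.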

So I dug into which package provides my current files.

[tarulia@localhost ~]$ dnf repoquery --installed -l mesa-libOpenCL
/etc/OpenCL/vendors/mesa.icd
/usr/lib/.build-id
/usr/lib/.build-id/8f
/usr/lib/.build-id/8f/7570d3fec6096811dec97e7b6576d87f858263
/usr/lib64/libMesaOpenCL.so.1
/usr/lib64/libMesaOpenCL.so.1.0.0

As you can see, the mesa package doesn’t actually provide the generic OpenCL libraries. The generic files are provided separately:

[tarulia@localhost ~]$ dnf provides /usr/lib/libOpenCL.so.1
Last metadata expiration check: 0:11:00 ago on Sun 15 Mar 2020 22:39:34 CET.
ocl-icd-2.2.12-6.fc31.i686 : OpenCL Library (Installable Client Library) Bindings
Repo        : @System
Matched from:
Filename    : /usr/lib/libOpenCL.so.1

ocl-icd-2.2.12-6.fc31.i686 : OpenCL Library (Installable Client Library) Bindings
Repo        : fedora
Matched from:
Filename    : /usr/lib/libOpenCL.so.1

According to the ocl-icd README, this seems to be a package that forwards generic OpenCL calls to vendor-specific libs (i.e. Mesa-OpenCL in this case).
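
To see which vendor libraries the loader would forward to, you can read the .icd files directly. A minimal sketch, assuming the usual layout where each file under /etc/OpenCL/vendors is plain text naming one library (the contents of mesa.icd were not shown above, so the example value in the comment is illustrative only). read_icd_files is a hypothetical helper name.

```python
from pathlib import Path


def read_icd_files(vendor_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to the vendor library it names."""
    mapping = {}
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return mapping
    for icd in sorted(directory.glob("*.icd")):
        lines = icd.read_text().splitlines()
        # Take the first non-empty line as the library name or path,
        # e.g. something like "libMesaOpenCL.so.1" (illustrative value).
        library = next((ln.strip() for ln in lines if ln.strip()), "")
        mapping[icd.name] = library
    return mapping


if __name__ == "__main__":
    for icd_name, library in read_icd_files().items():
        print(f"{icd_name} -> {library}")
```

If the directory is empty or missing, the loader has nothing to forward to, no matter which libOpenCL.so name the application opens.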

So I tried finding a package that provides libOpenCL.so, and found only the two ocl-icd-devel packages:

[tarulia@localhost ~]$ dnf provides /usr/lib64/libOpenCL.so /usr/lib/libOpenCL.so
Last metadata expiration check: 1:26:59 ago on Sun 15 Mar 2020 22:39:34 CET.
ocl-icd-devel-2.2.12-6.fc31.i686 : OpenCL Library Development files
Repo        : fedora
Matched from:
Filename    : /usr/lib/libOpenCL.so

ocl-icd-devel-2.2.12-6.fc31.x86_64 : OpenCL Library Development files
Repo        : fedora
Matched from:
Filename    : /usr/lib64/libOpenCL.so

Tried installing both of them but they also just add a symlink so the result is the same as above.

Anything I missed?

Same setup as mine, I played around with it a tiny bit yesterday. What I did so far is just symlink libOpenCL.so.1 to libOpenCL.so, and FAH did detect the card correctly and everything. But it still does not work when I try to add the GPU slot. Did not investigate further yet; after I’m done with work I might.

It seems it just loads the library dynamically at runtime, so it might just be missing something else.

Well it detects the card, but the message I’m getting is at the OpenCL line in the System Info. There is “GPUs” (where it lists the GPU Info), “CUDA” (which is irrelevant for me), and “OpenCL” (where I get the error message).

I probably forgot to mention: adding the GPU slot itself works. But when I get a WU it can’t start because it doesn’t find the card there. It says I can specify the GPU index, but doing that also doesn’t work for me…

So, just got some time so wanted to clarify how my System Info looks:

As you can see the GPU is found, but OpenCL can’t find the GPU for compute…

i tried and gave up a while back

which gpu?

last i heard radeon needed ROCM for opencl gpu compute to work and that is amdgpu-pro only which doesn’t work in fedora (AFAIK). ***

i could be wrong but that’s where my troubleshooting ended.


edit:
maybe you can compile something from source. not sure.

FYI, the servers are out of work ATM

Vega 64 as per the screenshot :wink:

Mh that would be a real bummer… I’m pretty sure I read somewhere OpenCL was supported but maybe I’m remembering wrong.
That would also mean RHEL would be affected, which seems a huge market in the enterprise to miss… edit: apparently amdgpu-pro is supported on CentOS, but not Fedora. Well… great :expressionless:
Also even if I could get amdgpu-pro to work that would mean gaming performance would tank since amdgpu-pro isn’t as up to date and not nearly as supported gaming-wise :frowning:

I know, but that is all the more time to make it work for when they have work again :stuck_out_tongue:

Ah didn’t see that one hiding there :smiley:

Yeah. this is another driver for me towards ubuntu 20.04 LTS. I want ROCM for blender.

ROCM via amdgpu-pro and official zfs support. Just waiting on release before i blow away Fedora and switch.

Maybe during covid19 lockdown when that happens.

I suspect not many people are running RHEL on OpenCL render farms, etc. Seems more like an ubuntu market to me based on general feel of the respective distros. RHEL would be too conservative and behind the curve performance wise.

But yeah, it’s a shame.

Kind of a bummer really… I mean OpenCL is in Mesa, and Mesa is supported by AMD, no? I don’t get it to be honest :confused:

@wendell I don’t usually ping with small issues like this, but I know you played around with Vega and AMD in general a lot and you also used Fedora in the past. Have you ever played around with OpenCL as well?

I’m having this problem in Ubuntu 18.04 myself. I followed all the instructions to install the ROCM stuff and it does appear to work with tools like clinfo and rocminfo. I’ve also got the ICD working which provides a redirector for libOpenCL.so via the ocl-icd-opencl-dev package. (Although even if I use LD_LIBRARY_PATH or LD_PRELOAD to jump directly to the libOpenCL.so ROCM library, it doesn’t work.)

Sample code seems to work fine.

But the FAHClient and FAHBench still can’t make it work.

FAHBench-cmd just says:
Error initializing context: clGetPlatformIDs (-1001)

At least it’s failing at a different step for you. Question is whether clGetPlatformIDs is called before or after clGetDeviceIDs.

It’s a slightly different program. I get the DeviceIDs error from FAHClient and the clGetPlatformIDs error from FAHBench-cmd.

Right just saw that as well. Might be they are calling it in different orders.
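
On the ordering question: in the OpenCL API, clGetDeviceIDs takes a platform as its first argument, so a program normally calls clGetPlatformIDs first and then clGetDeviceIDs per platform. Also, -1001 is CL_PLATFORM_NOT_FOUND_KHR from the ICD extension, meaning the loader found no usable platform at all. Below is a minimal ctypes sketch of that call order. It is not what FAHClient or FAHBench actually do internally (that is unknown here), and load_opencl, gpu_counts and error_name are made-up helper names.

```python
import ctypes

CL_SUCCESS = 0
CL_DEVICE_TYPE_GPU = 1 << 2

# A few well-known OpenCL status codes.
ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -30: "CL_INVALID_VALUE",
    -31: "CL_INVALID_DEVICE_TYPE",
    -32: "CL_INVALID_PLATFORM",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",
}


def error_name(code):
    return ERROR_NAMES.get(code, f"unknown error ({code})")


def load_opencl():
    """Return a handle to the first loadable libOpenCL name, or None."""
    for name in ("libOpenCL.so", "libOpenCL.so.1"):
        try:
            return ctypes.CDLL(name)
        except OSError:
            pass
    return None


def gpu_counts(cl):
    """Return (status, [GPU count per platform]) using the usual call order."""
    u32_p = ctypes.POINTER(ctypes.c_uint32)
    handle_p = ctypes.POINTER(ctypes.c_void_p)
    cl.clGetPlatformIDs.restype = ctypes.c_int32
    cl.clGetPlatformIDs.argtypes = [ctypes.c_uint32, handle_p, u32_p]
    cl.clGetDeviceIDs.restype = ctypes.c_int32
    cl.clGetDeviceIDs.argtypes = [
        ctypes.c_void_p, ctypes.c_uint64, ctypes.c_uint32, handle_p, u32_p,
    ]

    # Step 1: ask how many platforms exist, then fetch their handles.
    count = ctypes.c_uint32(0)
    err = cl.clGetPlatformIDs(0, None, ctypes.byref(count))
    if err != CL_SUCCESS or count.value == 0:
        return err, []
    platforms = (ctypes.c_void_p * count.value)()
    err = cl.clGetPlatformIDs(count.value, platforms, None)
    if err != CL_SUCCESS:
        return err, []

    # Step 2: query the number of GPU devices on each platform.
    counts = []
    for platform in platforms:
        ndev = ctypes.c_uint32(0)
        err = cl.clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, None,
                                ctypes.byref(ndev))
        counts.append(ndev.value if err == CL_SUCCESS else 0)
    return CL_SUCCESS, counts


if __name__ == "__main__":
    cl = load_opencl()
    if cl is None:
        print("no libOpenCL could be loaded")
    else:
        status, counts = gpu_counts(cl)
        print(error_name(status), counts)
```

If a program skips step 1 and passes a NULL platform to clGetDeviceIDs, the result is implementation-defined, which is what the "NULL platform behavior" section of clinfo reports on.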

Finally got a few minutes, and I forgot to mention clinfo looked fine all the time:

[tarulia@localhost ~]$ clinfo
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
cl_get_gt_device(): error, unknown device: 0
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
cl_get_gt_device(): error, unknown device: 0
Number of platforms                               3
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 19.2.8
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 1.2 pocl 1.5-pre, RelWithDebInfo, LLVM 9.0.0, RELOC, SLEEF, DISTRO, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             POCL

  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 2.0 beignet 1.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
  Platform Extensions function suffix             Intel
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [2]
param: 4, val: 0
cl_get_gt_device(): error, unknown device: 0

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     Radeon RX Vega (VEGA10, DRM 3.36.0, 5.5.8-200.fc31.x86_64, LLVM 9.0.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 19.2.8
  Driver Version                                  19.2.8
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               64
  Max clock frequency                             1630MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              8589934592 (8GiB)
  Error Correction support                        No
  Max memory allocation                           6871947673 (6.4GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        2147483647 (2GiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

  Platform Name                                   Portable Computing Language
Number of devices                                 1
  Device Name                                     pthread-AMD Ryzen 7 2700X Eight-Core Processor
  Device Vendor                                   AuthenticAMD
  Device Vendor ID                                0x6c636f70
  Device Version                                  OpenCL 1.2 pocl HSTR: pthread-x86_64-unknown-linux-gnu-znver1
  Driver Version                                  1.5-pre
  Device OpenCL C Version                         OpenCL C 1.2 pocl
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               16
  Max clock frequency                             3700MHz
  Device Partition                                (core)
    Max number of sub-devices                     16
    Supported partition types                     equally, by counts
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             4096x4096x4096
  Max work group size                             4096
  Preferred work group size multiple              8
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                               16 / 16      
    int                                                  8 / 8       
    long                                                 4 / 4       
    half                                                 0 / 0        (n/a)
    float                                                8 / 8       
    double                                               4 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              14647037952 (13.64GiB)
  Error Correction support                        No
  Max memory allocation                           4294967296 (4GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8388608 (8MiB)
  Global Memory cache line size                   64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Global
  Local memory size                               4194304 (4MiB)
  Max number of constant args                     8
  Max constant buffer size                        4194304 (4MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
  printf() buffer size                            16777216 (16MiB)
  Built-in kernels                                (n/a)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64


  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContext(NULL, ...) [other]              Success [POCL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX Vega (VEGA10, DRM 3.36.0, 5.5.8-200.fc31.x86_64, LLVM 9.0.0)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX Vega (VEGA10, DRM 3.36.0, 5.5.8-200.fc31.x86_64, LLVM 9.0.0)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX Vega (VEGA10, DRM 3.36.0, 5.5.8-200.fc31.x86_64, LLVM 9.0.0)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
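
Since that output is long and full of driver noise, here is a small Python sketch that boils it down to "how many devices per platform". It assumes the human-readable clinfo layout shown above (a "Platform Name" line followed by a "Number of devices" line); devices_per_platform is a hypothetical helper, not part of clinfo.

```python
import re


def devices_per_platform(clinfo_text):
    """Return {platform name: device count} from clinfo's plain-text output."""
    counts = {}
    current = None
    for raw in clinfo_text.splitlines():
        # Keys and values are separated by runs of at least two spaces.
        parts = re.split(r"\s{2,}", raw.strip(), maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        if key == "Platform Name":
            current = value
        elif key == "Number of devices" and current is not None:
            counts[current] = int(value)
    return counts
```

Feed it the captured text, for example from subprocess.run(["clinfo"], capture_output=True, text=True).stdout. For the output above it should report one device each for Clover and pocl and zero for the Intel platform.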

opencl and rocm support requires the amdgpu-pro driver to be installed

this will also kill your gaming btw, so be aware

https://www.amd.com/en/support/kb/release-notes/rn-radpro-lin-16-40

Thanks, already discussed that above tho :wink:

I just find it weird because the device is found and OpenCL is supposed to be in Mesa, so I wonder what that is about.

it’s because AMD is a bunch of clowns when it comes to things like machine learning and codecs for streaming (the guy who did AMD updates for OBS recently stopped working on AMD because they were such a pain in the ass to work with).

I don’t understand at all why they did this. It needs to be fixed. I am actually going to be purchasing a 20 series nvidia card in a while just because doing neural nets on amd sucks and stuff like tensorflow and pytorch have crap AMD compatibility.

I had some time today so I gave it a shot. Tried to deduce the root cause of it not working on OpenSource drivers.

I discovered that the open source driver definitely does not include support for these cards. I’m getting an error that /usr/lib64/clc//gfx1010-amdgcn-mesa-mesa3d.bc is not found. There are a bunch of .bc files for other cards, but not this one.

The package libclc provides these files (as told by dnf provides <bc file>). Fedora 31 sports version 0.2.0-16.git9f6204e.fc31. I’ve looked into Fedora 32 (beta). It sports newer mesa (20.* version), but libclc still does not include the needed file. Could not find anyone trying to implement support so far.
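
For anyone wanting to check their own card, this is roughly how I would list which targets the installed libclc covers. A minimal Python sketch, assuming the file naming seen in the error message (<chip>-amdgcn-mesa-mesa3d.bc under /usr/lib64/clc); list_clc_targets and has_target are made-up helper names.

```python
from pathlib import Path

# Suffix taken from the error message above.
SUFFIX = "-amdgcn-mesa-mesa3d.bc"


def list_clc_targets(clc_dir="/usr/lib64/clc"):
    """Return the sorted chip names that have a Mesa amdgcn bitcode file."""
    directory = Path(clc_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name[: -len(SUFFIX)] for p in directory.glob("*" + SUFFIX))


def has_target(chip, clc_dir="/usr/lib64/clc"):
    """True if a bitcode file for the given chip name exists."""
    return (Path(clc_dir) / f"{chip}{SUFFIX}").exists()


if __name__ == "__main__":
    print("available:", ", ".join(list_clc_targets()) or "(none)")
    print("gfx1010 present:", has_target("gfx1010"))
```

On the Fedora 31 install described above, has_target("gfx1010") would be expected to come back False, matching the missing-file error.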

Now to the meaty stuff. Those .bc files are LLVM bitcode files. They are basically a ‘medium’ with which CL kernels are compiled. Since RDNA is a new architecture, nobody has implemented this support yet (apparently). GCN architectures are all supported, and I’ve found this little old gem. It’s exactly how bitcode for GCN is made, although this is somebody’s personal attempt, so I guess not official release stuff. I’ll try to hunt down the release ones. I’ll also check for in-development bitcode for RDNA.

But in short - if you want to get OpenCL running - you need the amdgpu-pro drivers. They’re not available on Fedora. I’ve looked into them too - they don’t ship bitcode, they basically build the entire graphics stack with their own stuff in there. Nothing to easily reuse unfortunately.

I’ll say, this has been fun. I’m in quarantine anyway, so this is a great avenue for learning some of this stuff :smiley:

Thanks for looking into this. One thing however:

I’m not running RDNA, I’m using a Vega 64

Although they do officially support CentOS, and I’d imagine getting the CentOS package to work shouldn’t be all that much work.
But then again, this would tank gaming performance as per the above.

Kind of a bummer really because Vega are pretty good Compute Cards :frowning:

Oh right! OpenCL should be operational for you, it’s just a matter of getting FAH to figure that out. I got way derailed from getting it to run, since there’s no OpenCL for me :smiley:.

I actually did some more digging since posting. I’ll post that for the sake of completeness. But that’s already in regards to 5700/5700XT:

  • AMDGPU drivers seem to just need the kernel module to work. I’ve found the implementation on Github (more stuff on there), so if there’s no compatibility problems, it shouldn’t be too much of an issue to compile and run it in Fedora. Still, it should probably take a while to sort it out, I’m sure it won’t be without problems. Would need to set up dkms or forget about frequent kernel updates. And I’d be keen to try out OpenSource alternatives first.
  • Turns out RDNA is not a whole new ISA. It seems to still implement the GCN instruction set for the most part (could’ve expected that one, who’s going to throw away years of work), with some differences. It’s probably nothing new lol, but it’s new to me. After finding this, I tried linking the missing file to the bitcode file of the latest architecture. The program just hung this time lol.
  • Sifted through some of the LLVM codebase (LLVM is what drives OpenCL kernel compilation and contains libclc). Found references to RDNA and gfx10 (abbreviation for gfx1010, gfx1011 and gfx1012). Found GCN instruction set bitcode, gfx10 (RDNA) bitcode and related stuff. No build targets for them in libclc. The LLVM people still seem to be working on this; the bitcode commits are about 9 months old. Screenshot of a recent commit:
    [Screenshot from 2020-03-21 22-13-39]

Phew, I should get some sleep.
