CUDA 12.9 on Fedora 42 Guide including getting `cuda-samples` running

I struggled to get CUDA, NVCC, and the cuda-samples project running on my computer, so I am writing this to help others.

Installing NVIDIA drivers

As someone who uses the professional Blackwell CUDA cards, I am required to use the NVIDIA open kernel module. It works for both gaming and professional cards. I would advise you to use it, as the closed-source kernel module will eventually be deprecated.

sudo dnf install rpmfusion-nonfree-release # make sure rpmfusion non-free is enabled
sudo sh -c 'echo "%_with_kmod_nvidia_open 1" > /etc/rpm/macros.nvidia-kmod' # use the open modules when building the kernel modules
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda xorg-x11-drv-nvidia-cuda-libs # install the fedora drivers and cuda libraries

DO NOT REBOOT UNTIL THE KERNEL MODULES HAVE BEEN COMPILED. You can check whether the akmod build is still running with:

ps aux | grep kmod

I personally run it under watch and wait until nothing but the watch invocation itself shows up:

watch -n 2 "ps aux | grep kmod"

More information can be found on the RPM Fusion NVIDIA how-to site.

Reboot
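
After the reboot, a quick sanity check that the driver actually loaded (and, if I understand the packaging correctly, that the open kernel module is the one in use):

nvidia-smi # should list your GPU and the driver version
modinfo -F license nvidia # the open module should report something like "Dual MIT/GPL"; the proprietary one reports "NVIDIA"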

Install CUDA 12.9

These instructions are based on the RPM Fusion CUDA page.

sudo dnf config-manager addrepo --from-repofile=https://developer.download.nvidia.com/compute/cuda/repos/fedora41/$(uname -m)/cuda-fedora41.repo
sudo dnf clean all
sudo dnf config-manager setopt cuda-fedora41-$(uname -m).exclude=nvidia-driver,nvidia-modprobe,nvidia-persistenced,nvidia-settings,nvidia-libXNVCtrl,nvidia-xconfig
sudo dnf -y install cuda-toolkit # 12.9.0 at time of writing
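
Assuming the default install location, the toolkit ends up under /usr/local/cuda-12.9, and you can confirm the version straight away without touching any paths yet:

/usr/local/cuda-12.9/bin/nvcc --version # should report release 12.9 (if nvcc is not there yet, it arrives with the cuda-nvcc package we install in the next step)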

Now comes the juicy part: installing the requirements to get the cuda-samples building.

Installing NVCC and a compatible GCC

To compile CUDA C/C++ code, you will need NVCC from the NVIDIA CUDA Fedora repo we added. We also need GCC 14, as Fedora 42 ships with GCC 15, which is too new for NVCC.

sudo dnf install gcc14.x86_64 gcc14-c++.x86_64 cuda-nvcc-12-9

Now set the environment variables so that nvcc uses g++ 14 and CMake projects pick up the correct GCC compilers:

export CUDAHOSTCXX=/usr/bin/g++-14
export CPATH=/usr/include/openmpi-x86_64:$CPATH
export PATH=$PATH:/usr/lib64/openmpi/bin
export CC=/usr/bin/gcc-14
export CXX=/usr/bin/g++-14
export NVCC_CCBIN=/usr/bin/g++-14
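
If you prefer not to rely on the NVCC_CCBIN variable, nvcc also takes the host compiler on the command line via its -ccbin flag, for example (hypothetical file name):

nvcc -ccbin /usr/bin/g++-14 my_kernel.cu -o my_kernel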

Finally, we want to make sure the CUDA library and include directories are added correctly to the relevant paths and that the nvcc binary is on our executable path.

export LD_LIBRARY_PATH=/usr/local/cuda-12.9/targets/x86_64-linux/lib:$LD_LIBRARY_PATH
export CPATH=/usr/local/cuda-12.9/targets/x86_64-linux/include:$CPATH
export PATH=/usr/local/cuda-12.9/bin:$PATH
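
These exports only apply to the current shell, so append them to your ~/.bashrc (or your shell's equivalent) if you want them to persist. A quick check that everything is wired up:

which nvcc # should print /usr/local/cuda-12.9/bin/nvcc
nvcc --version # should report Cuda compilation tools, release 12.9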

The dirty hack

/usr/local/cuda-12.9/targets/x86_64-linux/include/crt/math_functions.h has external function declarations that are incompatible with the ones in /usr/include/bits/mathcalls.h. So let's fix that by editing /usr/local/cuda-12.9/targets/x86_64-linux/include/crt/math_functions.h. Here is a set of diffs of what I did:

  *
  * \note_accuracy_double
  */
-extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 sinpi(double x);
+extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 sinpi(double x) noexcept (true);
 /**
  * \ingroup CUDA_MATH_SINGLE
  * \brief Calculate the sine of the input argument
@@ -2576,7 +2576,7 @@
  *
  * \note_accuracy_single
  */
-extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float                  sinpif(float x);
+extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float                  sinpif(float x) noexcept (true);
 /**
  * \ingroup CUDA_MATH_DOUBLE
  * \brief Calculate the cosine of the input argument
@@ -2598,7 +2598,7 @@
  *
  * \note_accuracy_double
  */
-extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 cospi(double x);
+extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 cospi(double x) noexcept (true);
 /**
  * \ingroup CUDA_MATH_SINGLE
  * \brief Calculate the cosine of the input argument
@@ -2620,7 +2620,7 @@
  *
  * \note_accuracy_single
  */
-extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float                  cospif(float x);
+extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float                  cospif(float x) noexcept (true);
 /**
  * \ingroup CUDA_MATH_DOUBLE
  * \brief  Calculate the sine and cosine of the first input argument

Notice how we added noexcept (true) at the end of the sine and cosine declarations; that is what is needed to make them compatible with the newer glibc headers.
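
A minimal sketch of a sanity check (the file name and location are arbitrary): a tiny kernel that calls sinpi and cospi. If the patch worked, it should compile cleanly with g++ 14 as the host compiler and print 1.000000.

cat > /tmp/pi_check.cu << 'EOF'
#include <cstdio>

// sinpi(0.5) is 1.0 and cospi(0.5) is 0.0, so the kernel should write 1.0
__global__ void pi_check(double *out)
{
    *out = sinpi(0.5) + cospi(0.5);
}

int main()
{
    double host_result = 0.0;
    double *dev_result = nullptr;
    cudaMalloc(&dev_result, sizeof(double));
    pi_check<<<1, 1>>>(dev_result);
    cudaMemcpy(&host_result, dev_result, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dev_result);
    printf("sinpi(0.5) + cospi(0.5) = %f\n", host_result);
    return 0;
}
EOF
nvcc /tmp/pi_check.cu -o /tmp/pi_check && /tmp/pi_check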

And that is it, folks. The rest is all about getting the cuda-samples working.

Getting cuda-samples working

Now, there are a few packages you can install to help with the OpenGL, Vulkan, FreeImage and MPI demos:

sudo dnf install freeglut freeglut-devel freeimage-devel openmpi openmpi-devel vulkan

To get MPI working, you want to add its binaries to your PATH:

export PATH=$PATH:/usr/lib64/openmpi/bin

In a directory of your choice, clone the cuda-samples project:

git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples

Now let’s get building:

# in the cuda-samples directory
mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CUDA_ARCHITECTURES=native .. # I prefer ninja over make
ninja -j30 # change the number to whatever you want, I have 32 threads available on my machine
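
If you only care about one sample, you can also build just that target; as far as I can tell the target names match the sample directory names (run ninja -t targets to list them if in doubt):

ninja Mandelbrot # builds only the Mandelbrot sample and its dependencies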

Grab a cuppa and wait (or not). If all goes well, we should now have samples we can run. From the same build directory, let's run two of my favorite examples:

./Samples/2_Concepts_and_Techniques/MC_EstimatePiP/MC_EstimatePiP # simple Monte Carlo algorithm for estimating Pi
./Samples/5_Domain_Specific/Mandelbrot/Mandelbrot # tests OpenGL with CUDA, use `+` and `-` to zoom in and out
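
deviceQuery is also a nice first run, since it just prints what your GPU and driver support; its path follows the same Samples layout, so adjust if your checkout differs:

./Samples/1_Utilities/deviceQuery/deviceQuery # prints the GPU name, compute capability and limits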

Happy hacking!

Note:

  • I am not a professional C++ developer and have not written C++ professionally in over a decade
  • My only real-world CUDA experience was at university, where I used it to accelerate simulations written with OpenMP
  • If there is a better way, let me know!
  • If you are not married to Fedora, just use Ubuntu. It's far easier. For me, I want the newer packages, no Snaps, and I work with RHEL at work, so Fedora is closer to what I use there.

BONUS: Working inside CLion

Open the root of the project (the directory containing the top-level CMakeLists.txt) with CLion. You will notice that CMake won't work initially. That is because you need to add the environment variables we set above:

Go to Settings → Build, Execution, Deployment → CMake and add the following in Environment:

CUDAHOSTCXX=/usr/bin/g++-14;CXX=/usr/bin/g++-14;NVCC_CCBIN=/usr/bin/g++-14;CC=/usr/bin/gcc-14

Then reload the CMake project, which you can easily do from the bottom left corner of the IDE.

very nice, high fives

Excellent! A trivial correction was to remove the quotes here:
export CC=/usr/bin/gcc-14
export CXX=/usr/bin/g++-14

Thanks! I removed the quotes in the post.

I really wish NVIDIA would update their compiler toolchain to accommodate this; it should be a fairly trivial fix using preprocessor macros. This does work like a treat though. I sometimes symlink the gcc-14 and g++-14 executables into the /usr/local/cuda-x/bin directory, since I think this is added to the head of the PATH when invoking nvcc.

Almost all of this worked for me except the patch for math_functions.h; it seems to be missing one function?

@@ -2553,7 +2553,7 @@
  *
  * \note_accuracy_double
  */
-extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 sinpi(double x);
+extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 sinpi(double x) noexcept(true);
 /**
  * \ingroup CUDA_MATH_SINGLE
  * \brief Calculate the sine of the input argument 
@@ -2576,7 +2576,7 @@
  *
  * \note_accuracy_single
  */
-extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float                  sinpif(float x);
+extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float                  sinpif(float x) noexcept (true);
 /**
  * \ingroup CUDA_MATH_DOUBLE
  * \brief Calculate the cosine of the input argument 
@@ -2598,7 +2598,7 @@
  *
  * \note_accuracy_double
  */
-extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 cospi(double x);
+extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ double                 cospi(double x) noexcept (true);
 /**
  * \ingroup CUDA_MATH_SINGLE
  * \brief Calculate the cosine of the input argument 
@@ -2620,7 +2620,7 @@
  *
  * \note_accuracy_single
  */
-extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float                  cospif(float x);
+extern __DEVICE_FUNCTIONS_DECL__ __device_builtin__ float                  cospif(float x) noexcept (true);
 /**
  * \ingroup CUDA_MATH_DOUBLE
  * \brief  Calculate the sine and cosine of the first input argument 

Looking at it again, perhaps just the first line is missing?