Is Arch Linux viable for ML with CUDA?

thecoderx · April 19, 2024, 4:09pm

What has your experience been running Arch for ML and DL workflows? Have you had any issues with CUDA or Tensorflow/Pytorch, etc.? I am currently using Ubuntu 22.04 LTS because it seemed like it was the most widely supported across the software stack, but I do like the idea of Arch better. I just need my software to be stable, and I don’t have the time to troubleshoot it on a regular basis, so I am hesitant to switch. But I don’t even know if this is a valid concern.

Thanks,
Joe

compy386 · April 19, 2024, 5:21pm

Arch Linux is not really designed with the principle of stability in mind IMO. If you are looking for the traditional definition of stability, as in thorough testing and validation, static package versions, a philosophy of never pushing breaking changes, then Arch is not really ideal.

quilt · April 19, 2024, 6:01pm

You could, but it will be more of a hassle than ubuntu.

There are no official arch packages available of cuda. But everything you need should be available somehow (AUR, conda, docker?)
Your version of gcc will probably be newer than what cuda supports, so you would either need to install another version, or use a conda environment, a docker container or some other solution to use an older gcc version than the one shipped with the distro. I’m not sure how easy installing multiple gcc versions is on arch? You can also override the gcc version check, but you don’t want to find out some weird edge case in the middle of work.
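For anyone wanting to check the gcc point before compiling, here is a minimal sketch in Python. It compares the host gcc's major version against a ceiling you supply. The ceiling of 13 below is an assumption taken from the GCC 13.2 figure mentioned later in this thread, not an official number; check NVIDIA's installation guide for your CUDA release. The helper names are made up for illustration.

```python
from __future__ import annotations

import shutil
import subprocess


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '13.2.1' into (13, 2, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def gcc_is_supported(gcc_version: str, max_supported_major: int) -> bool:
    """True if the gcc major version does not exceed the given ceiling."""
    return parse_version(gcc_version)[0] <= max_supported_major


def installed_gcc_version() -> str | None:
    """Ask the host gcc for its full version; None if gcc is missing or fails."""
    gcc = shutil.which("gcc")
    if gcc is None:
        return None
    result = subprocess.run([gcc, "-dumpfullversion"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    # Assumption: 13 is the newest gcc major your CUDA release accepts.
    MAX_GCC_MAJOR = 13
    version = installed_gcc_version()
    if version is None:
        print("gcc not found on PATH")
    elif gcc_is_supported(version, MAX_GCC_MAJOR):
        print(f"gcc {version} is within the assumed limit")
    else:
        print(f"gcc {version} is newer than the assumed limit; "
              "consider an older gcc or a container")
```

If the check fails, nvcc's `-ccbin` option can point it at an older host compiler installed alongside the distro one, which is usually safer than overriding the version check.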

Otherwise I’m not too familiar with arch (I use fedora personally), so I don’t know how much truth is behind the ‘arch breaks after update’ meme.

If you want something a bit more up to date than ubuntu LTS, but not as cutting edge as arch, fedora or openSUSE are good options. But fedora often struggles with the second point too. Currently nvidia supports the latest fedora release (39), but 40 will launch soon, and then it will again take them a while to catch up to fedora’s gcc version.

igormp · April 19, 2024, 7:21pm

I’ve been using arch and work with ML stuff on CUDA (with 2x3090s) without any issues.

The major “issue” is when packages like CUDA, Torch, or Tensorflow get upgraded but aren’t really in sync with each other, so you have to grab a nightly or set up a proper venv/docker container for the stuff you don’t want to update. I’m pretty used to that, so it’s not much of an issue to me. Or you can just wait a day or two for them to catch up with each other.
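A rough sketch of the venv approach, using only Python's standard `venv` module. The package names and version numbers in `PINS` are placeholders for illustration, not a recommendation; pin whatever combination your project is known to work with.

```python
from __future__ import annotations

import venv
from pathlib import Path


def create_env(env_dir: str | Path, with_pip: bool = True) -> Path:
    """Create an isolated virtual environment that system upgrades won't touch."""
    env_path = Path(env_dir)
    venv.create(env_path, with_pip=with_pip)
    return env_path


def requirements_text(pins: dict[str, str]) -> str:
    """Render exact version pins in requirements.txt format, sorted by name."""
    return "".join(f"{name}=={version}\n" for name, version in sorted(pins.items()))


if __name__ == "__main__":
    # Placeholder pins (assumption): replace with the versions you rely on.
    PINS = {"torch": "2.2.2", "numpy": "1.26.4"}
    print(requirements_text(PINS), end="")
```

After calling `create_env("ml-env")` and saving the pins to `requirements.txt`, installing with `ml-env/bin/pip install -r requirements.txt` keeps the project on those versions no matter what a system upgrade brings in.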

What do you mean? They’re right there in the repos:
https://archlinux.org/packages/extra/x86_64/cuda/

The CUDA team actually did a good job keeping up with upstream. CUDA works with GCC 13.2 without issues, and also supports Clang 17.

quilt · April 19, 2024, 7:53pm

Official = nvidia

Sure, but fedora 40 is launching soon with gcc 14 and clang 18. I’m not sure what the release schedule of arch is, but I assume it will update soon?

It usually takes nvidia a while to support new fedora releases, precisely because of the gcc version. Lately they even skipped over fedora 38:

https://developer.download.nvidia.com/compute/cuda/repos/

That’s not saying they’re not doing a good job, but fedora (and I assume arch) are quite fast moving and it’s something to keep in mind. It only comes up if you actually need to compile stuff. If you’re just installing some virtualenv of a project they’ll pull in all the compiled components any way.
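To see which case you are in, a small check like the one below reports the CUDA version a prebuilt PyTorch was compiled against and whether a GPU is visible. It is only a sketch: it assumes PyTorch is the framework in use and returns a stub result if it is not installed. The function name is made up for illustration.

```python
from __future__ import annotations


def torch_cuda_report() -> dict:
    """Summarize the installed PyTorch build, or note that it is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        # CUDA version the build was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # True only if a driver and at least one usable GPU are present.
        "gpu_visible": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is set and `gpu_visible` is true, the precompiled components are working and the host gcc version does not come into play until you compile something yourself.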

igormp · April 19, 2024, 8:13pm

Ah, got it. Don’t think that’s really relevant in the end as long as you can get it up and running easily.

Fun fact: fedora often manages to update stuff faster than Arch, so it may not be an issue at all and might take long enough for nvcc to get updated along with it.

quilt · April 19, 2024, 8:54pm

I agree – these kinds of packages just install huge amounts into /opt and don’t dynamically link to anything in the OS. So it does not matter too much for which distro they were ‘meant’.

I install a number of things that nvidia packaged for RHEL8 (like the container tools), and they run fine on fedora.

thecoderx · April 20, 2024, 1:54pm

Thanks for the input guys, it sounds like Arch Linux may not be the best option for me right now. I am not unhappy with Ubuntu, I would just like more freedom to optimize the OS to my workflow and liking.

compy386 · April 20, 2024, 3:27pm

I feel like the Devuan project might give you something based on the same upstream as Ubuntu lts but with more administrative flexibility… it’s designed for server applications though so not sure how the desktop experience would be.

quilt · April 20, 2024, 4:33pm

In which sense exactly? In most senses the differences between distros are

(1) a tradeoff between stability and cutting-edge (where ubuntu LTS is a pretty good tradeoff for a productive machine; many linux users just love up-to-date stuff and tinkering more than they should)

(2) default desktop environment and preinstalled packages.

You can still customize ubuntu until the wheels fall off: try multiple desktop environments, get newer packages from PPAs, customize the whole system, make it leaner by disabling services you don’t need, etc.

thecoderx · April 21, 2024, 3:02pm

@quilt thanks for saying this, I am probably falling into this rabbit hole. Although I do find it a good way to learn more about Linux.

I have customized it quite a bit, although I know I have only scratched the surface. But it seemed to me that if I did go out and get newer packages and install other desktop environments, I would sort of be defeating the idea of Ubuntu LTS and its stability. And if that is really what I wanted, I should just build up my environment on Arch. Maybe my thinking is flawed, but that is why I got things where they are and decided to pose this question.

compy386 · April 21, 2024, 3:08pm

You can find updated official packages in the backports repo…

https://help.ubuntu.com/community/UbuntuBackports

You will need to be more careful about package management when using these, as they may pull updated dependencies which is usually where you will wind up borking things.

Iron_Bound · April 22, 2024, 6:49am

Nixos has been good and bad for machine learning.

The ability to pin packages and have multiple versions of a package installed can help. The learning curve is steep, but there are some good things it gives you if you learn their programming language.

https://nixos.wiki/wiki/CUDA

Fyi to save time building non free software, you can use this repo:
https://app.cachix.org/cache/cuda-maintainers#pull

thecoderx · April 22, 2024, 2:06pm

@Iron_Bound thanks! I am going to look into this more and see if it is what I am looking for.

inputoutput1126 · April 23, 2024, 6:23pm

would I recommend arch over static package versioned distros?
no
has arch gotten a lot better recently for ml and cuda?
yes.

specifically, if you use docker and container-toolkit: it’s finally a part of the extra repo instead of being an aur package. if you plan to daily drive arch and wonder how it will be for ML, it’ll be fine. if you’re looking for something specific for ML? I suggest fedora. ubuntu is always too out of date for the rapid pace of ML stuff rn IMO
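A quick way to confirm the docker plus container-toolkit setup works is to run `nvidia-smi` inside a CUDA container. The sketch below builds that command in Python and only executes it when passed `--run`. The image tag is an assumption for illustration; pick whichever `nvidia/cuda` tag matches your driver.

```python
from __future__ import annotations

import shutil
import subprocess
import sys


def gpu_smoke_test_command(image: str) -> list[str]:
    """Build a docker command that exposes all GPUs and runs nvidia-smi."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


if __name__ == "__main__":
    # Assumption: example image tag, swap in the one you actually use.
    IMAGE = "nvidia/cuda:12.4.1-base-ubuntu22.04"
    command = gpu_smoke_test_command(IMAGE)
    print(" ".join(command))
    if "--run" in sys.argv:
        if shutil.which("docker") is None:
            print("docker not found on PATH")
        else:
            # Prints the usual nvidia-smi table if the GPU is reachable.
            subprocess.run(command, check=False)
```

If the GPUs show up in the output, the host only needs the driver and the toolkit; everything else (CUDA runtime, gcc, frameworks) can live in the image.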