Fix CUDA for TensorFlow 2.0

I’ve been working for the last 2 days (actual quarantine perks) to get TensorFlow 2.0 to work correctly with my GTX 1080. I’m on Ubuntu 18.04 LTS with Pop!_OS, using CUDA 9. The specific error message is:

2020-03-25 22:03:29.833389: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libcudart.so.10.2: cannot open shared object file: No such file or directory
2020-03-25 22:03:29.833533: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvrtc.so.10.2: cannot open shared object file: No such file or directory
2020-03-25 22:03:29.833544: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2020-03-25 22:03:30.690848: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-03-25 22:03:30.694793: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-03-25 22:03:30.694819: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (pop-os): /proc/driver/nvidia/version does not exist
2020-03-25 22:03:30.695105: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-25 22:03:30.719751: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3193920000 Hz
2020-03-25 22:03:30.720465: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4bc8360 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-25 22:03:30.720507: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Train for 195 steps, validate for 3 steps
Epoch 1/30
2020-03-25 22:03:44.417531: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-03-25 22:03:44.420449: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.1'; dlerror: libcupti.so.10.1: cannot open shared object file: No such file or directory
2020-03-25 22:03:44.420477: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this)failed with error CUPTI could not be loaded or symbol could not be found.
2020-03-25 22:03:44.420495: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer)failed with error CUPTI could not be loaded or symbol could not be found.
1/195 [..............................] - ETA: 31:23 - loss: 5.1677 - acc: 0.1094
2020-03-25 22:03:45.371496: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI could not be loaded or symbol could not be found.
2020-03-25 22:03:45.371563: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:88] GpuTracer has collected 0 callback api events and 0 activity events.
5/195 [..............................] - ETA: 8:33 - loss: 3.9591 - acc: 0.1484
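
For a log this long, it can help to pull out just the libraries that failed to load and the file the loader actually complained about. The following is only a rough sketch: the function name `missing_libraries` is made up for illustration, and it assumes the warnings keep the "Could not load dynamic library '...'; dlerror: ...:" wording seen above. The two sample lines are copied from the log.

```python
import re

# Matches TensorFlow's dso_loader warning and captures:
#   1) the library TensorFlow tried to open
#   2) the file named in the dlerror message (the one that is really missing)
PATTERN = re.compile(
    r"Could not load dynamic library '([^']+)'; dlerror: ([^:]+):"
)


def missing_libraries(log_text):
    """Return (requested_library, file_reported_missing) pairs, in log order."""
    return PATTERN.findall(log_text)


# Two warning lines taken verbatim from the log above.
sample_log = (
    "2020-03-25 22:03:29.833389: W tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; "
    "dlerror: libcudart.so.10.2: cannot open shared object file: "
    "No such file or directory\n"
    "2020-03-25 22:03:44.420449: W tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.1'; "
    "dlerror: libcupti.so.10.1: cannot open shared object file: "
    "No such file or directory\n"
)

for requested, missing in missing_libraries(sample_log):
    print(f"{requested}: loader could not find {missing}")
```

Read this way, the log mentions two different CUDA versions: the TensorRT library wants libcudart.so.10.2, while the profiler wants libcupti.so.10.1.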

What I’ve tried:
-completely uninstalling CUDA and anything Nvidia; it didn’t seem to actually uninstall anything.
-installing CUDA 10; it isn’t used or recognized by the system.
-using the System76 tools to install CUDA; nothing changed.
-using Nvidia’s deb and runfile installation methods.

nvcc outputs:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

I’d like to not reinstall ubuntu to get this working but that is my next logical option and I am hoping you (collectively) have a better idea than nuking it and starting over.

To start with, which version of the Nvidia driver is installed, and is the kernel module loaded? Your log says /proc/driver/nvidia/version does not exist, which suggests it isn’t. There is also a minimum driver version requirement for CUDA 10.
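
Here is a rough way to check both things from Python. It assumes the loaded driver exposes its version at /proc/driver/nvidia/version (the path TensorFlow's diagnostics look for in the log) and that the file contains "Kernel Module" followed by the version number. The function name `nvidia_driver_version` is just an illustrative choice.

```python
import re
from pathlib import Path


def nvidia_driver_version(proc_file="/proc/driver/nvidia/version"):
    """Return the loaded driver version, or None if the proc file is absent.

    A missing file usually means the nvidia kernel module is not loaded.
    Returns "unknown" if the file exists but has an unexpected layout.
    """
    path = Path(proc_file)
    if not path.exists():
        return None
    # Assumed layout: "... Kernel Module  <version>  <build date>"
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", path.read_text())
    return match.group(1) if match else "unknown"


version = nvidia_driver_version()
if version is None:
    print("nvidia kernel module does not appear to be loaded")
else:
    print("loaded nvidia driver version:", version)
```

If this reports that the module is not loaded, sorting out the driver comes before anything CUDA-related.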

You’ll need at least CUDA 10.1 for TensorFlow 2 (your log is trying to load libcupti.so.10.1), so skip installing CUDA 9. I recommend this method of installing Nvidia’s version packages. This will take care of both the driver and CUDA. You’ll then need to create an Nvidia account separately to download and install the cuDNN packages.
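
To compare what nvcc reports against that requirement, something like the sketch below could work. It assumes the nvcc output contains a "release X.Y" string, as yours does; `nvcc_release` and `REQUIRED` are names invented for this example, and the sample text is the nvcc output you posted.

```python
import re

# Minimum toolkit version mentioned above for TensorFlow 2.
REQUIRED = (10, 1)


def nvcc_release(nvcc_output):
    """Extract (major, minor) from 'release X.Y' in nvcc's version output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# nvcc output as posted in the question.
posted_output = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Copyright (c) 2005-2017 NVIDIA Corporation\n"
    "Built on Fri_Nov__3_21:07:56_CDT_2017\n"
    "Cuda compilation tools, release 9.1, V9.1.85\n"
)

found = nvcc_release(posted_output)
if found is None:
    print("could not find a release number in the nvcc output")
elif found >= REQUIRED:
    print("nvcc reports CUDA %d.%d, which meets the requirement" % found)
else:
    print("nvcc reports CUDA %d.%d, older than %d.%d" % (found + REQUIRED))
```

Keep in mind that nvcc only tells you which toolkit is first on your PATH, not which libraries TensorFlow ends up loading.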

If you have installed multiple versions of CUDA, you may have the symlink /usr/local/cuda determining which one is “active.” If it’s still pointing to CUDA 9, try changing it. You may have to run sudo ldconfig afterwards.
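
A quick way to see where that symlink currently points, as a minimal sketch (the function name `active_cuda` is made up here, and /usr/local/cuda is the default location mentioned above):

```python
import os


def active_cuda(link="/usr/local/cuda"):
    """Return the real directory behind the CUDA symlink, or None if absent."""
    if not os.path.lexists(link):
        return None
    # realpath follows the symlink chain to the final target,
    # e.g. a versioned directory next to the link.
    return os.path.realpath(link)


target = active_cuda()
if target is None:
    print("/usr/local/cuda does not exist")
else:
    print("/usr/local/cuda resolves to", target)
```

If it resolves to a CUDA 9 directory, that is the link to repoint before running sudo ldconfig.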

The method you recommended does this:
nick:~$ sudo apt-get -y install cuda
Reading package lists… Done
Building dependency tree
Reading state information… Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
cuda : Depends: cuda-10-2 (>= 10.2.89) but it is not going to be installed

That’s obnoxious.

I tried sudo aptitude install cuda and got it to install all the things, but it ended with:
dpkg: error processing package cuda-tools-10-2 (--configure):
dependency problems - leaving unconfigured
Setting up cuda-nsight-compute-10-2 (10.2.89-1) …
Setting up libnvidia-ifr1-440:amd64 (440.44-1pop1~1576524804~18.04~fb5b8b3) …
Setting up cuda-cupti-10-2 (10.2.89-1) …
Setting up cuda-cupti-dev-10-2 (10.2.89-1) …
Setting up nvidia-driver-440 (440.44-1pop1~1576524804~18.04~fb5b8b3) …
Setting up cuda-command-line-tools-10-2 (10.2.89-1) …
Setting up cuda-drivers (440.33.01-1) …
Processing triggers for man-db (2.8.3-2ubuntu0.1) …
Processing triggers for gnome-menus (3.13.3-11ubuntu1.1) …
Processing triggers for dbus (1.12.2-1ubuntu1.1) …
Processing triggers for mime-support (3.60ubuntu1) …
Processing triggers for desktop-file-utils (0.23-1ubuntu3.18.04.2) …
Processing triggers for libc-bin (2.27-3ubuntu1) …
Processing triggers for initramfs-tools (0.130ubuntu3.8pop1) …
update-initramfs: Generating /boot/initrd.img-5.3.0-7642-generic
cryptsetup: WARNING: target cryptswap has a random key, skipped
Errors were encountered while processing:
cuda-libraries-10-2
cuda-nvrtc-dev-10-2
cuda-npp-dev-10-2
cuda-nvgraph-dev-10-2
cuda-toolkit-10-2
cuda-10-2
cuda-cusparse-dev-10-2
cuda-runtime-10-2
cuda-libraries-dev-10-2
cuda-curand-dev-10-2
cuda-cusolver-dev-10-2
cuda
cuda-demo-suite-10-2
cuda-visual-tools-10-2
cuda-samples-10-2
cuda-nvjpeg-dev-10-2
cuda-cufft-dev-10-2
cuda-documentation-10-2
cuda-tools-10-2

Current status: 11 (+11) broken, 10 (+1) upgradable.

Then I ran sudo apt install --fix-broken, and it’s running into issues:
Errors were encountered while processing:
/tmp/apt-dpkg-install-T9UZXs/0-cuda-nvrtc-10-2_10.2.89-1_amd64.deb
/tmp/apt-dpkg-install-T9UZXs/1-cuda-cusolver-10-2_10.2.89-1_amd64.deb
/tmp/apt-dpkg-install-T9UZXs/2-cuda-cufft-10-2_10.2.89-1_amd64.deb
/tmp/apt-dpkg-install-T9UZXs/3-cuda-curand-10-2_10.2.89-1_amd64.deb
/tmp/apt-dpkg-install-T9UZXs/4-cuda-cusparse-10-2_10.2.89-1_amd64.deb
/tmp/apt-dpkg-install-T9UZXs/5-cuda-npp-10-2_10.2.89-1_amd64.deb
/tmp/apt-dpkg-install-T9UZXs/6-cuda-nvml-dev-10-2_10.2.89-1_amd64.deb
/tmp/apt-dpkg-install-T9UZXs/7-cuda-nvjpeg-10-2_10.2.89-1_amd64.deb
/tmp/apt-dpkg-install-T9UZXs/8-cuda-nvgraph-10-2_10.2.89-1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)

TensorFlow is still throwing the same error as in the original post…