Steps I took
# first fully update 22.04 LTS
apt update && apt upgrade -y
# reboot ... you probably got a newer kernel...
# ensure remote access
Since we are updating the video driver, and you likely have only one GPU in the system, make sure you can ```ssh``` into it from another machine. This is useful both for setup and for troubleshooting, should something go wrong.
# nvidia part 1
We need the proprietary NVIDIA GPU driver first. If the only GPU in the system is an NVIDIA card and it is currently using the nouveau driver, nouveau must be blacklisted before the NVIDIA driver can claim the card. Blacklist nouveau, install the NVIDIA driver, and then reboot.
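Blacklisting nouveau is usually done with a modprobe config file plus an initramfs rebuild; a minimal sketch (the file name is my choice, anything under /etc/modprobe.d/ works):

```shell
# write a modprobe config that prevents nouveau from loading
sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF

# rebuild the initramfs so the blacklist applies at early boot
sudo update-initramfs -u
```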
Run ```lsmod``` and check the output to confirm the ```nvidia``` module is loaded; then check ```dmesg``` to be sure you do NOT see messages like:
[ 1044.501389] NVRM: The NVIDIA probe routine was not called for 1 device(s).
... this message indicates "something else" has claimed your nvidia card (most likely nouveau).
Once the ```nvidia``` module is loaded and ```dmesg``` is free of errors suggesting the NVIDIA driver failed to claim the card, you can move on to installing CUDA.
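Concretely, the checks I mean look something like this (the grep patterns are my own, adjust to taste):

```shell
# the nvidia module should appear in the loaded-module list
lsmod | grep '^nvidia'

# search the kernel log for NVRM probe complaints; ideally this prints the fallback message
sudo dmesg | grep 'NVRM: .*probe' || echo "no NVRM probe errors found"
```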
# nvidia part 2
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install python3-pip
The next part is deciding… what CUDA version do I need?
Start Locally | PyTorch
This page helps make that decision for us.
apt search cuda
shows CUDA 11-(many versions) as well as 12.1 and 12.2; if we want the “stable” PyTorch build, it makes sense to install CUDA 12.1 to match, and to reduce the headaches we have to deal with.
sudo apt install cuda-12-1
… this version made the most sense, based on the information on the pytorch website.
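As a sanity check after the install, the 12.1 toolkit should land under /usr/local/cuda-12.1 (path assumed from NVIDIA's usual packaging layout):

```shell
# the versioned toolkit directory and its compiler should exist
ls /usr/local/cuda-12.1/bin/nvcc

# nvcc reports the toolkit version; extract the "release X.Y" part
/usr/local/cuda-12.1/bin/nvcc --version | grep -o 'release [0-9.]*'
```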
Longer Explanation
If you aren’t familiar with Python, especially version 3, Python supports running multiple virtual environments and managing package versions separately.
The analogous facility on Linux is probably… Docker? (Saying that is a bit heretical if you already know these tools, but Docker is a convenient containerization system that abstracts away some of this complexity. It is also possible to set up Docker and let containers interface with the CUDA hardware directly. If you need to run CUDA 11.8, CUDA 12.1, and CUDA 12.2 on the same box without a lot of headache, I think this is the best approach… or at least, I haven’t seen another approach with better tradeoffs.)
For the purposes of this demo/guide we are installing CUDA 12.1 because that’s all we need. Perhaps I can link a separate guide expanding on The Docker Way here in the future.
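For reference, The Docker Way looks roughly like this once the NVIDIA Container Toolkit is installed (the image tag is an assumption; pick whichever CUDA version you need):

```shell
# each container carries its own CUDA toolkit; the host only needs the driver
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
```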
Status Check
To be confident you are at the right part of the process, ```nvidia-smi``` should be present on the system AND produce reasonable output such as:
# this command
sudo nvidia-smi
# outputs this:
Tue Jan 30 01:58:37 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:41:00.0 Off |                  Off |
| 30%   57C    P8              26W / 300W |      3MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
… *don’t worry if it says 12.3 or some other CUDA version you didn’t pick here; that’s okay* (the driver reports the maximum CUDA version it supports, not the toolkit version you installed).
Next we can actually run the command recommended by the PyTorch installer website; in my case that was
pip3 install torch torchvision torchaudio
and that should look like
$ pip3 install torch torchvision torchaudio
Defaulting to user installation because normal site-packages is not writeable
Collecting torch
Downloading torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━ 502.6/670.2 MB 116.7 MB/s eta 0:00:02
... (lot of the downloading and installing happening...)
Test that CUDA is okay now:
python3 -c "import torch; print(torch.cuda.device_count())"
The output should be 1, or however many CUDA devices you actually have.
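A slightly more defensive version of that check (this exact script is my own sketch; it degrades gracefully when torch isn't importable):

```python
import importlib.util

# probe for torch without raising ImportError if it's absent
if importlib.util.find_spec("torch") is None:
    print("torch is not installed in this environment")
else:
    import torch
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA device count:", torch.cuda.device_count())
```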