DALL-E Mini, Mega or Mega_full -- up and running on your hardware!

Notes on what I did:

Manjaro, full proprietary driver install

2080Ti + Tesla V100

Prerequisites



sudo pacman -S python3 python-pip tensorflow-cuda cudnn python-tensorflow-cuda 
sudo pacman -S python-pytorch-cuda cuda


Playground git repo

https://github.com/saharmor/dalle-playground
  1. Clone or fork this repository
  2. Create a virtual environment: cd backend && python3 -m venv ENV_NAME, then activate it with source ENV_NAME/bin/activate
  3. Install the requirements: pip install -r requirements.txt
  4. Make sure you have PyTorch and its dependencies installed (see PyTorch’s installation guide)
  5. Run the web server: python app.py --port 8080 --model_version mini (you can change 8080 to your own port)
  6. In a different terminal, install the frontend’s modules with cd interface && npm install, then run it with npm start
  7. Copy the backend’s URL from step 5 and paste it into the backend URL input in the web app
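The backend can take up to two minutes to load the model (its own startup message says so), so before pasting the URL in step 7 it can help to confirm the port is actually accepting connections. A minimal standard-library sketch, assuming the backend listens on 127.0.0.1:8080 as in step 5; wait_for_port is a hypothetical helper, and an open port only shows the server is listening, not that image generation works:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds.

    Hypothetical helper: it only checks that something is listening on the port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Usage: after step 5, wait_for_port("127.0.0.1", 8080) returns True once the server is up, or False after the two-minute default timeout.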

What if I get ptxas errors and it falls back to using CPU?

Even if you don’t have a CUDA device, it is still possible to run it on the CPU. It was decently fast on a 32-core Threadripper system.

2022-06-16 06:46:21.595653: I external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-06-16 06:46:21.596077: I external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-06-16 06:46:21.596087: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:80] Couldn't get ptxas version string: INTERNAL: Couldn't invoke ptxas --version
2022-06-16 06:46:21.596665: I external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-06-16 06:46:21.596695: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:460] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: Failed to launch ptxas'  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.

This error was related both to installing jax with pip install jax and to not having /opt/cuda/bin in the PATH. I corrected the PATH with:

declare -x PATH=$PATH:/opt/cuda/bin
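Since the ptxas failure comes down to the CUDA toolkit’s bin directory (/opt/cuda/bin with the Arch/Manjaro cuda package) missing from PATH, a small standard-library sketch can check for ptxas before starting the server and append that directory if needed, much like the declare -x line above. The function name and default directory are assumptions for illustration, and the change only applies to the current Python process and whatever it launches afterwards, not to your shell:

```python
import os
import shutil
from typing import Optional


def ensure_ptxas_on_path(cuda_bin: str = "/opt/cuda/bin") -> Optional[str]:
    """Return the full path of ptxas, appending cuda_bin to PATH first if it isn't found.

    Hypothetical helper. The PATH change affects this process and anything it
    launches afterwards (e.g. the DALL-E server via subprocess), not your shell.
    """
    found = shutil.which("ptxas")
    if found:
        return found
    # Equivalent to `declare -x PATH=$PATH:/opt/cuda/bin`, scoped to this process.
    current = os.environ.get("PATH", "")
    os.environ["PATH"] = current + os.pathsep + cuda_bin if current else cuda_bin
    return shutil.which("ptxas")  # None if ptxas isn't in cuda_bin either
```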

What do I do if it grabs the wrong GPU, or I get GPU errors about it?

2022-06-16 06:49:58.770162: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2141] Execution of replica 1 failed: INVALID_ARGUMENT: executable is built for device CUDA:0 of type "Tesla V100-PCIE-32GB"; cannot run it on device CUDA:1 of type "NVIDIA GeForce RTX 2080 Ti"

In my case I have both a 2080 Ti and a V100 in this Threadripper system. I wanted it to use the V100, with its 32 GB of VRAM, as shown in the video. This error is a bit obtuse.

The fix was:

TF_CPP_MIN_LOG_LEVEL=0 CUDA_VISIBLE_DEVICES=0 python3 app.py --port 8080 --model_version mega_full

Device 0 was the V100 and device 1 was the 2080 Ti. Your system may number its devices differently.
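A heads-up on the numbering: CUDA’s device indices don’t necessarily match the GPU numbers nvidia-smi prints. By default CUDA orders devices fastest first (CUDA_DEVICE_ORDER=FASTEST_FIRST), while nvidia-smi lists them by PCI bus ID, which would explain why the V100 is CUDA device 0 here but GPU 1 in the nvidia-smi output below. Setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the two agree. A minimal Python launcher sketch that pins the server to one GPU that way; the helper names are hypothetical, and it assumes app.py sits in the backend directory you pass in, as in the command above:

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def gpu_pinned_env(gpu_index: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment in which CUDA (and so JAX) sees only one GPU."""
    env = dict(os.environ if base is None else base)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # number GPUs the same way nvidia-smi does
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    env["TF_CPP_MIN_LOG_LEVEL"] = "0"         # keep the verbose XLA/TF logging used above
    return env


def server_command(model_version: str = "mega_full", port: int = 8080) -> List[str]:
    """argv for the playground backend, using the current Python interpreter."""
    return [sys.executable, "app.py", "--port", str(port), "--model_version", model_version]


def launch(gpu_index: int, model_version: str = "mega_full", port: int = 8080,
           backend_dir: str = ".") -> subprocess.Popen:
    """Start the backend pinned to one GPU; backend_dir must contain app.py (assumption)."""
    return subprocess.Popen(server_command(model_version, port),
                            cwd=backend_dir, env=gpu_pinned_env(gpu_index))
```

With PCI bus ordering in effect, the V100 in the system above would be index 1 (its nvidia-smi number), so the equivalent of the fix above would be launch(1) run from the backend directory.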

You can also troubleshoot CUDA and GPUs further from Python with commands like:

import tensorflow as tf
print(tf.test.gpu_device_name())
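The DALL-E backend runs on JAX rather than TensorFlow, so TensorFlow seeing the GPU doesn’t guarantee JAX does (later in this thread, tf.test.gpu_device_name() reported /device:GPU:0 while the server still fell back to CPU). A sketch that asks JAX directly via jax.default_backend() and jax.devices(); the wrapper function is hypothetical and returns None if jax isn’t importable:

```python
from typing import Optional


def jax_device_report() -> Optional[dict]:
    """Summarize what JAX will run on, or return None if jax isn't importable."""
    try:
        import jax
    except ImportError:
        return None
    return {
        # "gpu" with a working CUDA-enabled jaxlib; "cpu" means it fell back.
        "backend": jax.default_backend(),
        # One entry per visible device: platform, id and model name.
        "devices": [f"{d.platform}:{d.id} {d.device_kind}" for d in jax.devices()],
    }
```

Run it inside the same virtual environment as the server. If the backend comes back as cpu while TensorFlow shows a GPU, the jax/jaxlib install is the likely culprit.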

How do I know what the GPU is doing and/or that the GPU is busy?

nvidia-smi
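For scripting rather than eyeballing, nvidia-smi also has a machine-readable query mode (--query-gpu together with --format=csv; nvidia-smi --help-query-gpu lists every field). A sketch that runs and parses it; the particular fields and the helper names are choices made for this example:

```python
import csv
import io
import subprocess
from typing import Dict, List, Optional

QUERY_FIELDS = ["index", "name", "memory.used", "memory.total", "utilization.gpu"]


def _num(cell: str) -> Optional[int]:
    # Some GPUs report "[N/A]" or "[Not Supported]" instead of a number.
    return int(cell) if cell.isdigit() else None


def parse_gpu_query(text: str) -> List[Dict[str, object]]:
    """Parse `nvidia-smi --query-gpu=<QUERY_FIELDS> --format=csv,noheader,nounits` output."""
    gpus = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        idx, name, used, total, util = (cell.strip() for cell in row)
        gpus.append({
            "index": int(idx),
            "name": name,
            "memory_used_mib": _num(used),    # nounits: memory values are plain MiB
            "memory_total_mib": _num(total),
            "utilization_pct": _num(util),
        })
    return gpus


def query_gpus() -> List[Dict[str, object]]:
    """Run nvidia-smi once and return one dict per GPU (needs the NVIDIA driver installed)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_query(out)
```

This is handy for logging, or for spotting which GPU is busy before choosing a CUDA_VISIBLE_DEVICES value.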

Output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   35C    P0    74W / 300W |    578MiB / 11264MiB |     64%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-PCIE...  Off  | 00000000:21:00.0 Off |                  Off |
| N/A   32C    P0    35W / 250W |  11577MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1387      G   /usr/lib/Xorg                     255MiB |
|    0   N/A  N/A      1478      G   /usr/bin/gnome-shell               66MiB |
|    0   N/A  N/A      2508      G   /usr/lib/firefox/firefox          169MiB |
|    0   N/A  N/A      5603      G   /usr/bin/gjs                        7MiB |
|    0   N/A  N/A     48157      G   obs                                72MiB |
|    1   N/A  N/A      1387      G   /usr/lib/Xorg                       4MiB |
|    1   N/A  N/A     45112      C   python3                         11537MiB |
+-----------------------------------------------------------------------------+



Errors I ran into and workarounds

WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
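Each error in this thread has a distinctive log line, so it can be handy to scan the server’s output for them. A sketch mapping those messages (copied from the logs in this thread) to the workaround that resolved each one here; the table and function are hypothetical conveniences, not part of dalle-playground, and the matching is plain substring search:

```python
from typing import List, Tuple

# (substrings seen in this thread's logs, workaround that fixed them here)
KNOWN_ISSUES: List[Tuple[Tuple[str, ...], str]] = [
    (("No GPU/TPU found, falling back to CPU",),
     "jax is CPU-only (expected without an NVIDIA GPU): reinstall jax and jaxlib with the [cuda] extra"),
    (("Couldn't invoke ptxas", "Failed to launch ptxas"),
     "ptxas not found: add the CUDA bin directory (/opt/cuda/bin on Arch/Manjaro) to PATH"),
    (("executable is built for device",),
     "more than one GPU visible: pin one with CUDA_VISIBLE_DEVICES"),
    (("has no attribute 'pocketfft'",),
     "jax/jaxlib version mismatch: pip uninstall jaxlib, then reinstall jax[cuda]"),
]


def diagnose(log_text: str) -> List[str]:
    """Return the suggested workaround for each known issue whose message appears in log_text."""
    return [fix for needles, fix in KNOWN_ISSUES if any(n in log_text for n in needles)]
```

For example, capture the server output with python app.py ... 2>&1 | tee server.log and pass the file’s contents to diagnose().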

DM me. I have something you seek. *Done. Now you may preserve this, pin it, etc.

Funny, I was trying the same thing today just before Wendell published the video. I got everything working on a GTX 1060 laptop, but it seems difficult to use it on my RX 6800 desktop, since there’s no JAX version for ROCm.

OK, I want to try it and am hitting a wall. Here’s my setup:

DALL-E works, but it doesn’t use the GPU (a 3090). This is what I get:

TF_CPP_MIN_LOG_LEVEL=0  python3 app.py --port 8080 --model_version mini
--> Starting DALL-E Server. This might take up to two minutes.
2022-06-30 21:33:56.470341: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:172] XLA service 0x56142436ac90 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-06-30 21:33:56.470357: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:180]   StreamExecutor device (0): Interpreter, <undefined>
2022-06-30 21:33:56.475745: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:181] TfrtCpuClient created.
2022-06-30 21:33:56.476022: I external/org_tensorflow/tensorflow/core/tpu/tpu_initializer_helper.cc:262] Libtpu path is: libtpu.so
2022-06-30 21:33:56.476145: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
--> DALL-E Server is up and running!
--> Model selected - DALL-E ModelSize.MINI
 * Serving Flask app 'app' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
INFO:werkzeug: * Running on all addresses (0.0.0.0)
   WARNING: This is a development server. Do not use it in a production deployment.
 * Running on http://127.0.0.1:8080
 * Running on http://10.0.1.10:8080 (Press CTRL+C to quit)

What am I missing?

Here is some info on my setup:

hardware

OS: Arch Linux x86_64
Kernel: 5.18.8-arch1-1
CPU: AMD Ryzen Threadripper 3970X (64) @ 3.700GHz
GPU: NVIDIA GeForce RTX 3090
GPU Driver: NVIDIA 515.57
Memory: 12.67GiB / 62.74GiB (20%)
Disk (/): 684G / 2.7T (27%)

package versions

pacman -Q python3 python-pip tensorflow-cuda cudnn python-tensorflow-cuda python-pytorch-cuda cuda
python 3.10.5-1
python-pip 22.1.2-1
tensorflow-cuda 2.9.1-1
cudnn 8.4.1.50-1
python-tensorflow-cuda 2.9.1-1
python-pytorch-cuda 1.11.0-11
cuda 11.7.0-2

python

python
Python 3.10.5 (main, Jun  6 2022, 18:49:26) [GCC 12.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.0334, 0.6962, 0.7523],
        [0.5659, 0.1336, 0.4028],
        [0.7392, 0.2731, 0.8371],
        [0.6654, 0.8625, 0.0329],
        [0.7158, 0.9675, 0.2320]])
>>>
>>>
>>> import tensorflow as tf
>>> print(tf.test.gpu_device_name())
/device:GPU:0
>>>

pip

pip show torch torchvision torchaudio jax
Name: torch
Version: 1.12.0+cu116
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
License: BSD-3
Location: /home/brian/.local/lib/python3.10/site-packages
Requires: typing-extensions
Required-by: torchaudio, torchvision
---
Name: torchvision
Version: 0.13.0+cu116
Summary: image and video datasets and models for torch deep learning
Home-page: https://github.com/pytorch/vision
Author: PyTorch Core Team
License: BSD
Location: /home/brian/.local/lib/python3.10/site-packages
Requires: numpy, pillow, requests, torch, typing-extensions
Required-by:
---
Name: torchaudio
Version: 0.12.0+cu116
Summary: An audio package for PyTorch
Home-page: https://github.com/pytorch/audio
Author: Soumith Chintala, David Pollack, Sean Naren, Peter Goldsborough
License: UNKNOWN
Location: /home/brian/.local/lib/python3.10/site-packages
Requires: torch
Required-by:
---
Name: jax
Version: 0.3.14
Summary: Differentiate, compile, and transform Numpy code.
Home-page: https://github.com/google/jax
Author: JAX team
License: Apache-2.0
Location: /home/brian/.local/lib/python3.10/site-packages
Requires: absl-py, etils, numpy, opt-einsum, scipy, typing-extensions
Required-by: chex, dalle-mini, flax, optax, vqgan-jax

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   30C    P0   115W / 375W |    784MiB / 24576MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1758      G   /usr/lib/Xorg                     337MiB |
|    0   N/A  N/A      1813      G   /usr/bin/gnome-shell               58MiB |
|    0   N/A  N/A      2167      G   alacritty                          12MiB |
|    0   N/A  N/A      2220      G   alacritty                          12MiB |
|    0   N/A  N/A      2239      G   alacritty                          12MiB |
|    0   N/A  N/A      2258      G   alacritty                          12MiB |
|    0   N/A  N/A      2928      G   /usr/lib/firefox/firefox          334MiB |
+-----------------------------------------------------------------------------+

Thanks for any help

Woot! Got it to work on the GPU. Yes, I had the “no GPU/TPU devices” error even after confirming the GPUs were listed in tf.devices. Here is my fix log.

I can’t post links, and the links are in the command lines… so, yay. TL;DR: uninstall jax and jaxlib with pip and reinstall with the [cuda] option.

My error was related to JAX:

[… backend]$ TF_CPP_MIN_LOG_LEVEL=0 CUDA_VISIBLE_DEVICES=0 python app.py --port 8080 --model_version mini
--> Starting DALL-E Server. This might take up to two minutes.
2022-07-01 12:25:38.438263: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:172] XLA service 0x55b0aeb4f780 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2022-07-01 12:25:38.438292: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:180]   StreamExecutor device (0): Interpreter, <undefined>
2022-07-01 12:25:38.445287: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/tfrt_cpu_pjrt_client.cc:181] TfrtCpuClient created.
2022-07-01 12:25:38.682248: I external/org_tensorflow/tensorflow/core/tpu/tpu_initializer_helper.cc:262] Libtpu path is: libtpu.so
2022-07-01 12:25:38.682449: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
--> DALL-E Server is up and running!
--> Model selected - DALL-E ModelSize.MINI
 * Serving Flask app 'app' (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
INFO:werkzeug: * Running on all addresses (0.0.0.0)
   WARNING: This is a development server. Do not use it in a production deployment.
 * Running on REDACTED

Fri Jul 1 12:46:13 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M40 24GB      Off  | 00000000:03:00.0 Off |           8590021886 |
| N/A   29C    P8    15W / 250W |      4MiB / 23040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:08:00.0 Off |                  N/A |
|  0%   29C    P8    N/A / 120W |     14MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       944      G   /usr/lib/Xorg                       3MiB |
|    1   N/A  N/A       944      G   /usr/lib/Xorg                      11MiB |
+-----------------------------------------------------------------------------+

pip install --upgrade "jax[cuda]"

AttributeError: module 'jaxlib.pocketfft' has no attribute 'pocketfft'. Did you mean: '_pocketfft'?

It happens when there’s a version incompatibility between jax and jaxlib.

jaxlib must match the jax version. The latest jax with CUDA is 2.22, but jaxlib 2.22 is missing. Part of the problem was that jaxlib version 3.14 was already on the system, which made pip report the dependency as already satisfied when installing jax[cuda]. So uninstall it:

pip uninstall jaxlib

Now go to the official docs. I used github.com/google/jax/blob/main/README.md#pip-installation-gpu-cuda, which recommended two commands:

pip install --upgrade pip
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Bam: nvidia-smi showed jax loading up all my GPUs… and then it errored when it discovered that a GTX 1060 with 6 GB is not a Tesla M40 with 24 GB…

jaxlib.xla_extension.XlaRuntimeError: INVALID_ARGUMENT: executable is built for device CUDA:0 of type "Tesla M40 24GB"; cannot run it on device CUDA:1 of type "NVIDIA GeForce GTX 1050 Ti": while running replica 1 and partition 0 of a replicated computation (other replicas may have failed as well).

Add the GPU visibility limit to the command line, CUDA_VISIBLE_DEVICES=0, where the Tesla is device 0:

CUDA_VISIBLE_DEVICES=0 python app.py --port 8080 --model_version mini --save_to_disk true --img_format jpeg --output_dir generations


@thom.watk That worked for me.

Short version of what I did (note: I disabled the pip cache to avoid picking up anything odd from earlier attempts):

pip uninstall jax jaxlib -y
pip install --upgrade "jax[cuda]" --no-cache-dir
pip install --upgrade pip --no-cache-dir
pip install --no-cache-dir --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Resulting versions:

pip show jax
Name: jax
Version: 0.3.14
Summary: Differentiate, compile, and transform Numpy code.
Home-page: https://github.com/google/jax
Author: JAX team
License: Apache-2.0
Location: /home/brian/.local/lib/python3.10/site-packages
Requires: absl-py, etils, numpy, opt-einsum, scipy, typing-extensions
Required-by: chex, dalle-mini, flax, optax, vqgan-jax

pip show jaxlib
Name: jaxlib
Version: 0.3.14+cuda11.cudnn82
Summary: XLA library for JAX
Home-page: https://github.com/google/jax
Author: JAX team
License: Apache-2.0
Location: /home/brian/.local/lib/python3.10/site-packages
Requires: absl-py, flatbuffers, numpy, scipy
Required-by: chex, optax
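Since the pocketfft AttributeError came from mismatched jax and jaxlib, it can be worth checking the pair after reinstalling. A standard-library sketch using importlib.metadata; it compares only the release numbers (dropping local tags such as +cuda11.cudnn82), which is a simplification: jax itself only requires jaxlib to be at least some minimum version, so a mismatch flagged here is a hint, not proof of breakage. The +cuda check matches the 0.3.14 wheels shown above; newer JAX releases may package CUDA support differently. All function names are hypothetical:

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def release(version: str) -> Tuple[int, ...]:
    """'0.3.14+cuda11.cudnn82' -> (0, 3, 14): drop the local tag, keep the numeric release."""
    public = version.split("+", 1)[0]
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", public)
    if not m:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(g) for g in m.groups() if g is not None)


def installed(dist: str) -> Optional[str]:
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def jax_pair_matches(jax_version: str, jaxlib_version: str) -> bool:
    return release(jax_version) == release(jaxlib_version)


def check_installed_pair() -> str:
    jax_v, jaxlib_v = installed("jax"), installed("jaxlib")
    if jax_v is None or jaxlib_v is None:
        return f"missing package: jax={jax_v}, jaxlib={jaxlib_v}"
    status = "match" if jax_pair_matches(jax_v, jaxlib_v) else "MISMATCH"
    cuda_build = "+cuda" in jaxlib_v  # local tag carried by the CUDA wheels shown above
    return f"jax {jax_v} / jaxlib {jaxlib_v}: {status}, CUDA build: {cuda_build}"
```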

FYI:

@wendell, you mentioned in the video that there’s no htop for GPUs. Have you tried nvtop?

Device 0 [NVIDIA GeForce RTX 3090] PCIe GEN [email protected] RX: 0.000 KiB/s TX: 0.000 KiB/s
 GPU 255MHz  MEM 405MHz  TEMP  30°C FAN   0% POW  36 / 375 W
 GPU[                                 0%] MEM[||||||||||||||||||12.237Gi/24.000Gi]
   ┌────────────────────────────────────────────────────────────────────────────────┐
100│                                                                                │
 75│                                                                                │
   │                                                                                │
 50│                                                                                │
   │────────────────────────────────────┬───┬───────────┬───┬───────────────────────│
 25│                                    │   │       ┌───┘   │   ┌───┐               │
  0│────────────────────────────────────┴───┴───────┴───────┴───┴───┴───────────────│
   └────────────────────────────────────────────────────────────────────────────────┘
    PID  USER DEV    TYPE  GPU        GPU MEM    CPU  HOST MEM Command
   2218 brian   0 Compute   0%  11643MiB  47%     0%  32123MiB python3 app.py --port
   1716  root   0 Graphic   0%    269MiB   1%     1%     63MiB /usr/lib/Xorg vt2 -dis
   3160 brian   0 Graphic   0%    160MiB   1%     1%    476MiB /usr/lib/firefox/firef
   1771 brian   0 Graphic   3%     54MiB   0%     3%    367MiB /usr/bin/gnome-shell
   2127 brian   0 Graphic   0%     12MiB   0%     0%     74MiB alacritty
F2Setup   F6Sort    F9Kill    F10Quit    F12Save Config

Thanks.


One last annoying little thing to fix: you can only get at most ten images. Ick. Want 100 images instead? Here’s the fix:

File …/interface/src/app.js

Line 80: change
const imagesPerQueryOptions = 10
to
const imagesPerQueryOptions = 100

Then restart the frontend (npm start).
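If you’d rather script that edit (say, after pulling a newer version of the repo), here is a small Python sketch that rewrites the constant in place. You supply the path yourself (the full path to app.js is elided above), the regex assumes the line looks like const imagesPerQueryOptions = 10 with an optional trailing semicolon, and the function name is hypothetical:

```python
import re
from pathlib import Path

# Matches e.g. "const imagesPerQueryOptions = 10" and captures everything before the number.
PATTERN = re.compile(r"(const\s+imagesPerQueryOptions\s*=\s*)\d+")


def set_images_per_query(app_js: Path, count: int = 100) -> bool:
    """Rewrite the first `const imagesPerQueryOptions = N` to `count`; False if not found."""
    source = app_js.read_text(encoding="utf-8")
    updated, n = PATTERN.subn(lambda m: m.group(1) + str(count), source, count=1)
    if n == 0:
        return False
    app_js.write_text(updated, encoding="utf-8")
    return True
```

You still need to restart the frontend afterwards for the change to show up.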

Would this work on an AMD Vega 56, or is it strictly CUDA?

Hi :)
I am a bit confused about this… no dataset, no training: where does it pull its neural network from?
I have a large set of pictures I would like to train a DALL-E-like model on, and I’ve seen some projects that are getting close to the new DALL-E but require training.

The issue is that I have no idea how to prepare my pictures to create a training-ready dataset…
I’ve never done any of this. Any good pointers?