Ubuntu 22.04 - From Zero to 70b Llama (with BOTH Nvidia and AMD 7xxx series GPUs)

that’s awesome!! what kinda perf you seeing? tokens/sec on mistral 70b for example

Would be good to get TFlop numbers for flash-attention

I've had almost exactly the same setup as in the video waiting to replace my home server hardware (A2000/5700/B450D4U) for the last few months:

RTX 4000 Ada
Ryzen 7900
ASRock Rack B650D4U-2L2T motherboard

Added an 80 Plus Titanium power supply (one picked specifically for high idle efficiency, since idle efficiency isn't really covered by the 80 Plus rating)
A Coral Dual Edge TPU for running models for Frigate
A Crucial T705 4TB PCIe 5 SSD (only one m.2 slot on the board)
And a 1.5TB Optane drive for the write-heavy stuff

Poor thing is gathering dust waiting for me to find time :frowning:


I get an eval rate of around 6.21 tokens/s on my RTX A4500 with a wizard-vicuna-uncensored:30b model running on Unraid. Is that good or not? :smiley: I mean, it feels runnable.

that's around the speed you get CPU-only on a 7900, give or take


hm, ok. I tested with llama2:13b-chat-q6_K --verbose, which fit fully into VRAM, and got about 34.38 tokens/s. So the bigger the model, the more it spills over into my DDR4 and leans on the CPU, I see.

I'm downloading some 30b models in the sub-20GB range to test them.
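In case anyone wants to reproduce the numbers, this is roughly the loop (assuming the Ollama CLI, since --verbose is its flag; the model tag is just the one above):

# run a model and print timing stats; "eval rate" is the generation tokens/s
ollama run llama2:13b-chat-q6_K --verbose

# in a second terminal, watch whether the weights actually fit in VRAM or spill to system RAM
nvidia-smi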

It looks like, with only an RTX A2000 6 GB to play with, I'm nowhere CLOSE to having enough VRAM or a powerful enough card to be able to play with the 70b model.

:frowning:

Bummer.
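For scale, some rough napkin math (assuming ~0.5 bytes per parameter at 4-bit quantization, and ignoring KV cache and runtime overhead):

# weights alone for a 70b model at 4-bit quantization
echo $(( 70 * 1000 * 1000 * 1000 / 2 / 1024 / 1024 / 1024 )) GiB   # prints 32 GiB -- vs. 6 GB on the A2000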

edit
In the latest video, @wendell mentioned using a WebUI, AUTOMATIC1111 (GitHub - AbdBarho/stable-diffusion-webui-docker: Easy Docker setup for Stable Diffusion with user-friendly UI)

but the error message that I am getting is:

ubuntu@nvidia-ai:~/stable-diffusion-webui-docker$ sudo docker compose --profile download up --build
[sudo] password for ubuntu:
WARN[0000] /home/ubuntu/stable-diffusion-webui-docker/docker-compose.yml: `version` is obsolete
[+] Building 0.8s (6/8)                                                                                            docker:default
 => [download internal] load build definition from Dockerfile                                                                0.0s
 => => transferring dockerfile: 185B                                                                                         0.0s
 => [download internal] load metadata for docker.io/library/bash:alpine3.19                                                  0.4s
 => [download internal] load .dockerignore                                                                                   0.0s
 => => transferring context: 2B                                                                                              0.0s
 => CACHED [download 1/4] FROM docker.io/library/bash:alpine3.19@sha256:5353512b79d2963e92a2b97d9cb52df72d32f94661aa825fcfa  0.0s
 => [download internal] load build context                                                                                   0.0s
 => => transferring context: 128B                                                                                            0.0s
 => ERROR [download 2/4] RUN apk update && apk add parallel aria2                                                            0.4s
------
 > [download 2/4] RUN apk update && apk add parallel aria2:
0.248 runc run failed: unable to start container process: error during container init: unable to apply apparmor profile: apparmor failed to apply profile: write /proc/self/attr/apparmor/exec: no such file or directory
------
failed to solve: process "/bin/sh -c apk update && apk add parallel aria2" did not complete successfully: exit code: 1

AppArmor in my Ubuntu 22.04 LTS privileged LXC container is already set to unconfined in my <<CTID>>.conf in Proxmox 7.4-17.
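For reference, the relevant lines in /etc/pve/lxc/<<CTID>>.conf look roughly like this (a sketch of how that's typically set in Proxmox; the nesting feature is usually also needed to run Docker inside an LXC):

# /etc/pve/lxc/<<CTID>>.conf (excerpt)
lxc.apparmor.profile: unconfined
features: nesting=1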

My RTX A2000 6 GB has been successfully passed through to the LXC container, I've got the Nvidia Container Toolkit installed, and the sample workload of running nvidia-smi ran successfully as well.
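(The sample workload was along these lines, per the standard Nvidia Container Toolkit verification step:)

# quick check that containers can see the GPU
sudo docker run --rm --gpus all ubuntu nvidia-smi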

I used LM Studio and just picked the model with the most parameters that my GPU was capable of running.

I have a 7900 XT 20GB; other than hanging sometimes at the start, it's quick.

It glitches out pretty bad sometimes and just starts putting out trash, but otherwise it's for sure chatting away.
