Issues with Proxmox video drivers

I have finally managed to set up my Proxmox server using the default ZFS and Samba to create my network NAS, following peanuts' guide.

The issue I am having now is with drivers. No matter what I do or whose guide I follow, I get the result below, and it really confuses me: apt says the “440.100” version of the NVIDIA driver is installed, then nvidia-detect says my card is not supported by any version up to “418.152.00” (see the pasted output).

root@HSHL:~# apt install -t buster-backports nvidia-driver
Reading package lists... Done
Building dependency tree
Reading state information... Done
***nvidia-driver is already the newest version (440.100-1~bpo10+1).***
0 upgraded, 0 newly installed, 0 to remove and 67 not upgraded.

root@HSHL:~# nvidia-detect
Detected NVIDIA GPUs:
28:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2187] (rev a1)

Checking card:  NVIDIA Corporation Device 2187 (rev a1)
***Uh oh. Your card is not supported by any driver version up to 418.152.00.***
A newer driver may add support for your card.
Newer driver releases may be available in backports, unstable or experimental.
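
(Side note for anyone reading along: as far as I can tell, nvidia-detect only checks the card against the driver list it shipped with, which on plain buster tops out around 418.x, so it can disagree with the 440.100 package apt just pulled from backports. To see what the kernel has actually loaded, something like this should work:)

cat /proc/driver/nvidia/version        # version of the nvidia module actually loaded, if any
lspci -nnk -d 10de: | grep -A3 VGA     # shows whether nvidia or nouveau is bound to the card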

I cannot figure out how to remove these versions and do a manual install of the newest version, because the installer says nouveau is “most likely” conflicting, even though I have done all I can to disable it.
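
(For anyone else fighting nouveau: the usual recipe is a modprobe blacklist plus an initramfs rebuild, which is the same thing the NVIDIA installer ends up writing further down this thread. A minimal sketch:)

echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot
# afterwards, lsmod | grep nouveau should come back empty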

Anyone think they can help? I think I’m going to try a clean install again. I’m on day 3 of trying to get this to work so I can perform a GPU passthrough. I’ve crashed my Proxmox install 3 times, had unrecoverable data on my ZFS, and had errors with content and size. I also reinstalled an additional 5 times because I was worried changes I had made and documents I had created were conflicting.
I would appreciate any help.

Side note: my board has a GPU as well. Is this a conflict? Can I pass it through, or should I keep it for Proxmox only?

If I install the Proxmox host on ZFS, can I back up a working state before I mess with everything, so I can restore from it? That way I can keep experimenting while still making progress on setup.
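
(On the backup question: with the host root on ZFS, a snapshot of the root dataset gives you a restore point for the Proxmox install itself. This assumes the default rpool/ROOT/pve-1 dataset the installer creates:)

zfs snapshot rpool/ROOT/pve-1@known-good     # take a restore point of the host root
zfs list -t snapshot                         # confirm it exists
zfs rollback rpool/ROOT/pve-1@known-good     # roll back after a failed experiment (ideally from a rescue/live boot)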

Here's the hardware and OS info:

  • Proxmox 6.2.1
  • CPU: Ryzen 7 2700
  • Cooler: modified MSI Frozr L (modified with a Noctua mount; will post later)
  • RAM: Kingston KSM26ED8/16ME Server Premier unbuffered ECC 2666
  • Mobo: ASRock Rack X470D4U
  • HBA: LSI SAS 9211-8i
  • GPU: MSI GTX 1650 Super Gaming X (I will need the patch to unlock the transcode limit?)
  • Drives: Samsung 970 Evo 500GB M.2, Samsung 860 Evo 1TB M.2 SATA plus a 2.5in SATA drive the same size, Seagate FireCuda 2.5in 2TB SSHD, 4x 6TB IronWolf drives, and a WD Blue 6TB 3.5in.

All housed in a Fractal Node 804 Case with some Noctua Fans.

Isn't the point of Proxmox to run it headless?

I was under the impression I needed the correct drivers on Proxmox to be able to pass the GPU through properly to a VM or LXC. Is that not needed, and I’ve been chasing my tail? I was hoping to use the onboard graphics with a Linux distro if possible.

The 1650 Super is just for Plex. Maybe I need an older card… :upside_down_face:

oh
idk
I won't be of any help there

That's fine, but thank you. Some of your thread was helpful as well; I enjoyed reading it.
I have a Ryzen 2700, and I don't know if I even need the passthrough for transcoding; almost everything is direct-streamed anyway, and I'm sure it can handle encoding DVR recordings. This is kind of an experiment. It's running headless now, managed via the Proxmox web interface or PuTTY.

You don’t need NVIDIA drivers at all to do passthrough.

Drivers are needed on the target system (the guest) that will receive the card, not on your host.

It depends on the source, honestly. But the only way to do hardware transcoding is with a discrete GPU.
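
(If it helps, for a full VM the host-side setup is basically just IOMMU plus the vfio modules rather than the NVIDIA driver. Roughly what the PVE wiki walks through, condensed for an AMD board; the VM ID and PCI address below are only examples:)

# /etc/default/grub - enable IOMMU on the host kernel command line, then run update-grub and reboot
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

# /etc/modules - load the vfio modules at boot
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

# hand the card to a VM (pcie=1 needs the q35 machine type)
qm set 100 -hostpci0 28:00,pcie=1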

Gotcha, thank you. I'm going to try that and see if I can make it work. Know any good guides that aren't five-plus years old, or one you might have used when you started?

Their wiki page is not bad.
https://pve.proxmox.com/wiki/Pci_passthrough

I'll check it out.

Well, I got it working! I ended up using this guide: https://www.passbe.com/2020/02/19/gpu-nvidia-passthrough-on-proxmox-lxc-container/

The only thing that was making my instance crash on the local Proxmox install was this section:

Create /etc/udev/rules.d/70-nvidia.rules and populate with:

# Create /dev/nvidia0, /dev/nvidia1 … and /dev/nvidiactl when nvidia module is loaded
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"

# Create the CUDA node when nvidia_uvm CUDA module is loaded
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"

After bypassing this step altogether I was able to do the install to a container with no problem. After getting the 440.100 version installed on Proxmox I was able to get the NVIDIA ID#'s I needed for the passthrough, then just installed the same version inside the Debian LXC container to match. Voilà… hardware-accelerated goodness for my LXC container running Plex.
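
(The “ID#'s” here are just the character-device major/minor numbers, readable straight off the device nodes on the host:)

ls -l /dev/nvidia*
# e.g. "crw-rw-rw- 1 root root 195, 0 ... /dev/nvidia0" - the 195 and 236 majors
# are what the LXC config entries further down refer to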

This guide (link below) helped with the install of Plex in the container, since I needed an SSH tunnel to initially access the server. I also had to create the user as a sudo account on the LXC container and set a password to complete the tunnel (proud I figured that out on my own). Once it was configured, you can access it as normal. It also picks up my HDHomeRun with no issues, so I hope to swap over my libraries and recording functions soon to see how it performs.

https://www.linuxbabe.com/debian/install-plex-media-server-debian-10-buster#:~:text=If%20you%20are%20going%20to,from%20command%20line%20like%20below.&text=Once%20the%20Plex%20deb%20package,cd%20to%20the%20download%20directory.
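
(For anyone else hitting the initial-setup wall: Plex only lets you claim a brand-new server from localhost, so the tunnel just forwards a local port to the container's Plex port. Something like this, with the user and container IP as placeholders:)

ssh -L 8888:localhost:32400 youruser@<container-ip>
# then open http://localhost:8888/web on your own machine to run the setup wizard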

I appreciate the help guys. Hopefully some of this info will help others. :crazy_face: :exploding_head: I need a nap…lol

Well, I changed up the server and followed the steps again… and now failure, day 3. I'm logging all my steps now and reverting or reinstalling Proxmox each time to see what really works. So far no joy… day 2… day 3… each dot is a restart and re-config. I cannot for the life of me get NVIDIA drivers working in Proxmox. I have two GPUs, well, three with the board's integrated GPU. Can this be done without this painful step? Can I edit the LXC config for the container to add the device ID, or one of the many other identifiers I have from narrowing down the device? Anyone that's done this care to share? Good news is that with all of this retrying I'm memorizing a number of useful commands lol :rofl: :joy: :sleepy: :sob:

Hello and welcome!

I finally managed to get the LXC container to accept the GPU, and I'm fairly sure it's working. Encoding at the source resolution isn't a huge load for a 1650 Super. I see the temperature jump from about 38 to 50 when handling DVR recordings, but all 4 threads I have attached to the container hit up to 60% usage on the 2700 (non-X) CPU.

I plan on testing with a big jump, maybe 1080p down to the 400s, to see how that goes… actually, let me check… yeah, it went from 35 to 40. I don't have a “live reading,” but the card is being utilized.

root@HSHLa:/# nvidia-smi
Wed Sep 23 15:34:20 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57       Driver Version: 450.57       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 710       On  | 00000000:01:00.0 N/A |                  N/A |
| 40%   35C    P8    N/A /  N/A |      1MiB /  2002MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 165...   On  | 00000000:2C:00.0 Off |                  N/A |
|  0%   40C    P2    25W / 100W |    261MiB /  3911MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A     11838      C   ...diaserver/Plex Transcoder      257MiB |
+-----------------------------------------------------------------------------+
I can see it running processes.
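
(If you want a rough “live reading,” refreshing nvidia-smi once a second is usually enough:)

watch -n 1 nvidia-smi     # full status table, refreshed every second
nvidia-smi dmon           # or a scrolling per-second utilization log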

Below is my “playbook,” or log of how I got it to work. *Warning: this is on a multi-GPU system, and it works for LXC containers (Ubuntu 16.04, I think). I also had to manually select a driver version compatible with the patch that allows more than two transcodes at once. It seems to work, but on a card this small I only got about 5 jobs going before it got choppy.

#remove no valid subscription notice
sed -i.backup "s/data.status !== 'Active'/false/g" /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js && systemctl restart pveproxy.service

#update sources for backports
nano /etc/apt/sources.list
add 
deb http://download.proxmox.com/debian buster pve-no-subscription

deb http://httpredir.debian.org/debian buster-backports main contrib non-free
ctrl+x, y, enter

#Remove enterprise source to bypass fail message in logs
nano /etc/apt/sources.list.d/pve-enterprise.list  
-Add # at the beginning of the line to disable the enterprise repo
ctrl+x, y, enter

apt update && apt dist-upgrade -y

reboot

uname -r   # check the currently running kernel version

apt-get install pve-headers-5.4.60-1-pve
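# note: the headers must match the running kernel; if yours isn't 5.4.60-1, something like
# apt-get install pve-headers-$(uname -r) should pull the matching package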

apt install build-essential

apt-get install i7z htop iotop gcc

#setup for nvidia install *Make sure versions match in all commands
mkdir /opt/nvidia
cd /opt/nvidia
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/450.57/NVIDIA-Linux-x86_64-450.57.run
chmod +x NVIDIA-Linux-x86_64-450.57.run
./NVIDIA-Linux-x86_64-450.57.run --no-questions --ui=none --disable-nouveau

The install will fail the first time because nouveau is still loaded; continue on, since the installer has already written the nouveau blacklist shown below.

more /etc/modprobe.d/nvidia-installer-disable-nouveau.conf

You will see:
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0

reboot

#reinstall driver successfully
cd /opt/nvidia
./NVIDIA-Linux-x86_64-450.57.run --no-questions --ui=none --disable-nouveau

reboot

#Add modules to file
nano /etc/modules-load.d/modules.conf

# /etc/modules: kernel modules to load at boot time.
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
nvidia
nvidia_uvm

update-initramfs -u

#update file by adding lines below
nano /etc/udev/rules.d/70-nvidia.rules

# /etc/udev/rules.d/70-nvidia.rules
# Create /dev/nvidia0, /dev/nvidia1 … and /dev/nvidiactl when the nvidia module is loaded
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L'"
#
# Create the CUDA node when nvidia_uvm CUDA module is loaded
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u'"

#set up nvidia persistence
apt install git
git clone https://github.com/NVIDIA/nvidia-persistenced.git
cd nvidia-persistenced/init   # relative to wherever you ran the git clone
./install.sh
systemctl status nvidia-persistenced
-see if ACTIVE
reboot

#steps for unlimited transcodes works with version 450.57
ls -l /dev/nvi*
root@HSHLa:~# ls -l /dev/nvi*
crw-rw-rw- 1 root root 195,   0 Sep 22 13:48 /dev/nvidia0
crw-rw-rw- 1 root root 195,   1 Sep 22 13:48 /dev/nvidia1
crw-rw-rw- 1 root root 195, 255 Sep 22 13:48 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Sep 22 13:48 /dev/nvidia-modeset
crw-rw-rw- 1 root root 236,   0 Sep 22 13:48 /dev/nvidia-uvm
crw-rw-rw- 1 root root 236,   1 Sep 22 13:48 /dev/nvidia-uvm-tools
nvidia-smi
cd /opt/nvidia
git clone https://github.com/keylase/nvidia-patch.git
cd nvidia-patch
./patch.sh

#create the LXC container (ID 100 in this example), then edit its config on the host

nano /etc/pve/lxc/100.conf

lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 236:* rwm
lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
ctrl+x, y, enter
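# note: the 195:* and 236:* majors above must match the ls -l /dev/nvi* output from the host,
# and /dev/nvidia1 is mounted here because the 1650 Super was GPU 1 in this multi-GPU box;
# a single-GPU system would bind /dev/nvidia0 instead
# the commands from here down are run inside the container itself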

apt update && apt dist-upgrade -y
apt-get install udev
mkdir /opt/nvidia
cd /opt/nvidia
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/450.57/NVIDIA-Linux-x86_64-450.57.run
chmod +x NVIDIA-Linux-x86_64-450.57.run
./NVIDIA-Linux-x86_64-450.57.run --no-kernel-module
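# --no-kernel-module installs only the userspace libraries; the container reuses the module
# already loaded on the host, which is why the driver versions on both sides must match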
nvidia-smi

root@HSHLaPlex:/# ls -l /dev/nv*
crw-rw-rw- 1 nobody nogroup 195, 254 Sep 22 20:48 /dev/nvidia-modeset
crw-rw-rw- 1 nobody nogroup 236,   0 Sep 22 20:48 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 236,   1 Sep 22 20:48 /dev/nvidia-uvm-tools
crw-rw-rw- 1 nobody nogroup 195,   1 Sep 22 20:48 /dev/nvidia1
crw-rw-rw- 1 nobody nogroup 195, 255 Sep 22 20:48 /dev/nvidiactl

And it should be working!

Version 450.57 is the newest version working with the patch. If you don't need the patch, you can just use the newest Linux driver (460-something?).

Anyway, I hope this helps people. I'm sure my posting is messy; I've only been messing around with this for a month or so.

Also, props to these posts
