7-GPU Arch Linux render workstation

Got the big desk built with an ASUS Sage Threadripper Pro board, 256 GB RAM, 7x Radeon 5700 Pro, all water cooled with 4 radiators and 18 fans, a VGA monitor inside the case, and a 55-inch OLED screen.

It works for the most part. My goal was to use the 7 GPUs in Blender for rendering. I had issues with Windows, so I tried Arch Linux on btrfs, with Timeshift so I can debug.

Wayland KDE kept crashing, so I switched to X11. It seems to hang at SDDM after login. Sometimes before SDDM, sometimes after the desktop loads.

I was able to get HIP enabled in Blender and all 7 GPUs render in Cycles. But this also sometimes crashes.

I've tried hybrid graphics solutions… No joy. Any help would be appreciated.


Perhaps try this: unplug the power to 6 GPUs and start with only one. Once that is proven stable, add another one, and repeat.

Let us know at what point (i.e. how many GPUs) you run into crashes.


I did that first, as I had to get below 6 to install the driver on Windows. On Arch it booted up immediately into Wayland on KDE. It wasn't until I tried to use the rendering in Blender that Xwayland crashed. After I switched to X11, the hang on login became a problem.

I like Wayland, but I need the rendering power; that is the whole purpose of the build. X11 allows rendering with all 7 GPUs, with an occasional Blender crash, but it is functional if I can get to the desktop. I noticed the green LEDs turning on and off randomly on the GPUs that are not running the monitor. This happens when it's frozen up.

I tried hybrid graphics with Optimus, but it wouldn't boot with Optimus installed.

CSM disabled, Above 4G Decoding and ReBAR on?

What size is your PSU (or PSUs)?

Do you have all the auxiliary power plugs for the motherboard populated?

Have you tried running every single thing at PCIe 3.0?

As a last resort, try enabling x8/x8 bifurcation on every single GPU so that they only run at x8.

Funky things happen to EPYC/Threadripper if you max out the lanes.


The PSU is 1600 W, Above 4G Decoding is enabled, and all power plugs are populated. Not sure about CSM or ReBAR. I will go into the BIOS and check.

It wouldn't be a fan sensor issue, would it? These cards are not usually water cooled, but I followed igorslab and with minor mods to the acrylic they all fit perfectly.

If you have Above 4G Decoding enabled then CSM is likely disabled.

Try out the bifurcation settings or change the lane speed
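
After changing bifurcation or lane speed in the BIOS, it may help to confirm from Linux what each card actually negotiated. Below is a rough sketch that reads the standard PCI sysfs attributes (`class`, `current_link_speed`, `current_link_width`). The function name `gpu_links` is just for illustration, and note that some cards drop their link speed at idle for power saving, so it is probably best to check while a render is running.

```python
from pathlib import Path


def gpu_links(pci_root="/sys/bus/pci/devices"):
    """Return {pci_address: (link_speed, link_width)} for display-class devices."""
    links = {}
    for dev in sorted(Path(pci_root).iterdir()):
        class_file = dev / "class"
        if not class_file.is_file():
            continue
        # PCI class codes starting with 0x03 are display controllers (GPUs).
        if not class_file.read_text().strip().startswith("0x03"):
            continue
        speed = dev / "current_link_speed"
        width = dev / "current_link_width"
        if speed.is_file() and width.is_file():
            links[dev.name] = (speed.read_text().strip(), width.read_text().strip())
    return links


if __name__ == "__main__":
    if Path("/sys/bus/pci/devices").is_dir():
        for addr, (speed, width) in gpu_links().items():
            print(f"{addr}: {speed}, x{width}")
```

If a card reports a narrower or slower link than what was set in the BIOS, that slot or riser is worth a closer look.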

It seemed to work on x8 settings for the 6 rendering cards. I left the monitor card at x16. Still testing, these WRX80 boards take forever to reboot… “Server grade”.

Got to figure out how to enable the VGA screen in X11. It worked on Wayland instantly. I'm using that screen to monitor system temps…

It lost a little render performance, but not much. I just need it stable now.


x8 no go?

Worked for 1 render, then the little green lights on the GPUs began flashing randomly and the screen froze. Trying again, maybe Blender crashed it.

Wondering if I need a fake monitor hooked up to the GPUs to stop them turning on and off while searching.

It does that during SDDM login and sometimes during the first few seconds on the desktop. If I can get past that, it works well until I reactivate the cards in Blender.

I think it is an issue with the cards searching for monitors and not deactivating or enabling when needed. This should be a software setting, but a workaround is using dummy displays. I'm going to try that.

On Windows, the Radeon Pro driver software has a built-in EDID emulator you can use.
But Linux AMD drivers… leave much to be desired…

I’m trying to finally get off Windows. I’ve been playing with Linux on and off for years.
Worst case, I will have to run Windows…

I'm waiting on dummy plugs in the mail to try that. It really seems to be a GPU activation issue.

I think your PSU might be a little too weak with all GPUs under full load. Under full load your 7 GPUs alone draw around 190W each so that’s 1330W of your PSU already used. The remaining 270W might not be enough for the rest of your system.
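
To make the arithmetic above easy to redo with other numbers, here is a small sketch. The 1600 W and 190 W figures come from this thread; the function name `psu_headroom` and the 130 W example cap are only illustrative, and real draw will vary with load and transient spikes.

```python
def psu_headroom(psu_watts, gpu_count, watts_per_gpu):
    """Return (total GPU draw, watts left over for the rest of the system)."""
    gpu_total = gpu_count * watts_per_gpu
    return gpu_total, psu_watts - gpu_total


# Figures quoted in this thread: 1600 W PSU, 7 GPUs at roughly 190 W each.
gpu_total, headroom = psu_headroom(1600, 7, 190)
print(f"GPUs at 190 W: {gpu_total} W, remaining: {headroom} W")

# The same budget with a lower per-card cap (130 W is only an example value).
gpu_total, headroom = psu_headroom(1600, 7, 130)
print(f"GPUs at 130 W: {gpu_total} W, remaining: {headroom} W")
```

The point is just that capping each card frees up a lot of headroom for the CPU, pumps and fans.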


@tiwaz_bleddyn install ROCm from the AUR

Then use rocm-smi to adjust your max watts for all devices to, say, 130 W for now, and test whether it's overloading.

If you need help with rocm-smi, let me know.
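
As a starting point, here is a rough sketch of how the per-device cap could be scripted. It assumes the `rocm-smi` flags `-d` (device index) and `--setpoweroverdrive` (power cap in watts) behave as documented on your ROCm version, that the cards enumerate as devices 0 to 6, and that you run it as root. The function names are made up for illustration, and it only prints the commands unless `dry_run=False` is passed.

```python
import shutil
import subprocess


def power_cap_commands(gpu_count, watts):
    """Build one rocm-smi command per GPU index (assumes indices 0..gpu_count-1)."""
    return [
        ["rocm-smi", "-d", str(index), "--setpoweroverdrive", str(watts)]
        for index in range(gpu_count)
    ]


def apply_power_cap(gpu_count, watts, dry_run=True):
    """Print the commands, or run them when dry_run is False and rocm-smi is installed."""
    for cmd in power_cap_commands(gpu_count, watts):
        if dry_run or shutil.which("rocm-smi") is None:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # 7 cards at 130 W each, as suggested above; dry run by default.
    apply_power_cap(7, 130)
```

Check `rocm-smi --help` first, since flag names have changed between ROCm releases and the cap may not persist across reboots.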

By saying it hangs before SDDM, are you saying the system is hanging before Xorg starts?

If blender supports CLI rendering, you could just scrap the desktop environment for the render jobs.
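
Blender does have a background mode, so a headless render could look roughly like the sketch below. It uses Blender's command line options `-b` (no UI), `-E` (engine), `-f` (frame) and the Cycles argument `--cycles-device` after `--`. The file name `scene.blend` and the helper name are hypothetical, and I have not tested this on a 7-GPU box.

```python
import shutil
import subprocess


def blender_render_command(blend_file, frame, device="HIP"):
    """Build a headless Cycles render command for a single frame."""
    return [
        "blender", "-b", blend_file,   # -b: run in background, without the UI
        "-E", "CYCLES",                # select the Cycles render engine
        "-f", str(frame),              # render this one frame
        "--", "--cycles-device", device,
    ]


if __name__ == "__main__":
    cmd = blender_render_command("scene.blend", 1)  # hypothetical file name
    print(" ".join(cmd))
    # Uncomment to actually launch the render when Blender is on PATH:
    # if shutil.which("blender"):
    #     subprocess.run(cmd, check=True)
```

Run from a TTY or over SSH, this would take SDDM and the desktop out of the picture for the render itself, which should also show whether the crashes come from the desktop session or from HIP.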

There may be a way to get xorg to ignore all but one of the GPUs to get some stability back for the desktop
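
One possible way to do that, as an untested sketch: turn off `AutoAddGPU` in the `ServerFlags` section and declare a single `Device` section pinned to the display card's `BusID`. The script below just prints such a snippet. The bus ID `PCI:3:0:0`, the identifier `DisplayGPU` and the idea of saving it under `/etc/X11/xorg.conf.d/` are assumptions, so look up the real bus ID with `lspci` first.

```python
def xorg_single_gpu_conf(bus_id, driver="amdgpu"):
    """Render an xorg.conf.d snippet that binds Xorg to one GPU only."""
    return (
        'Section "ServerFlags"\n'
        '    Option "AutoAddGPU" "false"\n'   # do not auto-add the other GPUs
        'EndSection\n'
        '\n'
        'Section "Device"\n'
        '    Identifier "DisplayGPU"\n'       # arbitrary label
        f'    Driver "{driver}"\n'
        f'    BusID "{bus_id}"\n'             # decimal bus:device:function
        'EndSection\n'
    )


if __name__ == "__main__":
    # "PCI:3:0:0" is a placeholder; use the address of the card driving the monitor.
    print(xorg_single_gpu_conf("PCI:3:0:0"))
```

As far as I understand, the kernel driver would still see all the cards, so HIP rendering should not be affected, but that is worth verifying.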

I will try that. It works reliably with Radeon ProRender but not with the Cycles renderer on HIP… I will look at the wattage… I used to mine crypto with other equipment, so I'm familiar with tweaking. Never done it on ROCm.