The Zen of mGPU

Thank you for the further thoughts.

My takeaway here is 5090x2 > 4090x2 > 5090 > 5080x2.

I got wind of a possible restock tomorrow at Best Buy. I am going to do my very best to snag a 5090. If I can manage that, I will need to sell off these 5080s to fund it. But if that does not pan out, I will look at trading for a pair of 4090s. I don't think a straight trade for a used 4090 would be fair to me, so I'll see whether anyone would be interested in a trade with cash on top.

Time to scour Facebook Marketplace.


As I navigate the issues with the 50xx-series cards — architecture, build issues, supply issues, missing ROPs, heat issues, connector issues, and driver issues such as monitor timings, chromatic aberrations, and black screens… I also consider that maybe the supply issues and price gouging have done me a favor.

Across all of Nvidia's product releases, the tactics and issues this time around seem unheard of for what is supposed to be one of the shiniest companies.

In retrospect, from the product release strategies to the AIB partner strategies… while I am a fan, I really feel let down and somewhat fooled, like one of the sheeple.

As of right now, my 4090 single-card benchmarks, without DLSS, are higher than the 5090 single-card benchmarks without DLSS. So I am also worried that the hype is geared toward DLSS, which I would not use. Lastly, my dual-GPU config, without DLSS, benchmarks 90-130% better than the single 5090 on several titles.

Yes… I am (was) attracted to what I could do with a pair of 5090s… but there are now so many caveats.

So… maybe this botched 5090 release is a favor, and maybe I'll skip this gen and/or wait for the later-cycle update/releases.

If this was a “Dear Abby” it would be signed:

“Jilted”

For those trying mGPU as well, here are my observations on the latest Nvidia drivers:

Observations:
Under these conditions -
Dual 4090 and dual 3090 Ti (two separate systems)

4K, 10-bit, HDMI 2.1, max ray tracing settings, no DLSS, no FG, no film grain, no vignette, no DoF, no motion blur, max all other settings, no resolution reduction.
Basically maximum visuals without resolution-reduction gimmicks.

The 572.70 driver, in the 572 driver series, shows a 4-9% performance drop compared to the 571 driver and a 3-5% drop compared to the 566-series drivers.

Additionally, the 572 series has introduced a lot of stuttering, some "hang times" on loading, and some aberrations in certain scenarios that involve path tracing. I do not experience the black screen on the HDMI 2.1 4090 (x2) system, but I do experience it on the dual 3090 Ti system that uses DP 1.4 (120Hz monitor).

I am surprised that the 571.96 driver is slightly better than the 566, and that the 571.96 has not demonstrated so many of the issues of the 572.

The 572.70 seems to have lower performance than the 572.24 hotfix, though at a 1-2% drop the difference can seem marginal.

As for the reports that this driver fixed the black screen issue, I find that is NOT the case on my DP 1.4 3090 Ti system; the issue does indeed manifest, so it is NOT fixed.

All this is, of course, on the heels of the many choices and issues around DLSS 4 and FG and the 5090-series push; but in my 4090 world, where I don't use DLSS and FG resolution reduction, I see a degradation in RAW performance on the 572 series.

My observations so far (all in the raw: 4K max visuals, no DLSS, no FG) show that the ranking is:

  1. 522 is the top performer
  2. 571.96 is 2nd
  3. 566.45 is 3rd

Worst performance in the raw is the 572 series, for both the 4090 and the 3090 Ti.
This driver series (572) also has video-related anomalies (the 530 series had black boxes/artifacts that would manifest on the 2D desktop, documented).

Respectfully

Moved to dual RTX 5090 FE

Chernobylite has a built-in benchmark that displays all the parameters used on the live result screen. It does not use DLSS, so this is performance in the raw: max ray tracing and max visuals at 4K.

Dual GPU


Disappointed in gaming performance

In the raw, no DLSS, these drivers are providing the 5090 with only a 9% to 18% increase in raw performance over the 4090.

The increase in performance from the 3090 to the 4090 was significant.

For algorithms and CNNs the increase is about 24%-29%, due to greater memory and some other changes. So for that purpose it was somewhat in line with expectations.

For the gamers out there: in the raw, no DLSS, a 9%-18% increase is one of the lowest generational gains I've seen.

Yes, I have set some new high scores; yes, it is faster; yes, there are some new gimmicks available. But it is such a small margin in raw performance.

I'm going to try some of the prior driver releases and see if it's just this driver or the overall product.

(…add in the price increases, the miserable launch, the early build-quality issues, etc.)

As just one example, you can see the small jump in performance here (no DLSS):

and here:

The links/examples given are not the only representation, but for the savvy observer, the incremental increases in performance across drivers and hardware can easily be seen and tracked, not just from my scoring but from other contributors' as well.

Hopefully it's just this particular driver. Can anyone suggest the best-performing 5090 driver? I'm new to the 5090 scene, so I'm starting with the latest driver.

But so far, the 5090 over the 4090 is not the leap the 4090 was over the 3090.

IMHO

Have you checked the PCI Express bus traffic on the RTX 5090s compared to the RTX 4090s? It seems like the 4090s might have some room to breathe on the bus; maybe the 5090 is saturating it?


Thank you for this thread. In a couple of days, or within the next two weeks at most, I will be able to try this out myself, since I will have two GPUs available for the first time in about 15 years, which will be exciting.


Welcome to the forum @Darkman666 :grin:


The motherboard is PCIe 5.0, and the internal BIOS settings are also associated with the CPU-to-CPU interconnects, allowing a lot more room. In general, even under a heavy algorithm load, I can't hit a limitation on the bus. I have had 4x 4090s on this board and did not hit a limit in that regard, but I appreciate you exploring the possibilities.
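If anyone wants to watch the bus for themselves, here is a minimal sketch using the NVML Python bindings (pip install nvidia-ml-py). The throughput calls are standard NVML; the output formatting is just an example:

# Sketch: sample per-GPU PCIe throughput (KB/s over a short window) via NVML.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        tx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)
        rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)
        print(f"GPU{i}: TX {tx / 1024:.1f} MB/s, RX {rx / 1024:.1f} MB/s")
finally:
    pynvml.nvmlShutdown()

Run it while a benchmark loops; if TX/RX sit far below the slot's ceiling, the bus is not the bottleneck. nvidia-smi dmon -s t gives a similar live view without scripting.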

It's been several years (3-4) since the 4090 came out (it was designed more than 5 years ago), so at a core level it seems the architecture may not have been a leap comparable to 3090-to-4090, as Nvidia has shifted its focus over the past 4 years.

If I look at all the 5090 information without bias, including all the issues, it seems that Nvidia's performance-increase mantra this time around was focused on DLSS gimmicks and less so on actual raw performance.

While I can use the 5090s' (2x) 32GB of VRAM in algorithms, I definitely have not had the WOW factor the 4090 provided.

Look forward to it!

Upgraded to a 2800W PSU



Regarding these UE5/UE4 settings, what are you doing to get Unreal Engine games to work with mGPU? I just got a second 5090 FE and want to mess around with this stuff. Alternatively, I want to know how I can set up Unreal Engine for my own personal game development so I can make an mGPU tech demo. I think if I can get a tech demo going, it could give the mGPU subject area more traction.

Hello,
unfortunately I do not know enough about creating in the UE5/UE4 editor/creator. I can only observe what has worked for some games/apps and not with others.

There are layers to the setup that I can observe:

DX12 vs VK
SLI vs mGPU
Heterogeneous vs homogeneous
AFR vs SFR mGPU

So in the editor, you'd be DX12 and heterogeneous or homogeneous mGPU (not SLI).

Some of the parameters work on some systems and are inconsistent on others.

Example: UE4 Chernobylite. The devs must have gone down that path and then abandoned it, because it gets 85% efficiency on DX12 mGPU (if you consider one card to be a mythical 100%).

UE5 Matrix City Demo: same thing as above, at about 70% efficiency.
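To make that efficiency figure concrete, here is the arithmetic as a tiny sketch (a minimal illustration; the FPS numbers are made up, not measurements):

# Scaling efficiency: the second GPU's added FPS as a fraction of one card's FPS.
def scaling_efficiency(fps_single: float, fps_dual: float) -> float:
    return (fps_dual - fps_single) / fps_single

# Illustrative only: 70 FPS on one card, 129.5 FPS on two -> 85% efficiency.
print(f"{scaling_efficiency(70.0, 129.5):.0%}")

So "85% efficiency" means the second card contributes 85% of a full card's worth of frames.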

MultiGPU is only available for the offline Path Tracing mode, according to the Unreal documentation.

One of my projects (when I have time) is to learn about the engine so that I can refine what I know.

As far as your pursuit of this goes, a fellow on TechPowerUp made his own UE5 benchmark. Early on it worked in mGPU, and then a revision later it stopped working in mGPU. So maybe there are several levers one has to pull.

I ended up getting Chernobylite today because it was on sale for the weekend (good timing). Can you outline how you managed to get that particular game working? You mentioned it being harder, but I'm more than willing to mess around with it. It looks like I could actually benefit greatly from getting my second 5090 to help render this game: with just a single 5090 and settings cranked at 5120x1440, I only get around 70 FPS in the benchmark. Getting an extra 85% boost in FPS from putting my second 5090 to use would be really nice.

Btw, my setup is dual 5090s. I have been able to get GravityMark to work, and you can actually see some of my benchmarks below yours on the leaderboard.

My other specs (in case it helps):
Ryzen 9 9950X3D
96GB DDR5 6000MHz
Windows 11 Enterprise 24H2
NVIDIA Game Ready Driver 576.88


Also, would you mind making a link or something for v0.987 of the GameTech PC benchmark? The older versions don't exist online anymore.

Sir,
unfortunately I don't have the link to the v0.987, and I no longer have the download.

When I get back to my house (we have been gone for a few days) I'll send what I have for the Chernobylite config. I use the GOG version of it so that nothing else needs to be running to play. I have a few settings in a cell phone shot, so I'll include those for now.

You have a nice score in GravityMark, by the way. They (Tellusim) are going to fix the dual-5090 DX12 problem in the upcoming v190 release of GravityMark.

To answer your original question, here are some of the settings you can try. These are some of the things I used, though from memory, and I can't recall which attempt each came from:

[/Script/Engine.GarbageCollectionSettings]
gc.MultithreadedDestructionEnabled=1

[/Script/Engine.Engine]
bAllowMultiThreadedShaderCompile=True

[SystemSettings]
r.Streaming.MipBias=-1
r.Streaming.LimitPoolSizeToVRAM=0
r.Streaming.UseAsyncRequestsForDDC=1
r.Streaming.FullyLoadUsedTextures=1
r.Streaming.DefragDynamicBounds=1
r.Streaming.HLODStrategy=2
r.ParallelRendering=1
r.ParallelShadows=1
r.ParallelTranslucency=1
r.TextureStreaming=1
r.PoolSizeVRAMPercentage=95
r.ShaderPipelineCache.Enabled=1
r.UseShaderCaching=1
r.GPUBusyWait=0
r.GPUParticle.AFRReinject=1
r.AllowMultiGPUInEditor=1
r.EnableMultiGPUForkAndJoin=1

D3D12.AFRSyncTemporalResources=1
D3D12.AFRUseFramePacing=1
InGamePerformanceTracking.HistorySize=0

For reasons I don't know, anisotropy 16x seems to keep going away in some settings of some games ever since the 5090s, so I manually include it and mark the file read-only:

r.MaxAnisotropy=16
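If you would rather script the read-only step than click through Explorer, here is a minimal sketch in Python (the Engine.ini path is only an example; many UE4 titles keep the live config under the game's Saved\Config\WindowsNoEditor folder, so check your own install):

# Sketch: mark the config file read-only so the game cannot rewrite it on launch.
# The path below is an example; point it at wherever your Engine.ini actually lives.
import stat
from pathlib import Path

cfg = Path(r"D:\Chernobylite\ChernobylGame\Saved\Config\WindowsNoEditor\Engine.ini")
cfg.chmod(cfg.stat().st_mode & ~stat.S_IWRITE)  # clear the write bit (read-only)
print(f"{cfg} is now read-only")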

In the game shortcut I used (point to your directory):

"D:\Chernobylite\ChernobylGame\Binaries\Win64\ChernobylGame-Win64-Shipping.exe" -DX12 -NoWrite -MaxGPUCount=2

This only works on the actual exe in the DX12 directory of the game.

I had other things to overcome in my dual-GPU setup, since I am on a dual-EPYC MZ73-LM0 board.

I had to set xGMI to no encryption (not auto)
width control to x16 (not auto)
fixed 32GBs x16 (4x) CPU to CPU to PCIe (not auto)
4-link xGMI max speed 32Gbps (not auto)
3-link xGMI max speed 32Gbps (not auto)
disable SMEE (not auto)
transparent encryption disabled (not auto)
fabric state 1, mode 0
SEV control disabled

In Windows, disable mitigations (override mask 3)
HAGS set on
ReBAR enabled on the global profile, ReBAR size set to unlimited in Profile Inspector
set ReBAR to 32 (unlimited) in the BIOS/OS

THEN
I played with the Inspector coolbits and settings in Inspector (v xxxx xxx .29),
and in the Nvidia game profile set power to Maximum (not the default).
You MUST also disable the shader cache.

If you are in one of the new games (UE5) that is using Lumen (I've seen it get corrupted), then include these for visuals:

r.ReflectionEnvironment=1
r.Lumen.Reflections=1
r.Lumen.ScreenProbeGather.ScreenTraces=1
r.Lumen.Reflections.HighQuality=1
r.Lumen.Reflections.HierarchicalScreenTraces=1
r.Lumen.GlobalIllumination=1
r.Lumen.DiffuseIndirect=1
r.Lumen.Reflections.HardwareRayTracing=1
r.Lumen.ScreenProbeGather.RayTracing=1
r.Lumen.SkyLighting=1
r.Lumen.TranslucencyReflections=1
r.LumenScene.SDFTraceDistance=20000
~and make the file read-only, or it gets overwritten and the visual corruption may come back

LASTLY, and MOST important:
Nothing gets saved or works the 2nd time around (sometimes even the first time) if your load has the Microsoft Store enabled and not blocked. For reasons that are beyond my expertise, the MS Store breaks some of these settings. In my case I have removed/uninstalled the MS Store and blocked its use. I had trouble getting this to stick on Win 11; it's one of the reasons I'm on Server (as a workstation). Example: Civilization 6 would revert from mGPU, and the menu selection for mGPU would be grayed out when the MS Store ran. Weird, especially since my Civilization 6 is standalone, bought directly (not Steam or MS Store).

As I don't have any MS products on my PC other than the OS, and I don't buy or use anything from the MS Store, not having the MS Store has no cons in my world; but for others this may pose a compromise.

I know this list isn't much, but I hope that helps.


Thank you for the info! I will mess around with it tomorrow and see if I can replicate your configuration.

Also, I have good news regarding Unreal Engine: after messing around with it today, I was able to figure out real-time multi-process rendering (on the latest version, 5.6.0, too)! I still have to do some tests to make sure this is a real FPS gain, but the screenshots I got look very promising.
Single GPU configuration:


Dual GPU configuration:

If anybody is interested in how I did this, I will document it soon, but if you want to immediately start working with real-time multi-process rendering, I can give you some rough information now.


Quick update on the Unreal Engine real-time multi-process rendering. The configuration I had running was misleading! After doing some further tests, a single GPU was actually getting about 20-40 FPS more than the dual-GPU config I had set up in my previous screenshots, and the single-GPU screenshot was simply misconfigured. On the bright side, the GPUs ARE being used by the program, which is a good starting place; now the issue is just figuring out how to use them efficiently so that they outperform a single GPU.


I made a proper control this time and upped the resolution. I am using Unreal Engine's nDisplay / Switchboard to accomplish this. My control matches what I get in the editor at the same viewpoint in frames per second, so that means it's working. Both are rendering at 4800 x 1350. This time I used split-frame rendering and got a nice 40 FPS boost, although I'm sure I can get more out of the second GPU than just this; I still have to mess with the settings more, though.

Control Image:

Split Frame Rendering (2x GPU Utilization)

As you can see, split-frame rendering did not play nice with the material I applied to the Suzanne head, but given that I can use a different material, I don't think this is too much of an issue if split-frame rendering ends up being the only way to use dual non-SLI GPUs in modern Unreal Engine. On the bright side, there did not seem to be any tearing from the rotation of the head, but I will have to investigate this further too, most likely with some high-speed objects moving past the camera or by attaching the camera to a movable player.


Fantastic work! Less than 24 hours later, look at what you accomplished!

In Inspector it's always challenging to get the best match with SFR to get optimal FPS and avoid visual anomalies.

Great work!!
