The Zen of mGPU

As in MULTi-GPU and NOT SLI or Crossfire.

Hopefully there are other enthusiasts here and we can exchange findings, tips, and tricks

I’ll start off with some comments:

Heterogeneous mGPU titles would be games such as Ashes of the Singularity, Strange Brigade, Civilization 6, Zombie Army 4, Total War: Warhammer I and II, Gears of War, the first release of Quake II RTX, and benchmarks such as Tellusim’s GravityMark.

Again, we are not discussing SLI or Crossfire

For example, there is no SLI here, and NVLink is not being used.

I’ve been testing some UE5 titles and the findings are consistent. UE5 titles can also apply mGPU to path tracing, with engine versions 5.1, 5.2, 5.3, and 5.4.

Other titles are Hellblade 2, Silent Hill 2, and The Talos Principle 2, to name just a few.

I have also found that some UE4 titles allow for mGPU

For the Matrix Awakens City Demo on UE5 5.2 and 5.3, I scale to almost double the performance using the second card; however, moving to the same demo on UE5 5.4, the performance increase from the second video card is only 45%.

Again, everything is set to max textures, max eye candy, max ray tracing, etc., all at a minimum of 4K, without using DLSS or FG.

so… mileage may vary with development.

some tips and tricks include:

-d3d12 -MaxGPUCount=2
r.PathTracing.GPUCount 2
-game -d3d12 -MaxGPUCount=2
D3D12.AFRSyncTemporalResources=1
D3D12.AFRUseFramePacing=1
r.GPUParticle.AFRReinject=1
r.AllowMultiGPUInEditor=1
r.EnableMultiGPUForkAndJoin=1
r.PathTracing.GPUCount 4
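For anyone wondering where these go (it comes up again later in the thread): the dash-prefixed entries are launch parameters, and the r.* / D3D12.* entries are console variables. A minimal sketch of how I would place them, assuming a typical packaged UE game; the exe name here is made up and the exact Saved\Config path varies per title (some keep it under %LOCALAPPDATA%), so verify the paths for your game:

; shortcut target / command line (launch parameters)
"C:\Games\SomeUE5Game\SomeUE5Game.exe" -d3d12 -MaxGPUCount=2

; console variables in Engine.ini, e.g. <Game>\Saved\Config\Windows\Engine.ini (UE5)
; or <Game>\Saved\Config\WindowsNoEditor\Engine.ini (UE4)
[SystemSettings]
D3D12.AFRSyncTemporalResources=1
D3D12.AFRUseFramePacing=1
r.GPUParticle.AFRReinject=1
r.EnableMultiGPUForkAndJoin=1
r.PathTracing.GPUCount=2

r.AllowMultiGPUInEditor and the -game switch only matter if you are launching an editor build rather than a packaged game.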

Also, it depends on the kind of developer, their savviness, what was scrapped before release, what made it in but wasn’t documented, etc., etc.

Please optimize: enable Resizable BAR and set it to “unlimited”, and if you are using NVIDIA drivers you may want to turn OFF the shader cache.

Allow me to qualify the observation:

This is done at MAX eye candy in 4K or 8K:
in situations of “high demand” and “higher resolution”, with MAX ray tracing and max eye candy, but NOT using visual gimmicks such as DLSS (mostly resolution reduction) or FG.
It is also observed where higher-res textures (4K, 8K, etc.) are used, accompanied by max eye candy.

Max eye candy, to me, means: no DLSS, no FG, no vignetting, no noise, no film grain, 4K (or above), a 120 Hz+ monitor, and HDMI 2.1 or DP 1.4.

4K or 8K and no resolution reduction gimmicks… you heard me… gimmicks! Why buy an expensive monitor just to run at half the resolution?! DLSS, FG, and FSR are just lower resolutions offered so that you can buy an expensive video card that can’t run the game at 4K with max eye candy and ray tracing set to Ultra or path tracing set to Ultra, Epic, or Cinematic.

That is when getting 4K MAX eye candy with NO DLSS, NO FG, and NO FSR requires… a second identical video card (if you can get the title to work with it).

Definitely not as simple as it sounds. It requires research, a degree of dark arts, and some Voodoo (yes, that was a pun referencing the original SLI cards… OK, I’ll stop).

Let’s use an example that allows for disclosing the settings used, so that folks can see the end result:

I will use Chernobylite, a UE4 title, as the representative example, since its benchmark summary screen actually lists the settings used in the benchmark (wish more titles did this).
Again, max eye candy, to me, means: no DLSS, no FG, no vignetting, no noise, no film grain, 4K (or above), a 120 Hz+ monitor, and HDMI 2.1 or DP 1.4.

1 GPU: Chernobylite MAX settings, all with ray tracing set to Ultra, 4K, NO DLSS and NO FG

2 GPU: Chernobylite MAX settings, all with ray tracing set to Ultra, 4K, NO DLSS and NO FG

In this example, FPS went from an average of 127 FPS to an average of 228 FPS.

228 FPS with ray tracing set to ULTRA and no DLSS! This is why one should understand mGPU.

So if you are interested, let us contribute. We can get down to the minute details of how and exchange ideas.

If you have never done multi-GPU (non-SLI or Crossfire), start off with identical cards and try GravityMark and/or Ashes of the Singularity.

For some apps or games, setting up mGPU is available through the interface, complete with choices and selections; for others it’s just a check mark; and for some it’s only possible with XML, INI, and CFG changes as well as launch parameters.

Gravity Mark interface:

Ashes of the Singularity interface:

Civilization VI interface:


So while this is a gaming discussion, this particular PC build was actually designed for medical imaging and CNN / R-CNN work; but since it’s my personal home PC it does do other things, including gaming.

I actively participate in AI / algorithm groups and forums, and that has its own common language. Here, that would be a non sequitur; so while gaming is not this machine’s primary focus, on this forum gaming is more of a common language.

So for gaming, I’m more of a non-Steam, not a rent-to-play kinda guy. I like to own my games (and music, and videos, etc.), so I have a lot of boxed copies/classics. I like clicking on one exe without additional clients, telemetry, forced updates, etc.

If I must have a recent title and I can’t get it as a “standalone” then I may buy the game, but play using a crack.


My PC - (primary test system) The Zen of Air Cooling

2x RTX 4090 Founders Edition
2x AMD EPYC 9684X (192/384 cores, 2.2 GB of cache), Gigabyte MZ73-LM0 dual-socket motherboard, PCIe 5.0
1.5 TB DDR5 ECC LRDIMM RAM at 4800 MT/s, 2050 W digital power supply
4x Micron 9300 MAX (15.4 TB each), Sabrent Rocket 4 Plus (8 TB), 2x Micron 5300 backup drives
Asus PA32UCG-K monitor; Windows Server 2022 Datacenter, Ubuntu, and Windows Server 2025 Datacenter (preview)

Second test system:

2x 3090 Ti Founders Edition (with NVLink bridge)
2x AMD EPYC 7773X (128/256 cores, 1.5 GB of cache), Gigabyte MZ72-HB0 dual-socket motherboard
1.0 TB DDR4 ECC LRDIMM RAM, 2050 W power supply
5x Micron 9300 MAX (15.4 TB each, 64 TB RAID, overprovisioned), Sabrent Rocket 4 Plus (8 TB), 2x Micron 5300 backup drives
Asus PA32UCG-K monitor; Windows Server 2022 Datacenter, Ubuntu, and Windows Server 2025 Datacenter (preview)

7 Likes

For example, I need to optimize this one; as I am just getting started, I’ll work on it here and there and see what the original author can also optimize. The author, on another forum, created his own path-tracing UE5 benchmark. A worthy feat.

1 Like

Lastly, depending on complexity, we can look at several popular algorithms and components that can make use of multiple GPUs. But the core of this thread is mGPU. I had considered a separate thread to address NLP, LLM, CNN, etc., but some members of this enthusiast community have already put effort into that, and it seemed like stealing the endeavor of others to start another thread about it across this forum.

I want to see more of this. You guys are awesome.

…could one do this with a Haswell iGPU? I mean, I doubt it would offload much, but I’ve enjoyed how Quick Sync takes so little CPU when streaming. I like the idea of offloading background “low priority” OS or texture tasks onto the iGPU.

You could link their tutorials/info in a list of preview boxes

Hi,

In doing mGPU, it is easier to implement with identical GPUs. You can offload some functions in some apps, such as GravityMark and Ashes of the Singularity, but other examples, such as Strange Brigade and UE5 5.1, 5.2, 5.3, and 5.4, would not work. Separate, more complex allocations for path tracing and ray tracing would not work.

So is it feasible? … yes, but on a limited scale.

Hope that helps

1 Like

Multi-GPU setups without relying on traditional SLI or Crossfire indeed open up interesting avenues, especially for titles or engines that support heterogeneous multi-GPU (mGPU) configurations. Your list highlights some of the better-known examples, like Ashes of the Singularity or Strange Brigade, which use mGPU to split workloads effectively, sometimes even across different GPU brands.

So, adding to the list of mGPU titles that I have working:

Remnant 2
Silent Hill 2
Talos Principle II
Hellblade II/ Senua
Ashes of the Singularity
Strange Brigade
Total War: Warhammer I and II
Civilization 6
Home Together
Quake II RTX
Portal RTX
Alan Wake II
Chernobylite
Echo (ultra/ultra)
Ascenteroid
Alien Isolation
Agony
Zombie Army 4
Sniper Elite 4
X4
Succubus/Nymphomaniac (UE5)
Matrix Awakens City Demo UE5.3 (5.4 has some issues)

benchmarks to help tweak:
GravityMark 1.88

Please consider that I do not sit around all day trying to make games work in mGPU. I do this here and there. Not dedicated to it as a way of life :wink:

1 Like

I wonder if this could be applied to Fortnite, since it’s a UE game.

I have not done Fortnite, mostly because I don’t Steam, Epic, EA, or Ubisoft anything. My personal rule is that a game must run natively from its own executable, not be a rental game like on Steam. So Fortnite definitely can’t run on its own merit.

All my games, only run natively, and it does require a little research each time, but worth it.

My games run from their folder, they don’t have to be installed, and they do not require third-party stuff running in the background.

Long story short, I have not done Fortnite.

I’ll be updating to dual or quad 5090s and the wife can have the 4090s for her PC.

I’ve been watching reviews of the 5090 and I am concerned. Apples to apples, my 4090 scores, non-DLSS, 4K max settings, are higher than any of the reviewers’ 5090 numbers.

For example, see DSOG’s 4090 vs 5090 review and look at the Chernobylite bench.

and here are my 4090 results: higher than the 5090, also in AW2, Cyberpunk, etc. (all non-DLSS, 4K, max settings).
Even my lowest 1% is higher than the 5090’s, and I don’t overclock.

Hope the 5090 is not a disappointment for those of us who don’t do the DLSS gimmick.

My single 4090 score in Chernobylite, 4K, max settings, ray tracing set to ULTRA:

My Dual 4090 score (mGPU not SLI)

Benchmark for 5090 review at DSOG

Hence my concern, as all my non-overclocked 4090 scores are HIGHER than the 5090’s non-DLSS scores.

The situation and the gap got worse when I moved to the new 571.96 driver, as I gained another 4-5%
(driver found in the new CUDA 12.8 toolkit)

I loved the video review that he made for LV1 Tech on the 5090, and I hope apples to apples the gain is the same.

A few logical concessions:
maybe my system was already better optimized and those optimizations will carry over to the 5090s

maybe there are several driver optimizations that need to be released to take better advantage of the 5090s

maybe testers have anomalies in their benchmarks

maybe things like BUS, bandwidth, processor(s), OS optimizations, etc have a lot of impact

…maybe all of the above

It’s just that at $2k a card I want to FEEL the performance of the 5090s.

I have made a loving comfy home for the dual 5090s

3 Likes

Hello @JayVenturi

I stumbled across this forum while looking into using dual GPUs for a dual-monitor setup for sim racing. Possibly even VR at some point.

I searched high and low for a 5090 but could not get my hands on one. I was able to get multiple 5080s, however. So if I can’t get a single 5090, I am hoping dual 5080s would do the job.

My question to you is whether only games made in the UE environment work for dual GPUs. One game in particular I am interested in trying out, if I were to open two 5080s, is Assetto Corsa EVO. It is made on their proprietary game engine. I wouldn’t even know where to look to find out if mGPU is supported.

Anyway, any suggestions would be greatly appreciated. I am still holding out hope for a 5090 and have been hesitant to open the 5080s in case I come across someone willing to trade for 2, or even 3, 5080s lol

1 Like

Sir,
I am not sure about the game you refer to; I do not have it to test. As for mGPU, I have gotten it to work on UE titles but also on non-UE titles:

Zombie Army 4
Sniper Elite 4
Civilization VI
Alien Isolation
Ashes of the Singularity
X4
Strange Brigade
Quake II RTX
Alan Wake II
Total War: Warhammer I and II
and
Gravity Mark

I suggest that if you can make dual GPUs work in GravityMark, then developer implementation is the rate-limiting step for other titles.

As I have not been able to procure a pair of 5090s, I really would only be speculating on the 5080s

I appreciate your response.

I do not game much or often. What excites me about the idea of possibly doing this experiment is the sheer nostalgia of going dual GPUs. For the culture.

The question begs to be asked: would dual $1000 5080s beat a single $2000 5090?

For ease of use and compatibility, a single 5090 would be the answer. But seeing how difficult those are to procure, if a pair of 5080s will do the job or better, I personally wouldn’t mind getting my hands dirty.

Do you have any thoughts on viability of VR, and have you had any experience with multi screen gaming?

Hi, and thanks for bringing mGPU back from the ashes (pun intended). I haven’t heard of it in ages, and I am glad it apparently isn’t abandoned. I always wondered if heterogeneous mGPU could be viable, so I added my old 3070 alongside a 4070 Ti, followed your advice to start with the GravityMark benchmark, and here are my results:

–Setup–
5900x 32GB 3600c16 Asus B550-e
PNY verto 4070ti + 3070FE
All watercooled (custom)

–Settings–
Render: Default
AA: On (Temporal)
Mode: Windowed
Resolution: 2k (2560x1440)
mGPU: Alternate (AFR)
Asteroids: 200.000
LOD Bias: Default

  • Vulkan score
    3070 (mGPU off): 31,062 / 186 FPS
    4070ti(mGPU off): 50,556 / 302 FPS
    3070+4070ti AFR : 68,414 / 409 FPS

  • DX12 score
    3070 (mGPU off): 32,219 / 192 FPS
    4070ti(mGPU off): 48,008 / 287 FPS
    3070+4070ti AFR : 62,012 / 371 FPS

Notes:

  • Something weird: the 3070 is reported as GPU0 in Windows, and if I select the 3070 as device 0 in the benchmark it actually uses the 4070 Ti (which is reported as GPU 0 by the command "nvidia-smi"). The 4070 Ti is in the PCIEX16_1 slot and the 3070 in the PCIEX16_2 slot; both run in PCIe Gen4 x8 mode per the motherboard default.
    No idea how this affects the benchmark, but for the AFR runs I used the 3070 as device 0 and the 4070 Ti as device 1; if I do the opposite I get worse results (Vulkan: 52,658 / 315 FPS; DX12: 53,370 / 319 FPS). (See the nvidia-smi listing suggestion after these notes.)

  • During the AFR benchmark the 3070 is at 100% usage while the 4070 Ti is around 75-80%.
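On the GPU0 confusion: Windows’ adapter order and the driver’s CUDA/NVML order often disagree, so it helps to list what the driver thinks the indices are before choosing device 0/1 in a benchmark. Two stock nvidia-smi calls (nothing here is specific to this particular setup):

nvidia-smi -L
nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv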

Some guesses on heterogeneous mGPU AFR (to be tested for confirmation):

  • AFR performance seems to be limited by the weakest GPU. Vulkan gives a little more than double the 3070’s performance, while DX12 is slightly below double the 3070’s performance. So my guess is that if I used a second GPU with less than half the performance of the 4070 Ti, the result might be worse than using just the 4070 Ti (which means forget about the iGPU or your 10-year-old GPU boost dream…). I may test that later if I find a weak enough GPU.
  • Same thought for VRAM: since both GPUs render frames in alternation, they use the same amount of VRAM, so it may end up limited by the GPU with less VRAM.
  • Obviously it would work best with 2x the same GPU, but it still seems to give great results, so it is worth digging into, IMO. I suggest using GPUs with the closest possible VRAM and performance.

I am going to try some games now that I know it is theoretically possible; however, this seems tricky. For example, in Civ VI I cannot enable mGPU (it’s greyed out), maybe because it doesn’t recognise my setup due to the different GPUs, or perhaps it is due to my GPU0 bug…

@JayVenturi
For Unreal games, I saw in your initial post that you used various launch arguments / settings, but I don’t get where each of those belongs. If you could detail the steps you took to make it work (even if only for one specific game), I would really appreciate it, because I couldn’t figure it out myself.

1 Like

Your findings are amazing.

You are getting 84% of the combined individual Vulkan scores with mGPU. That’s FANTASTIC scaling.

DX12 gives you about 75% which is still incredible.

I don’t remember getting anywhere near that using CrossFire back in the day. I really wish more games supported this natively.

Thanks for your input here. The temptation to open these 5080s gets stronger.

2 Likes

Honestly I would rather have a single 5090 than a pair of 5080s, unless you have unlimited money and plan to buy a 5090 later anyway :grin: if that’s the case I am curious to see the results. Otherwise I would recommend against making purchase decisions based on this. The VRAM doesn’t add up in that case; you won’t get a pooled 32 GB since it doesn’t use NVLink. I did it because I already had the 3070 collecting dust and I wondered if there was any practical use case (which I am still wondering, because scoring well in a benchmark is not particularly useful). If you have two GPUs lying around that aren’t too far apart in VRAM/performance, it is worth trying.

Also, in my case I am only getting about +30% framerate compared to the 4070 Ti alone, which could mean it only scales well with two similar GPUs, or, as I suspect, GPUs close enough in performance could give good results. But you won’t get 4070 Ti + 3070 performance added together; it’s more like double the weakest GPU’s performance in the best case.

As for AFR, my understanding (I may be wrong) is that the latency can only be as good as the slowest GPU in the build, the same as a single card: if a single 5080 runs at, let’s say, 80 FPS, then even if dual 5080s provide 140 FPS you will still have the same latency as if it were running at 80 FPS. So even if a 5090 were theoretically slower, let’s say 120 FPS, you would still have less latency than with the dual 5080s. Are the extra frames worth the latency trade-off?
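A quick back-of-the-envelope version of that argument, using the same example numbers (simple frame-time arithmetic, nothing measured):

single 5080:   80 FPS  -> 1000 / 80  = 12.5 ms per frame
dual 5080 AFR: 140 FPS -> a new frame arrives every 1000 / 140 ≈ 7.1 ms,
               but each GPU only finishes a frame every 1000 / 70 ≈ 14.3 ms,
               so the time from input to that frame being displayed stays
               in single-card territory rather than halving
single 5090:   120 FPS -> 1000 / 120 ≈ 8.3 ms per frame, i.e. lower latency
               even though it shows fewer FPS than the AFR pair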

In my case it is even worse, since the 3070 is holding back the 4070 Ti’s potential. Based on the benchmark I may get +30% FPS, but if the latency can only be as good as the 3070’s, it means I also get roughly +40% latency compared to the 4070 Ti alone. So depending on the scenario and the type of multi-adapter method, it could deliver a worse experience despite the higher number (the same issue as frame generation).

Of course there are other use cases than just AFR, which I am looking to test, especially for Unreal Engine, that might justify mGPU over a single GPU, but I still have a lot to learn in this field.

1 Like

Hello
I’ll try to map out the Chernobylite example again and post it; that one was more complicated.

The easier examples are Strange Brigade, Ashes of the Singularity, and GravityMark, as they work without too much fussing and have friendly menus to get you there. GravityMark is a great tool to hone and optimize your setup.
…but you also need the OS optimizations (HAGS, per-GPU affinity, ReBAR, etc.) and the driver basics only (no GeForce Experience, none of the optional driver install components that interfere, and importantly → the nv container CANNOT be running in the background).
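Since HAGS keeps coming up: on client Windows it is the toggle in the graphics settings page, and under the hood it is a single registry value. A sketch, assuming the standard HwSchMode key (2 = on, 1 = off, reboot required):

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"HwSchMode"=dword:00000002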

For example, I do not have any of those driver extras running in the background; it runs faster and more stable. I don’t use the NVIDIA app or the control panel; I integrated the Inspector and profiles into my right-click menu instead:

The nvcontainer / NVIDIA Display Container service breaks many mGPU attempts.

I have it turned off in Services (set to Manual).
I can’t seem to include an image in this message editor, so I will provide that setup in your main thread.
I use a right-click shortcut to control the setup, but if needed I use the start/stop shortcuts.

here is my desktop right click:

I disable the nvcontainer / NVIDIA service, but if I ever need it I start and stop it with shortcuts:

(yes, all my tiles are orderly and well behaved, and are transparent without any vendor- or MS-added colour; I also have no ads running)

for a transparent tile background colour, modify the XML
with:
BackgroundColor="transparent"/>

…easy enough, just try it
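For context, a fuller sketch of what that fragment usually lives in: an <ExeName>.VisualElementsManifest.xml sitting next to the program’s exe (this is the standard desktop-app tile mechanism, not something specific to my setup; BackgroundColor normally takes a hex colour, and “transparent” is the trick described above):

<Application xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <VisualElements
    ShowNameOnSquare150x150Logo="on"
    ForegroundText="light"
    BackgroundColor="transparent"/>
</Application>

After editing, you may need to re-pin the tile (or update the shortcut’s timestamp) before Start picks up the change.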

nvcontainer is set to Manual in Services, and the start/stop shortcut commands are easy:

net start NVDisplay.ContainerLocalSystem
net stop NVDisplay.ContainerLocalSystem
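The same thing can be done from an elevated prompt, using only the built-in sc.exe and the service name shown above:

sc config NVDisplay.ContainerLocalSystem start= demand
sc query NVDisplay.ContainerLocalSystem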

You may want to disable Shader Cache in the NVIDIA control panel; for me it breaks all my mGPU setups, or at least reduces performance significantly. The control panel Shader Cache setting has a negative effect on ALL my games.

If you are on Server 2022 or the Server 2025 Datacenter preview used as a workstation, you can disable the CPU mitigations in:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"FeatureSettingsOverride"=dword:00000003
"FeatureSettingsOverrideMask"=dword:00000003
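To confirm those values actually landed, a quick check with plain PowerShell against the same key:

Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" | Select-Object FeatureSettingsOverride, FeatureSettingsOverrideMask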

Also disable any mitigations in BIOS (your choice)
I realize some folks don’t understand mitigations and have a rubber stamp response, but I leave that choice up to you, these are my choices.

Next:
you must have Resizable BAR on in the BIOS, and you need to set the Resizable BAR size to “unlimited” (I believe the switch choice is “32” in that setting).

In Windows you can use (if your hardware is capable) the:

ReBarState.exe
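One way to sanity-check that the large BAR is actually exposed to the driver (nvidia-smi ships with the driver; on a working setup the BAR1 total per GPU should be roughly the VRAM size rather than 256 MiB):

nvidia-smi -q -d MEMORY

Look for the “BAR1 Memory Usage” section under each GPU.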

The driver itself: extract the downloaded driver package and remove the extras so that only the essential driver components and the installer are left,

then modify the bottom of the CFG file (setup.cfg) so that the three lines under the EULA are gone.

ONLY then run the setup exe.
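Purely as an illustration of what those “three lines under the EULA” tend to look like, from memory and subject to change between driver versions, so treat the exact names as an assumption and check your own setup.cfg rather than copying this:

<file name="${{EulaHtmlFile}}"/>
<file name="${{FunctionalConsentFile}}"/>
<file name="${{PrivacyPolicyFile}}"/>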

For pre-2015 games that use PhysX, you should install the separate standalone legacy runtime from NVIDIA:

PhysX-9.13.0604-SystemSoftware-Legacy.exe

etc etc etc

Yes, there is lots to do to have a fast, stable Windows system. I could not get it to work under Win 10 or Win 11, but it works under Server 2022 and Server 2025 (I use Windows Server 2022 or 2025 Datacenter as a lean workstation).

Then starts the ramp to some of the more involved steps

The next “easier” game to set up is Civilization VI, and it is also a good introduction to what lies ahead.

Step 1: DO NOT USE STEAM VERSIONS OF GAMES. Only standalone versions without a third-party check-in, not “rent to own” games; versions that let you launch the game from the real exe in the game folder.

For Civilization VI, your goal is to enable this check mark without SLI or NVLink:

In the Firaxis folder, in GraphicsOptions.txt,
set these changes:

;Set DX12 compute queue usage on (1), off (0), or platform default(-1).
EnableAsyncCompute 1

;Enable DX12 split-screen optimizations for multi-GPU systems. On (1), Off(0), or platform default (-1) [Platform Default is OFF, but may change in the future.]
EnableSplitScreenMultiGPU 1

[Video]
;Experimental rendering modes
RenderMode 1

;Experimental resolve modes
ResolveMode 1

and save the file as “read only”
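Marking it read-only can be done from the file’s Properties dialog, or from a prompt with the built-in attrib tool (run it in whatever folder your GraphicsOptions.txt lives in; shown generically):

attrib +R GraphicsOptions.txt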

You can also go into the other config files and manually set the amount of threading for CPUs and other features; it scales well.

I have to look up which version I have, as one of the updates from Firaxis broke all of those GPU and CPU perks.

So, ready to start the journey to a faster, more stable PC and to enjoy your PC again?

Disclaimer:
I only toy with this stuff as a hobby; I’m more of a cars, motorcycles, Swiss watches, fountain pens, CDs and vinyl, pistols kinda guy. So I may not convey things in the appropriate vernacular and lexicon of geek-speak. It probably doesn’t help that English is not one of my primary languages.

Anyhow, it FEELS as if NVIDIA and MS deliberately make it complicated; I merely look for solutions around the obstacles. If you’re interested, I may post more on using MS Server as your personal PC OS and how to make it fast, private, stable, and able to do all the things I do.

Some stuff is complex and some stuff is simple, like using the HOSTS file to give your browsing and day-to-day life more privacy. See, I’m of the opinion the OS works for me, and that the OS is not a way of life. Whenever I see Microsoft or Google say that they keep your data private, my first thought is “how do I keep my data private from YOU (MS, Google, Apple, etc.)?”

Anyhow, that’s a separate rant

Thank you very much for explaining; this is very helpful, as I am not an expert in this domain, just a simple gamer who likes to mess around with stuff. I’m gonna need some time to read and understand it all, but I’m starting to see why I was facing issues on Windows.
As I am a stubborn guy, I will attempt it on Windows 10 first; later, when I realize it will never work, I will try installing Windows Server on a separate drive. I need to make sure I can run all my software before switching my main OS, but it seems like a good idea if it works, since W10 is EOL anyway and W11 is even worse…

Quick update: I still couldn’t manage to make this goddamn Civ VI game work, and I’ve had some weird issues with my PC lately on top of that; I suspect my power supply is starting to get tired… I tried Ashes of the Singularity, which worked well (no surprise there), but it will take some time before I can boot into Windows Server. Hopefully I can make it work at least with a pair of 3070s, then I will try the 4070 Ti + 3070 again.

1 Like

I understand the temptation, and I relish the idea of the “smell of a new card”.

Bear in mind that I am also waiting on a pair of 5090 FEs. A pair of 4090 FEs, as I have right now, is quite puzzlingly performing at around 130% of the reported benchmarks of a single 5090.

The street pricing of the 5080 is higher than the 4090’s.

I would also bear in mind that there have been so many issues with the 5090, the worst being architectural issues and missing ROPs. I think anything else I could handle, but chip flaws…

Knowing myself very well: if I had a pair of 5080s lying there, I’d probably have cracked them open on the walk back from the mailbox. If I had paid higher than retail… then it’s more subjective.

With that said I wanted to leave you with enthusiasm and some caution:

mGPU has been a struggle (see ALL the posts above), with some witchcraft, burnt offerings, and some failures.

If you are going to do mGPU, the only way to get it 100% of the time is to use your own application or algorithm in CNN, NLP, R-CNN, etc., and not be at the mercy of what code or features internally made it through from “an enthusiastic developer team whose final product and embedded code survived the publisher and lawyers”.

(borrowing from Bloom County)