I got the GOG version and tried fiddling around with it, but I’m not really sure which config files I need to put those parameter lines into. I’ll just wait till you can get back to your PC and share your setup for it in more detail.
For now I’m going to focus on Unreal Engine and making some sort of dual-GPU demo through the multi-process rendering nDisplay offers. Hopefully I can cook up something good; nDisplay is rather flexible, so hopefully I’ll be able to boost the FPS by more than the 40 extra I got last night with my SFR setup.
The dual-GPU Unreal project is on a bit of a pause until I can get this setup / config file from you. I tried fiddling with it but wasn’t able to get anything to work. If I get the config information I should be able to make an Unreal project, on a version that supports it, for a multi-GPU game.
Alternatively, I’ve been thinking about digging into the world of graphics programming, maybe messing around with DX12’s built-in mGPU support and trying to make a real-time path / ray tracer.
So… a problem has been encountered (and solved) in the 572.xx to 581.xx drivers from nvidia in regard to any 5.xxx series GPU. Please follow the bouncing ball:
When trying to run DirectX (instead of Vulkan) I get this error every time, but Vulkan runs fine.
The 2.xxx, 3.xxx, and 4.xxx series GPUs work in multi-GPU mode for VK and DX12; however,
the 5.xxx series GPUs only work in VK and not DX12, due to a driver issue with nvidia.
This is for DX12 mGPU and is not referencing SLI.
AMD GPUs do not have an issue in mGPU for DX12 or VK.
The issue, I repeat, is unique to nvidia 5.xxx series cards and DX12, due to a driver issue.
The issue has been reported many times, but nvidia has been silent on responses; zero feedback.
This was also reported on other forums; again, nvidia has been silent.
The error manifests in applications or games that can take advantage of DX12 mGPU and VK.
Examples would be Ashes of the Singularity and GravityMark
Ashes of the Singularity crashes when selecting DX12 mGPU in the Ashes menu
GravityMark crashes when selecting DX12 instead of VK
Strange Brigade works on mGPU and Vulkan but crashes on DX12 and mGPU
D3D12 WARNING: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR
Ashes of the Singularity crashes the same way when selecting DX12 and mGPU:
The issue was also reported on the developer forum:
truncated:
//*********************************************************
//
// Copyright (c) Microsoft. All rights reserved.
// This code is licensed under the MIT License (MIT).
// THIS CODE IS PROVIDED AS IS WITHOUT WARRANTY OF
// ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING ANY
// IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR
// PURPOSE, MERCHANTABILITY, OR NON-INFRINGEMENT.
//
//*********************************************************
// Enumerate adapters to use for heterogeneous multiadapter.
_Use_decl_annotations_
HRESULT D3D12HeterogeneousMultiadapter::GetHardwareAdapters(IDXGIFactory2* pFactory, IDXGIAdapter1** ppPrimaryAdapter, IDXGIAdapter1** ppSecondaryAdapter)
{
    if (pFactory == nullptr)
    {
        return E_POINTER;
    }

    // Adapter 0 is the adapter that Presents frames to the display. It is assigned as
    // the "secondary" adapter because it is the adapter that performs the second set
    // of operations (the blur effect) in this sample.
    // Adapter 1 is an additional GPU that the app can take advantage of, but it does
    // not own the presentation step. It is assigned as the "primary" adapter because
    // it is the adapter that performs the first set of operations (rendering triangles)
    // in this sample.
    ThrowIfFailed(pFactory->EnumAdapters1(0, ppSecondaryAdapter));
    DXGI_ADAPTER_DESC1 descSecondary;
    ThrowIfFailed((*ppSecondaryAdapter)->GetDesc1(&descSecondary));

    *ppPrimaryAdapter = nullptr;
    ComPtr<IDXGIAdapter1> adapter;
    ComPtr<IDXGIFactory6> factory6;
    if (SUCCEEDED(pFactory->QueryInterface(IID_PPV_ARGS(&factory6))))
    {
        // Walk the high-performance adapter list and stop at the first adapter whose
        // LUID differs from the presenting (secondary) adapter.
        for (UINT adapterIndex = 0; DXGI_ERROR_NOT_FOUND != factory6->EnumAdapterByGpuPreference(adapterIndex, DXGI_GPU_PREFERENCE_HIGH_PERFORMANCE, IID_PPV_ARGS(&adapter)); ++adapterIndex)
        {
            DXGI_ADAPTER_DESC1 descPrimary;
            ThrowIfFailed(adapter->GetDesc1(&descPrimary));

            if (descPrimary.AdapterLuid.HighPart != descSecondary.AdapterLuid.HighPart || descPrimary.AdapterLuid.LowPart != descSecondary.AdapterLuid.LowPart)
            {
                break;
            }
        }
        *ppPrimaryAdapter = adapter.Detach();
    }
    else
    {
        ThrowIfFailed(pFactory->EnumAdapters1(1, ppPrimaryAdapter));
    }

    return S_OK;
}
Summary:
Issue is localized to 5.xxx series nVidia GPUs on DX12 mGPU (non-SLI)
This is 100% reproducible on 5.xxx GPUs with mGPU/DX12
Additionally, the issue is independent of any BIOS setting or motherboard;
tested on Asus WS PRO, Asus Sage, Gigabyte MZ72 series, and MZ73 series boards.
OS was fully patched Windows 11, and Windows Server Datacenter 2022 and 2025
All GPUs tested were the Founders Edition, except for the AMD GPUs of course
This applies to all drivers so far from 572.16 to 581.15
The Omniverse Kit has worked fine with the dual 5.xxx series cards in DX12 applications
I’ll repeat that:
ONLY the Omniverse Kit has worked fine with the dual 5.xxx series cards in DX12 applications.
So I went on a deep dive of the profile parameters in profile inspector for the Omniverse Kit.
There are several unique modifications compared to other profiles, so I tested each to isolate the key that makes or breaks dual 5.xxx series cards working in mGPU mode in DX12.
The only differences that turned out to be effective are:
and
That was all it took.
REGARDLESS of whether those feature flags are used or not, both have to be disabled in the profile of the app/game for DX12, or mGPU will not work.
I expected something more elaborate as I was diligently eliminating Omniverse-specific keys, but alas, it turns out only these two variables need to be set.
What makes NO SENSE is the side effect: some benchmarks get higher scores in single-GPU mode with those variables set.
Obviously there is a significant issue in the nvidia drivers from 572.xx to 581.xx and this is just a workaround; hopefully nvidia will fix it soon …
~unless it was deliberate
.
.
.
…
In making the workaround for the issues (above) in the nvidia driver when using dual 5.xxx series GPUs, there was also a side effect of improved performance in some dual- and single-GPU cases.
Chernobylite has an extensive and well-thought-out internal benchmark. The settings used are displayed in the foreground while the benchmark continues in the background.
At max settings: max ray tracing, max visuals, 4K, no DLSS, no Frame Gen:
That looks like the classic issue of ReBAR = bad for mGPU. On AMD’s side you can’t enable mGPU without disabling ReBAR/SAM in the BIOS on the RX 6000 series. It’s even listed in AMD’s notes about mGPU.
All the drivers are now tuned for single-card, high-cache CPUs like the 5800X3D, 7800X3D, & 9800X3D.
Even my benchmark scores are higher with a single card than with my two RTX 2080 Tis in an SLI setup with a 5800X3D. The games, however, are NOT the same story. A higher-clocked CPU helps: I’ve had a 5600X in here that clocked to 4.85 GHz vs this 3D cache chip’s measly 4.50-4.55 GHz. The 5600X scores lower in benchmarks, but gets higher FPS during regular gameplay. ¯\_(ツ)_/¯
The 1% lows & 0.1% lows are only an issue if they drag the average FPS down by comparison.
Thanks for posting the fix! Whenever you get around to it, post your run sometime; I’m curious whether, with DX12 and your more optimized / better hardware system, you might be able to push 1000+ FPS in GravityMark. I was just barely able to beat your best Vulkan run by using DX12 and overclocking the hell out of both of my 5090 FE’s (+700 MHz memory clock, +370 MHz GPU core clock).
I haven’t done much with my second GPU since getting it, except running some local AI (gpt-oss-120b mostly) and enjoying much faster Blender renders. In other news, I have contacted the developer of GameTechBench and asked if they could provide one of the older builds that still has mGPU support. I am waiting to hear back; hopefully they respond soon.
Also, if you can get around to it, posting your Engine.ini file or similar configuration file for Chernobylite would be sick. I really want to try running an actual Unreal Engine game with mGPU, but even with all the fiddling I’ve tried I cannot figure out how to do it.
Sir,
that is a great score you have there!! That’s great for DX12! You really overclocked those FEs to the max.
The DX12 benchmark will always be slower than the Vulkan one, unless Tellusim provides some optimizations. The best I could do was solve the nvidia block on DX12 mGPU, but Vulkan seems overall better in mGPU.
As DX12 scores go, what you did was amazing!!
I’ve tried, but my DX12 runs seem unable to break 161xxx.
I’ll have to see if I can find those Chernobylite ini files, and post them for you.
So this required modding the PC, you can see the modding on this thread:
The final product runs silent (in most cases; a heavy gaming session is the exception, but even then it’s not loud).
Under load the cards peak at 68-70°C, which is lower than the Founders Edition, but more importantly, when the video card fans are not spinning, the cards stayed around 39°C in a room that is normally about 71°F.
I could provide a lot of tweaking suggestions, but there are so many mega-threads on Lossless Scaling that I figured you can sample everyone else’s advice and come to your own preferences.
I would suggest limiting the FG to just 1, and not being tempted by 2x, 3x, and 4x, but I’m more of a visual stickler than most.
I don’t Steam, so I got my stand alone copy of Lossless Scaling from Humble Bundle for $6.99
I’ve been messing with Lossless Scaling and UE4 and UE5 titles.
For Unreal Engine 4 and 5 games, use this in the default Engine.ini as a catch-all. Also remember to modify your shortcut to add " -DX12 -MaxGPUCount=2" or " -DX12 -MaxGPUCount=4" (see the example launch line after the ini block below).
You can also use Lossless Scaling on top of this.
Parameters that are not applicable are ignored. It should provide a boost; just disable streaming, and make sure you have at least 16 cores.
[Core.Log]
Global=all off
LogAI=all off
LogAnalytics=all off
LogAnimation=all off
LogAudio=all off
LogAudioCaptureCore=all off
LogAudioMixer=all off
LogBlueprint=all off
LogChaosDD=all off
LogConfig=all off
LogCore=all off
LogDerivedDataCache=all off
LogDeviceProfileManager=all off
LogEOSSDK=all off
LogFab=all off
LogFileCache=all off
LogInit=all off
LogInput=all off
LogInteractiveProcess=all off
LogLevelSequenceEditor=all off
LogLinker=all off
LogMemory=all off
LogMemoryProfiler=all off
LogMeshMerging=all off
LogMeshReduction=all off
LogMetaSound=all off
LogNFORDenoise=all off
LogNetwork=all off
LogNetworkingProfiler=all off
LogNiagara=all off
LogNiagaraDebuggerClient=all off
LogNNEDenoiser=all off
LogNNERuntimeORT=all off
LogOnline=all off
LogOnlineEntitlement=all off
LogOnlineEvents=all off
LogOnlineFriend=all off
LogOnlineGame=all off
LogOnlineIdentity=all off
LogOnlinePresence=all off
LogOnlineSession=all off
LogOnlineTitleFile=all off
LogOnlineUser=all off
LogPakFile=all off
LogPhysics=all off
LogPluginManager=all off
LogPython=all off
LogRenderTargetPool=all off
LogRenderer=all off
LogRendererCore=all off
LogShaderCompiler=all off
LogShaderCompilers=all off
LogSlate=all off
LogSourceControl=all off
LogStreaming=all off
LogStudioTelemetry=all off
LogTargetPlatformManager=all off
LogTelemetry=all off
LogTemp=all off
LogTextureEncodingSettings=all off
LogTextureFormatManager=all off
LogTextureFormatOodle=all off
LogTimingProfiler=all off
LogUObject=all off
LogUObjectArray=all off
LogUsd=all off
LogVRS=all off
LogVirtualization=all off
LogWindows=all off
LogWindowsTextInputMethodSystem=all off
LogWorldPartition=all off
LogXGEController=all off
LogZenServiceInstance=all off
PixWinPlugin=all off
RenderDocPlugin=all off
bEnableCrashReport=False
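As an example of the shortcut edit mentioned above (the game path here is just a placeholder of my own, not from the post; the flags are the ones listed), the shortcut Target would look something like:

"C:\Games\SomeUE5Game\SomeGame.exe" -DX12 -MaxGPUCount=2

Use -MaxGPUCount=4 instead if you are running four cards.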
My very high-level understanding is that DX12 provides some tools that essentially allow developers themselves to handle how they deal with multiple GPUs, instead of the traditional SLI/Crossfire implementations.
Some do, many do not.
But do we know exactly what they do?
My take would be that if they are intelligently splitting the load it could be pretty good.
But if instead they are just alternating frames (AFR), which is where almost all SLI and Crossfire implementations eventually landed, it is mostly useless.
I would choose lower frame rates on a single GPU 100 times out of 100 if all the multiple GPUs were used for was AFR. AFR gives you better “averages” than a single GPU, but those averages are not reflective of the actual experience. Firstly, it increases input lag something awful.
Tom’s Hardware explained this best in an illustration from back when they compared the ATI Rage Fury MAXX (dual GPU) to the GeForce 256 SDR way back in 1999:
AFR also tends to have great framerates in simple scenes but grinds to a halt in more complex ones, meaning that when you could use it the most, it performs not much better than a single GPU.
Now, if there are mGPU implementations that intelligently split the load of a given frame across multiple GPUs, that is way more interesting.
Hi,
I spent a while writing the answer to your question; 3 hours later (SpongeBob announcer voice) I realized I was writing a summary dissertation on the subject.
The short answer is that it may be AFR, but it is more likely an SFR implementation or directly controlled load balancing by the application itself, as this is a lower-level API.
Here are some links to start with that will save me from another dissertation (this forum doesn’t provide postgraduate credits… or does it?).
I’ll include the Intel example where SFR is one technique, but it’s really up to the developer; Ashes of the Singularity had its own approach to load balancing.
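To give a feel for what “up to the developer” means, here is a rough sketch of my own (not from any of the links, and the function name is just illustrative) of how DX12 exposes each GPU behind a linked-node adapter through node masks; for two unlinked cards you would instead create one device per adapter, which is what the Microsoft heterogeneous multiadapter sample quoted earlier does.

#include <d3d12.h>
#include <dxgi1_6.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

// Create one direct command queue per GPU node so the application decides what
// each GPU does: alternate whole frames (AFR), split each frame (SFR), or hand
// one GPU a post-process pass, i.e. the per-title load balancing discussed above.
std::vector<ComPtr<ID3D12CommandQueue>> CreatePerNodeQueues(ID3D12Device* device)
{
    std::vector<ComPtr<ID3D12CommandQueue>> queues;
    const UINT nodeCount = device->GetNodeCount(); // physical GPUs behind this device

    for (UINT node = 0; node < nodeCount; ++node)
    {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
        desc.NodeMask = 1u << node; // route this queue's work to one specific GPU

        ComPtr<ID3D12CommandQueue> queue;
        if (SUCCEEDED(device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue))))
        {
            queues.push_back(queue);
        }
    }
    return queues;
}

Nothing in the driver decides how work is divided between those queues; the engine does, which is exactly why mGPU quality varies so much from title to title.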
Also, while easily implemented, I don’t consider Lossless Scaling to be a true mGPU technique, as those are not real frames / new content.
Yes, 16x AF is still broken on Blackwell; it’s supposed to get fixed in 585, 587, or 591… supposed to be fixed, that is.
It would be nice to have the performance of 572, features of 590, fixes of 581, and 16x AF fixed…
…I ask so little, and boy do I get it!
~In my ecosystem, the difference between 572.16 and 581.47 is about a 5%-ish drop in performance. Come on nvidia, let’s get the performance back!
~Now, I do use my rig for ML/CNN algorithms and there were other reasons to go to the 5090, but IF this were about gaming alone:
I don’t overclock, but I could have overclocked my 4090s to close the gap and saved money by skipping the 5090s, when the difference is 5% in performance (no DLSS)!
Before I make an inaccurate blanket statement: the 5090s overclock amazingly well without undervolting or power changes, so I can’t fault that, and temps stay reasonable the whole time on a long bench loop at max/high utilization. Check the data here:
Frequencies stayed between 3,189 and 3,224 MHz the whole time.
Temps never got warmer than 57°C for the whole run.
Utilization stayed between 87% and 93% till completion
Frequencies, Temps, and Utilization are reported independently for both cards.
(changes in FPS are due to 9 different scene/load changes within the bench)
…You might want to try GravityMark yourself, as it is a good way to bench and hone your GPUs. It can use any/all of the APIs (DX12/Vulkan, etc.).
It is also a great stability test and benchmark for rooting out driver issues, etc., as benches can be customized and allow for scene increases, including object count and processing load.
Go see how you do!
Lastly, 2 things:
I trim the drivers as explained above.
No nvidia App.
I have the nvidia container service set to manual, so it’s not running in the background (slightly higher benchmarks); see the example command below.
~Rather, I use Inspector/profiles: faster, more convenient, no issues, nothing running in the background.
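If you want to do the same, the service can be switched to manual from an elevated command prompt. I’m assuming the usual service name here (NvContainerLocalSystem, shown as “NVIDIA Display Container LS” in services.msc), so double-check yours first:

sc config NvContainerLocalSystem start= demand
sc stop NvContainerLocalSystem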
.
Example, here is my Right-Click:
.
.
One more thing: my gaming benchmarks use MAX visuals. This includes the global setting of HIGH QUALITY for texture filtering, NO shader cache, and no DLSS anywhere. Please take that into consideration while doing benchmarks:
and
.
.
Please remember to use the correct ReBAR size limits in the respective app/game:
I would ask you if you’ve personally evaluated any upscale or framegen tech.
While I understand that these things aren’t necessarily for you, at least on a philosophical level, I would love to hear your thoughts on Lossless Scaling results, in particular with your dual-5090s.
I’d also love to hear your thoughts on heterogeneous load, both in your mGPU straight render and in something like Lossless where the load is split not within the frame but within the pipeline from engine to screen.
Sir,
yes, I have personally evaluated, tried, and tested upscaling and framegen tech.
IMHO:
I find Lossless Scaling does, in fact, the thing(s) most folks seek: provide higher FPS and reduce some stutters.
But that is more of a visual lie, and other compromises do take place.
However, LS does not render any real “new” frames and does suffer from various visual anomalies and artifacts as well.
With the hardware/software we have, for gaming, I would rather see more AFR or SFR than Lossless Scaling.
If Lossless Scaling is here to stay, it would be better for AMD and Nvidia to include it with the driver control panel (not an app) and make it selectable and more polished, for all GPUs.
I did manage to implement Lossless Scaling successfully for dual 5090s and inject LS into most games. However, I can also clearly see the many visual aberrations from the process.
For example, if I was getting 200-300 FPS at max visuals from a game with mGPU, with LS implemented I can get 400-500+ FPS.
BUT visually it is a degradation. Lastly, if I am already getting 200-300 FPS (my monitor only goes to 144 Hz), then why bother trying to get 500 FPS?
Unbiasedly, Lossless Scaling’s niche is for folks who need to get up to or above 60 FPS, or, specifically, to close the gap from where the game’s FPS is limited so that, with LS’s help, they get up to the refresh rate of the monitor. Most folks that use LS are probably also folks that would use DLSS.
So Lossless Scaling’s forte is going to be raising FPS to someone’s monitor refresh rate when game optimizations and DLSS can’t close the gap on their own.
.
.
With that said, the elegance is in the brute force of rendering every frame, with no DLSS and no Frame Gen, at 4K+, without using any driver gimmicks (resolution-reduction techniques). That improvement over every card generation is/would be what validates the purchase of an upgrade.
HOWEVER, manufacturers such as nVidia may not be able or willing, at a given price point, to provide consumer-level improvements of 20-40% from generation to generation, leaving driver gimmicks to close the gap, such as:
~DLSS (screen resolution reduction)
~FG (fake frames)
~Lossless Scaling (more fake frames)
~developers reducing detail, textures, lighting, effects, etc. for optimization after a game is released, due to lackluster performance and inefficiencies, rather than actually fixing core issues.
These are all placebos replacing RAW performance improvements.
Further:
For example, nvidia’s software/driver limits FG and DLSS modes to newer products and deprecates features on prior products; this makes Lossless Scaling ideal for folks who want FG but don’t have the most recent hardware.
Where Lossless Scaling can be a benefit is if the pipeline (using your term) is part of the load balancing to mimic true SFR or AFR… well… I’m in favor of that, more so towards SFR.
Again, this is all IMHO
respectfully
.
.
… Lastly:
if this were just about gaming, then I am definitely the pot calling the kettle black, because I AM guilty of having to use a second card to make up for the lack of performance from a single card. So it is all a matter of perspective, and of one justifying their own point of view. Maybe my logic is only logical to me and is biased… or maybe it’s not.
Idk if accumulation precision is that important for your models, but given that Nvidia artificially caps the tensor rates for FP16 with FP32 accumulation, you could try using FP16 accumulation for a ~2x boost in compute.
That’s really easy to do if you’re using pytorch, see the following as a reference:
A simple torch.backends.cuda.matmul.allow_fp16_accumulation = True allowed me to have double the throughput in FP16 GEMMs with my 2x3090s.
(I know that you already saw this in the other forum, so feel free to ignore it, just posting it here for future reference to others)
I’m definitely coming from a perspective of “I want to play games and do cool stuff,” so your more zen, science-based approach is a lovely thing to see. That’s the kind of diversity of thought that, even if from a Witch Hunter, we certainly need here.
Thank you for all your efforts, and I am looking forward to what’s coming next from you.
I just want to mention that some of Nvidia’s profiles do split the load, but they favor a single card for lower power usage.
An example here is Outward, an old DX11 game from 2016 (2022 for the Definitive Edition). When running just the normal SLI profile on two RTX 2080 Tis, the driver will load the main card all the way up to 99%, and the other card then gets the spillover, which could be anywhere from 0% up to about 35%, the highest I’ve ever seen on this game. However, this doesn’t mean I’m getting better performance or better scaling. The game went from about 60 FPS to 75 FPS, sometimes 90 FPS.
When I go into the control panel & set it to AFR 2, the game runs at a massive 120-140 FPS in the same area, but the GPU load is now around 65%-85% on both cards. Most reviewers would say that’s bad, because both GPUs aren’t at 99% now.
In my opinion & from experience, older driver-managed Crossfire, SLI, & mGPU are game-engine dependent.
I think Quake II RTX is the only mGPU title I got to work with these cards; that’s Vulkan, & it’s much better than DX12. It seems like DX12 has some bottleneck issues compared to games that can use either DX12 or Vulkan. Vulkan just runs smoother & faster most of the time.