NVIDIA RTX PRO 6000 info dump

Thread for random things I find interesting while I break in the new cards

_CudaDeviceProperties(name='NVIDIA RTX PRO 6000 Blackwell Workstation Edition', major=12, minor=0, total_memory=97258MB, multi_processor_count=188, uuid=a243a9ae-a528-c1ce-522c-45c468f481de, L2_cache_size=128MB)

[Screenshot From 2025-05-08 11-35-36: Ada vs Blackwell 6000, 3DMark Speed Way]


Power limiting via `nvidia-smi -pl 400` works
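If you want to script the cap instead of typing it by hand, a minimal sketch (GPU index 0 assumed; `nvidia-smi` needs root to change the limit):

```python
def power_limit_cmd(watts, gpu_index=0):
    # -i selects the GPU, -pl sets the board power limit in watts
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]

# needs root; enabling persistence mode first keeps the cap applied
# while no client holds the driver open:
#   subprocess.run(["nvidia-smi", "-pm", "1"], check=True)
#   subprocess.run(power_limit_cmd(400), check=True)
```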


Blackwell (power limited to 400w)


```
** Dtype: torch.bfloat16

** Platform/Device info:
Linux flat-blck-io 6.14.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Sun Apr  6 15:05:05 UTC 2025 x86_64 x86_64
_CudaDeviceProperties(name='NVIDIA RTX PRO 6000 Blackwell Workstation Edition', major=12, minor=0, total_memory=97248MB, multi_processor_count=188, uuid=a243a9ae-a528-c1ce-522c-45c468f481de, L2_cache_size=128MB)

** Critical software versions:
torch=2.7.0+cu128
cuda=12.8

** Additional notes:
benchmark version: 2


--------------------------------------------------------------------------------


Warming up the accelerator for 30 secs ... accelerator warmup finished
^C
Tried  7864 shapes => the best outcomes were:
mean:   327.8 TFLOPS @ 1280x2304x2048 (MxNxK)
median: 327.7 TFLOPS @ 1280x2304x2048 (MxNxK)
max:    347.0 TFLOPS @ 1280x2304x2048 (MxNxK)

geomean: 196.8 TFLOPS for 7864 shapes in range: m=[0, 4096, 256] | n=[0, 4096, 256] | k=[0, 20480, 256]

Legend: TFLOPS = 10**12 FLOPS
Elapsed time: 0:06:42
```
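For context, matmul benchmarks like this derive TFLOPS from the standard 2·M·N·K FLOP count; a quick sketch (the microsecond figure below is derived from the reported rate, not measured):

```python
def matmul_flops(m, n, k):
    # each of the m*n output elements needs k multiplies and k adds
    return 2 * m * n * k

def tflops(m, n, k, seconds):
    # sustained rate in 10**12 FLOPS for one matmul of that shape
    return matmul_flops(m, n, k) / seconds / 1e12

# best Blackwell shape above, 1280x2304x2048: ~12.08 GFLOP per matmul,
# so at 327.8 TFLOPS a single matmul takes roughly 37 microseconds
best_shape_flops = matmul_flops(1280, 2304, 2048)  # 12_079_595_520
```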

Ada


```
** Dtype: torch.bfloat16

** Platform/Device info:
Linux flat-blck-io 6.14.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Sun Apr  6 15:05:05 UTC 2025 x86_64 x86_64
_CudaDeviceProperties(name='NVIDIA RTX 6000 Ada Generation', major=8, minor=9, total_memory=48519MB, multi_processor_count=142, uuid=392e3fa3-a720-e90c-4dc3-1373d48c6f3e, L2_cache_size=96MB)

** Critical software versions:
torch=2.7.0+cu126
cuda=12.6

** Additional notes:
benchmark version: 2


--------------------------------------------------------------------------------


Warming up the accelerator for 30 secs ... accelerator warmup finished
^C
Tried  5581 shapes => the best outcomes were:
mean:   151.5 TFLOPS @ 1280x1792x4096 (MxNxK)
median: 151.7 TFLOPS @ 1280x1792x4096 (MxNxK)
max:    155.5 TFLOPS @ 1280x1792x4096 (MxNxK)

geomean: 95.7 TFLOPS for 5581 shapes in range: m=[0, 4096, 256] | n=[0, 4096, 256] | k=[0, 20480, 256]

Legend: TFLOPS = 10**12 FLOPS
Elapsed time: 0:08:34
```
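Comparing the two geomeans directly (numbers copied from the runs above, not re-measured, and the Blackwell card was power-limited to 400 W):

```python
blackwell_geomean = 196.8  # TFLOPS, power-limited to 400 W
ada_geomean = 95.7         # TFLOPS, stock
speedup = blackwell_geomean / ada_geomean
print(f"{speedup:.2f}x")   # roughly 2.06x across the shared shape range
```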

This card is SR-IOV capable, right? What if someone wanted to multiplex gaming on it?

Or was it that poopy thing where you had to license it to even have their proprietary form of SR-IOV?

Would be my question as well. Run the Looking Glass client on the host operating system with the display attached to it, and a Windows VM with access to a MIG device running the Looking Glass server, then see what the performance is like with GPU acceleration on both host and guest from the same card.

It has the same limitations as the other pro-series cards NVIDIA has put out, like the Ada: if you want to partition the card you need a vGPU license.

However, they did fix the MSI bug that required a registry hack to get the Ada card working through VFIO. So you can pass the entire card through to the VM via VFIO without any registry hackery, which is nice that they cleaned up for this generation. A bit of progress for us plebs who don’t want to pay for vGPU.


Oh, I just read the promotional information again. So you can’t use it for the host and the partitioned vGPUs for a guest at the same time?

If you pay for vGPU (separately) you can definitely do that.


Even using the RTX 6000 for display output on the host and acceleration in a guest at the same time, are you sure? It read to me like you either have to pass through the entire card, or you can pass through a vGPU (a partition, or however you want to call it) to the VM while keeping display output on the host?

There are two ways to do it. The free way is to have two video cards: one card uses the VFIO driver and is passed entirely into the VM, while the other card drives the actual display. Then you have two ways to view the VM: some sort of remote desktop, or something like Looking Glass, which does DMA transfers between the two cards for ultra-low latency.

The other way is a single-card approach, which lets you use the same card for both the host and the VM by passing a vGPU into the virtual machine. This costs money in licensing fees.


The former setup is what I have right now, so that one I know well.

Okay, so I gather that as long as you pay for a vGPU license, you can use the RTX PRO 6000 series for both host and guest at the same time. That actually sounds good. I’ve found this document; do I understand correctly that one perpetual workstation license costs $450? Given the cost of the card and the number of VMs I need, I think that’s bad but not too bad.