Why are professional cards faster in certain workloads?
The quick answer would be “double precision fp performance!”.
But what I am specifically curious about are workloads where, even with lower FP64 performance, the “pro cards” are faster.
As an example, I looked at the Radeon Pro WX 4100 vs the RX 570 and vs the GTX 1080:
https://manatails.net/blog/2017/03/radeon-pro-wx-4100-review/
Specifically, wire-frame performance is a sticking point.
The WX 4100 has FP64 performance of ~150 GFLOPS
The RX 570 has ~300 GFLOPS
The GTX 1080 has ~250 GFLOPS
So the FP64 argument goes out the window…
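To put rough numbers on that, here is a quick sketch using the approximate figures quoted above. They are ballpark values from this post, not measurements, and the variable names are just for illustration.

```python
# Approximate FP64 throughput in GFLOPS, as quoted in this post
# (ballpark values, not measured results).
fp64_gflops = {
    "Radeon Pro WX 4100": 150,
    "RX 570": 300,
    "GTX 1080": 250,
}

baseline = fp64_gflops["Radeon Pro WX 4100"]

# FP64 throughput of each card relative to the WX 4100.
relative = {card: gflops / baseline for card, gflops in fp64_gflops.items()}

for card, ratio in relative.items():
    print(f"{card}: {ratio:.2f}x the WX 4100's FP64 throughput")
```

On these figures both consumer cards have noticeably more FP64 throughput than the WX 4100, so FP64 alone can't explain the pro card winning anywhere.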
Some would say “verified drivers”. But from what I can tell, this essentially means no corner-cutting when implementing a particular API (OpenCL, for example). This usually means that all the corner cases need to be handled correctly instead of doing something ‘kinda correct’ that is good enough for gaming. And there are rigorous validation tests for all those cases.
So that argument would imply the opposite: less performance, but more correctness.
Then I thought about the silicon itself, but the WX 4100, for example, is a Polaris 11 chip, which is also used in RX 400 series consumer GPUs.
It’s possible that some silicon is just locked. But if that’s the case, then what specifically is locked? It’s not like some parts of the API are missing.
If the wire-frame performance is so good, then which API calls are faster? Under what circumstances?
The only detailed info I could find about actual AMD performance tuning is here: Developer Guides, Manuals & ISA Documents - AMD
but that’s mostly focused on CPUs and didn’t really help with my original question.
TL;DR
Do any of you have any idea why something like wire-frames would be so much faster on “pro” cards?
And if I wanted to develop my own programs that target “pro” cards, which API calls / instructions specifically would be faster?
Kinda relevant L1 video: Putting the Radeon Pro WX7100 to Work: Testing (Part 1) - YouTube