GPU FLOPS: What's wrong here?

FLOPS (floating-point operations per second) is a way to measure the raw arithmetic performance of a computer.

So we're all aware of AMD vs. Nvidia, framerates this and benchmarks that. But what about FLOPS?

 

The GTX 680 has about 3090 GFLOPS of power

source: http://bit.ly/VORpi1
 

The HD 7970 has about 3788 GFLOPS (4300 for the GHz Edition)

source: http://bit.ly/VpKVGW

 

These cards have VERY similar performance.

 

But how can the AMD card have 122% (140% for the GHz Edition) of the Nvidia card's FLOPS, yet perform roughly the same?
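
(Those percentages are just the ratios of the peak figures quoted above; a quick Python sketch of the arithmetic:)

# Where the percentages come from: the ratio of the peak figures above
gtx_680 = 3090       # GFLOPS
hd_7970 = 3788       # GFLOPS
hd_7970_ghz = 4300   # GFLOPS, GHz Edition

print(round(hd_7970 / gtx_680 * 100))      # ~123, i.e. roughly "122%"
print(round(hd_7970_ghz / gtx_680 * 100))  # ~139, i.e. roughly "140%"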

Can someone explain what's going on here?

EDIT: [fixed links]

 

It never comes down to just FLOPS, MHz, or memory. It mostly comes down to how efficient the graphics processor is, and how well the VRAM can communicate with the processing unit. FLOPS and MHz do play a role, just not the biggest one.

Nvidia are, in a manner of speaking, getting by on a superior architecture, so they don't have to cram as much hardware inside to get similar performance in general (though obviously not always as good, or better, in certain areas). This also explains Nvidia's superior power consumption compared to AMD's equivalent competing cards.

Is this the reason most bitcoin mining machines use AMD cards? (more raw FLOPS)

It doesn't come down to frequency on its own, because frequency is already part of the formula for theoretical FLOPS:

FLOPS = cores × clock speed × FLOPs per core per cycle

(source: https://en.wikipedia.org/wiki/FLOPS)

So if you increase the clock (overclock), the FLOPS count goes up in direct proportion.
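
As a rough sketch of that formula, plugging in the commonly quoted shader counts and clocks for these two cards (those specs are my own numbers, not something from this thread; both architectures can issue up to 2 FLOPs per shader per cycle via fused multiply-add):

# Theoretical peak: shader count * clock (GHz) * FLOPs per shader per cycle
def peak_gflops(shaders, clock_ghz, flops_per_cycle=2):
    return shaders * clock_ghz * flops_per_cycle

print(peak_gflops(1536, 1.006))  # GTX 680:             ~3090 GFLOPS
print(peak_gflops(2048, 0.925))  # HD 7970:             ~3789 GFLOPS
print(peak_gflops(2048, 1.050))  # HD 7970 GHz Edition: ~4301 GFLOPS

Bumping clock_ghz up is exactly the overclock case: the FLOPS figure scales linearly with it.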

 

And as for "how efficient the graphics processor is": that's exactly what I'm saying. AMD cards seem to have more FLOPS than similarly performing Nvidia cards; the AMD card can, on paper, do 22(+)% more operations per second. How can one card crunch numbers better on paper but not show it?

That's what I'm asking.

AMD has more cores, and thus more FLOPS, so when it comes to raw power AMD wins. Nvidia is more optimised for the maths that is commonly used in rendering games, so it can do more with less and get the same FPS. Because of this, as you say, bitcoin miners use AMD: the maths for generating a bitcoin is different from rendering 3D graphics, so all those optimisations count for nothing and you have to fall back on raw power, which is where AMD has the advantage. As far as I know this would also extend to physics simulation and video rendering, but that would depend on how well Nvidia is optimised for those tasks, and I don't know of any benchmarks.

As for how Nvidia optimise, I don't know, and I believe it would probably be too long and complicated to explain, but I guess it would be something like this.

Let's say you want to do 2 to the power of 3. You could do it like

2 * 2 * 2

That's 2 multiplications and thus 2 floating-point operations. But if we implement a power instruction in our architecture, that could be

2 ^ 3

That's 1 power operation and thus 1 floating-point operation. We have halved the number of operations we need, and thus the amount of work.
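
A throwaway Python sketch of that operation counting (the function names are made up, just to illustrate the idea):

# Power done as repeated multiplication: exponent - 1 multiply operations
def pow_by_multiply(base, exponent):
    result, ops = base, 0
    for _ in range(exponent - 1):
        result *= base   # each multiply counts as one floating-point operation
        ops += 1
    return result, ops

# A dedicated power instruction: the whole thing counts as one operation
def pow_by_instruction(base, exponent):
    return base ** exponent, 1

print(pow_by_multiply(2.0, 3))     # (8.0, 2) -> two operations
print(pow_by_instruction(2.0, 3))  # (8.0, 1) -> one operation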

In real life the optimisations would probably be more specific than that. For example, the equation for Phong shading is used in pretty much every game, and it's used a lot; here is one of the expressions it involves:

max(0, N.H)^S

That's 3 instructions. Now, if we could collapse that into just one instruction with 3 parameters, like

phong(N, H, S)

then you reduce the amount of code the graphics card has to run.
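
Roughly like this in Python, where specular_fused stands in for that hypothetical single phong-style instruction (none of this is a real GPU instruction; it's just counting steps):

# Unfused: three separate steps -- dot product, clamp, power
def specular_unfused(N, H, S):
    d = sum(n * h for n, h in zip(N, H))   # N . H
    d = max(0.0, d)                        # max(0, ...)
    return d ** S                          # ^ S

# Hypothetical fused instruction: one call covers all three steps
def specular_fused(N, H, S):
    return max(0.0, sum(n * h for n, h in zip(N, H))) ** S

N, H, S = (0.0, 0.0, 1.0), (0.0, 0.6, 0.8), 32
print(specular_unfused(N, H, S))  # same result either way;
print(specular_fused(N, H, S))    # the difference is how many instructions it took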

 

Again, I have no inside knowledge of how Nvidia get more performance from less power; this is just a guess.

 

I see what you mean, but isn't that a difference in software? (different, more efficient ways to get the same number)

I'm still confused about the hardware side of it. :I

No, the instructions would be implemented in hardware with transistors. You should probably think of the hardware as being similar to software. A program performs a function, like chkdsk checking your HDD, but the program is made up of smaller instructions like add, multiply and so on. Hardware is kind of similar: you have a processor that performs an instruction from the software, but on the hardware side that instruction (add or multiply) is made up of smaller operations performed by transistors, and those smaller operations are binary, on or off.

So let's say that making one core takes 50k transistors, but it only has 4 instructions: -, +, * and /. And you have another 50k transistors sitting on your work table. -, +, * and / are enough to do pretty much anything you want, like in my last post, breaking

2 ^ 3

into

2 * 2 * 2

If you were AMD, you would use your extra 50k transistors to make another core.

If you were Nvidia, you would use, say, 25k of them to implement a power instruction and the other 25k to implement a square-root instruction.

In this case both graphics cards use 100k transistors, but the AMD card has 2 cores where the Nvidia card only has 1. So it's not so much that Nvidia can do more with less; it's that Nvidia can do more with 1 core than AMD can do with 1 core. Let's pretend the two cards can each do 1 instruction per second and we feed them both 2 ^ 3.

The Nvidia card would use its power instruction and finish in 1 second.

The AMD card would break it into

2 * 2 * 2

making it 2 multiply instructions, and then send one instruction to each of its cores simultaneously. The cores run at 1 instruction per second and thus finish at the same time, in 1 second.

I know you can't really split a power into independent multiplications like that in real life, but to keep it simple let's pretend we can. In real life the equation is still broken down and run simultaneously; it's just more complicated, so it genuinely can be broken up into smaller steps that run at the same time.
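
A tiny model of those two pretend cards, under the same simplifications (1 instruction per core per second, and pretending the two multiplies really can run in parallel):

import math

# Both pretend cards retire 1 instruction per core per second
def seconds_to_finish(instructions, cores):
    return math.ceil(instructions / cores)

# Pretend Nvidia: 1 core, but 2 ^ 3 is a single power instruction
print(seconds_to_finish(instructions=1, cores=1))  # 1 second

# Pretend AMD: 2 cores, 2 ^ 3 broken into 2 multiply instructions
print(seconds_to_finish(instructions=2, cores=2))  # 1 second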

 

Now you can optimise in software too. For example, if we were running on these pretend cards, when the software runs on AMD we would want to feed it 2 * 2 * 2 to make use of both cores, whereas for Nvidia we would want to give it 2 ^ 3 to make use of the extra instruction.
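
A sketch of that software-side choice (the card names and the POW mnemonic are invented for the example):

# Emit whichever form suits the pretend card we are targeting
def compile_power(base, exponent, card):
    if card == "pretend-nvidia":
        return f"POW {base}, {exponent}"         # use the fused power instruction
    return " * ".join([str(base)] * exponent)    # spell it out for the multi-core card

print(compile_power(2, 3, "pretend-nvidia"))  # POW 2, 3
print(compile_power(2, 3, "pretend-amd"))     # 2 * 2 * 2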

Also, again it's worth pointing out that I'm using basic arithmetic for these examples. In reality both cards have power instructions and so on; it's probably much higher-level maths that AMD doesn't bother implementing, like the Phong example.

I'm still confused about the hardware side of it. :I

Yeah, I'm not explaining it very well because I'm confused about it too. This isn't based on information I have found; it's just what I think sounds reasonable based on my understanding of how hardware works and the specs of the cards. For example, if my theory were correct it would mean that both cards use about the same number of transistors, just with Nvidia spending them on extra instructions and AMD spending them on extra cores. However, the Nvidia card has significantly fewer transistors, and I'm unable to explain that.

Yea, it's heavy stuff.

PhysX...

I knew someone would bring that and After Effects up.

PhysX isn't what I'm talking about when I talk about physics simulations. I meant figuring out exactly where the moon will be in 50 years, rather than making a red ball look like it's bouncing. With that I was talking more about OpenCL on an AMD card vs. CUDA on an Nvidia card; it used to be that the AMD card would just destroy the Nvidia one. That's based on rather old tests, but I haven't seen anything to the contrary since then. But yes, since you can't run CUDA and PhysX on an AMD card, obviously those will run better on an Nvidia card; but if you run an actual physics simulation or video rendering on OpenCL, it will probably destroy CUDA.

I thought you were referring to gaming. AMD does destroy Nvidia in GPU acceleration; Sony has benchmarks from Vegas, and AMD is around 20% faster than Nvidia.

Then how are people able to run PhysX on AMD cards?

https://www.youtube.com/watch?v=O_VE5yqNbLg

Sorry for all the questions, but I kind of want answers.

EDIT: [grammar]

Then why don't we hear anything about this kind of thing?

AMD cards can run PhysX; it's just that they're terrible at it, in game as well as out. They take too much of a performance hit with PhysX, as it's optimized for CUDA.

It's not actually running on the AMD graphics card; it's using the CPU to do the work that is normally done on the Nvidia graphics card.

Because people only care about games. The average user is never going to use GPGPU.

The reason people mine buttcoins with AMD is the huge GPGPU capability in the 7xxx series. Nvidia did something similar with the 4xx and 5xx series but decided to scrap it with the 6xx series because most users wouldn't use it. That way they got a power-efficient chip that performs almost as well as the 7970, just as the 5870 performed compared to the 480.

OK, so FLOPS is how many arithmetic calculations the card can do on floating-point variables per second. So it mostly comes down to how many cores the card has (or more exactly, the number of floating-point arithmetic logic units it has). FLOPS is also affected by the card's memory layout, but that's minimal compared to the number of FP ALUs it has.

I don't want to go into it too heavily, but OpenCL works much better on AMD cards as well. So in the end, for crunching data, AMD cards win (other than the Teslas, but those aren't really graphics cards).

If you're going to compare Nvidia and AMD, be fair.

It should be their best workstation cards up against each other.

S10000 (AMD) vs. K10 (Nvidia) (the K10 has more FLOPS than the K20X because it's a dual-GPU card)

In this case, comparing top-of-the-line workstation cards, AMD still has Nvidia beat:
about 5.91 TFLOPS vs. 5.3 TFLOPS.