No, the instructions would be implemented in hardware with transistors. It helps to think of the hardware as similar to software. A program performs a function (chkdsk checks your HDD, for example), but the program itself is made up of smaller instructions like add, multiply and so on. Hardware is similar: the processor performs an instruction from the software, but on the hardware side that add or multiply instruction is itself made up of smaller operations performed by transistors, and those smaller operations are binary, on or off.
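To make the "add is built from on/off switches" idea a bit more concrete, here's a rough Python sketch of a ripple-carry adder built only from AND, OR and XOR gates acting on single bits. It's a toy model, not how any particular GPU is wired: the function names (and_gate, or_gate, xor_gate, full_adder, add) are made up for illustration, and in a real chip each gate is a handful of transistors rather than a Python function.

```python
# Toy model: each "gate" works on single bits (0 = off, 1 = on).
# In a real chip these gates are built from transistors; here they are plain functions.

def and_gate(a: int, b: int) -> int:
    return a & b

def or_gate(a: int, b: int) -> int:
    return a | b

def xor_gate(a: int, b: int) -> int:
    return a ^ b

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits using only gates; return (sum_bit, carry_out)."""
    partial = xor_gate(a, b)
    sum_bit = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return sum_bit, carry_out

def add(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry adder: one full adder per bit, the carry passed down the chain."""
    result, carry = 0, 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    # Any carry past `width` bits is dropped, like fixed-width hardware.
    return result

print(add(5, 3))  # 8
```

So one "add" instruction turns into dozens of tiny on/off gate operations, which is the point above.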
So let's say it takes 50k transistors to make one core, and that core only has 4 instructions: -, +, * and /, and you have another 50k transistors sitting on your work table. -, +, * and / are enough to do pretty much anything you want, like in my last post, breaking
2 ^ 3
into
2 * 2 * 2
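Here's the "break 2 ^ 3 into 2 * 2 * 2" idea as a small Python sketch. The function name power_by_multiplying is just something made up for illustration, and it assumes a whole-number exponent of 1 or more; it also counts how many multiply instructions the breakdown needs.

```python
def power_by_multiplying(base: int, exponent: int) -> tuple[int, int]:
    """Compute base ** exponent using only multiplies; also count the multiplies.

    Assumes exponent is a whole number >= 1, which covers the 2 ^ 3 example.
    """
    if exponent < 1:
        raise ValueError("this sketch only handles exponents >= 1")
    result = base
    multiplies = 0
    for _ in range(exponent - 1):
        result = result * base  # one multiply instruction
        multiplies += 1
    return result, multiplies

print(power_by_multiplying(2, 3))  # (8, 2)
```

So a card without a power instruction needs 2 multiplies for 2 ^ 3 where a card with one needs a single instruction.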
If you were AMD, you would use your extra 50k transistors to make another core.
If you were Nvidia, you would use, say, 25k to implement a power instruction and the other 25k to implement a square root instruction.
In this case both graphics cards use 100k transistors, but the AMD one has 2 cores while the Nvidia one only has 1. So it's not so much that Nvidia can do more with less; it's that Nvidia can do more with 1 core than AMD can do with 1 core. Let's pretend both cards can do 1 instruction per second per core, and we feed them both 2 ^ 3.
The Nvidia card would use its power instruction and finish in 1 second.
The AMD card would break it into
2 * 2 * 2
That makes it 2 multiply instructions, and it sends one to each of its cores simultaneously. The cores run at 1 instruction per second, so they finish at the same time, in 1 second.
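The pretend comparison above can be written as a tiny Python model. Everything in it is hypothetical: PretendCard, the instruction names and the 1-instruction-per-second-per-core speed are just the made-up numbers from this post, and like the post it assumes every instruction is independent, so the work splits evenly across the cores.

```python
import math
from dataclasses import dataclass

@dataclass
class PretendCard:
    """Hypothetical card from the example: each core runs 1 instruction per second."""
    name: str
    cores: int
    instructions: frozenset[str]

    def seconds_to_run(self, program: list[str]) -> int:
        # Refuse programs that use an instruction this card wasn't built with.
        missing = set(program) - self.instructions
        if missing:
            raise ValueError(f"{self.name} has no {sorted(missing)} instruction")
        # Pretend (as the post does) that all instructions are independent,
        # so they are spread evenly over the cores.
        return math.ceil(len(program) / self.cores)

basic = frozenset({"add", "sub", "mul", "div"})
amd = PretendCard("AMD", cores=2, instructions=basic)
nvidia = PretendCard("Nvidia", cores=1, instructions=basic | {"pow", "sqrt"})

print(nvidia.seconds_to_run(["pow"]))      # 1
print(amd.seconds_to_run(["mul", "mul"]))  # 1
```

Both come out at 1 second, matching the example; the independence assumption is exactly what the next paragraph admits isn't really true for a power.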
I know you can't really split a power up like that in real life (the second multiply needs the result of the first, so the two can't run at the same time), but to keep it simple let's pretend we can. In real life the equation is broken down and run simultaneously, but the equation is more complicated, so it actually can be broken up into smaller steps that run at the same time.
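Here's a rough Python sketch of what "broken up into smaller steps that run at the same time" looks like once you respect that a multiply needs its inputs ready first. Each round it multiplies independent pairs of values (a simple tree reduction), at most one pair per core, and counts the rounds. The function name parallel_product is hypothetical, and it assumes every multiply takes exactly one round.

```python
def parallel_product(factors: list[int], cores: int) -> tuple[int, int]:
    """Multiply all factors, pairing up independent values each round.

    Returns (product, rounds). Each core handles at most one pair per round,
    and a result can only be used in a later round (the dependency).
    """
    if not factors:
        raise ValueError("need at least one factor")
    values = list(factors)
    rounds = 0
    while len(values) > 1:
        pairs = min(cores, len(values) // 2)
        # These multiplies don't depend on each other, so they run together.
        products = [values[2 * i] * values[2 * i + 1] for i in range(pairs)]
        # Results wait until the next round, alongside untouched values.
        values = products + values[2 * pairs:]
        rounds += 1
    return values[0], rounds

print(parallel_product([2, 2, 2], cores=2))     # (8, 2)
print(parallel_product([2, 2, 2, 2], cores=2))  # (16, 2)
print(parallel_product([2, 2, 2, 2], cores=1))  # (16, 3)
```

With this more honest model 2 * 2 * 2 on 2 cores still takes 2 rounds, but 2 * 2 * 2 * 2 takes 2 rounds on 2 cores versus 3 on 1 core, because (2 * 2) and (2 * 2) really are independent.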
You can optimise in software too. For example, if we were running on these pretend cards, on AMD we would want to feed it 2 * 2 * 2 to make use of both cores, whereas on Nvidia we would want to give it 2 ^ 3 to make use of the extra instruction.
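A minimal sketch of that per-card choice in Python, assuming a hypothetical has_pow_instruction flag that says whether the card has a power instruction. Real drivers and compilers make this kind of decision in far more elaborate ways, and nothing here is a real API.

```python
def expression_for(base: int, exponent: int, has_pow_instruction: bool) -> str:
    """Pick how to phrase base ^ exponent for a pretend card.

    Assumes a whole-number exponent >= 1.
    """
    if exponent < 1:
        raise ValueError("this sketch only handles exponents >= 1")
    if has_pow_instruction:
        # One instruction on the Nvidia-style pretend card.
        return f"{base} ^ {exponent}"
    # Separate multiplies the AMD-style pretend card can try to spread over cores.
    return " * ".join([str(base)] * exponent)

print(expression_for(2, 3, has_pow_instruction=True))   # 2 ^ 3
print(expression_for(2, 3, has_pow_instruction=False))  # 2 * 2 * 2
```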
Again, it's worth pointing out that I'm using basic arithmetic for these examples. In reality both cards have power instructions and so on; it's probably much higher-level arithmetic that AMD doesn't bother implementing, like the Phong example.
I'm still confused about the hardware of it. :I
Yeah, I'm not explaining it very well because I'm confused about it too. This isn't based on information I've found; it's just what sounds reasonable to me based on my understanding of how hardware works and the specs of the cards. For example, if my theory is correct, both cards would use about the same number of transistors, with Nvidia using them for extra instructions and AMD using them for extra cores. However, the Nvidia card has significantly fewer transistors, and I'm unable to explain that.