GPU Wars: Enter Maxwell, Nvidia's Successor To Kepler

We've seen the rise of Kepler. At first, the GK104 GPU was everything a gamer could want: 1536 CUDA cores, a 256-bit memory bus, great gaming performance, great overclocking headroom, and it ran quieter than the reference HD 7970 at launch. Truly, it was a good card and a great GPU.

Its real performance secret came from another source, though: its low energy consumption, and thus lower heat output. We saw enthusiasts and fanboys saying that AMD consumed too much energy and put out too much heat. Although the cost of the extra electricity isn't a big deal, it's worth noting that heat can throttle your GPU, damage it, and build up inside your case if you don't have adequate airflow (a common problem in SFF cases, like mATX or mini-ITX).

Thus, Kepler proved itself to be a reliable and efficient GPU. Enter Maxwell, the next architecture from Nvidia, starting with the GM107: a GPU that sits between the GK107 and GK106 in terms of size (GK107 is 118 square millimeters, GM107 is 156 square millimeters, and GK106 is 221 square millimeters).

You can check the comparison here:

http://videocardz.com/49517/nvidia-maxwell-gm107-gpu-pictured-detailed

In terms of performance, at least from this Videocardz.com article, here's what you can expect:

http://videocardz.com/49498/nvidia-geforce-gtx-750-gtx-750-ti-preview-leaks

Now, from what I gather, this is a game changer for Nvidia, and AMD should be scared (unless they've got something they've been keeping secret).

The GK106 in the GTX 650 Ti Boost requires at least one PCI-E 6-pin power connector. Given that roughly 75W comes from the PCI-E x16 slot and another 75W from the 6-pin connector, that's a 150W maximum power draw. The GM107 in the GTX 750 Ti is rumored to land between the GTX 650 Ti Boost and the GTX 660 in performance, even though it only uses a 128-bit memory bus. The GTX 750 Ti is also rumored to need no PCI-E power connector (6-pin or 8-pin) whatsoever, meaning it can run at full performance on the 75W the slot provides.
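As a rough sketch of that power-budget arithmetic (the 75W slot and connector figures come from the PCI-E spec; the helper names and configurations below are just for illustration, using the cards discussed above):

```python
# Back-of-the-envelope PCI-E power budget: slot power plus auxiliary connectors.
PCIE_SLOT_W = 75                                # power available from the x16 slot
AUX_CONNECTOR_W = {"6-pin": 75, "8-pin": 150}   # power per auxiliary connector

def max_board_power(connectors):
    """Maximum in-spec power draw for a card with the given aux connectors."""
    return PCIE_SLOT_W + sum(AUX_CONNECTOR_W[c] for c in connectors)

print(max_board_power(["6-pin"]))  # GTX 650 Ti Boost style: 150W
print(max_board_power([]))         # rumored GTX 750 Ti: slot power only, 75W
```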

And most importantly of all, in spite of consuming less power while delivering the same or similar performance, it's still built on the 28nm process. And that's the scary thing (for AMD).

In other words, they've halved the power draw and increased performance on a smaller die, which has obvious heat output ramifications... but most impressively, they've done so while staying on 28nm and while narrowing the memory bus from 192-bit to 128-bit.

Now, when we go to 20nm, this will be even more dramatic. Why? Because 20nm should have roughly twice the transistor density of 28nm. (To explain why: the nanometer figure of a fabrication process roughly refers to the size of its smallest features, so shrinking it lets you pack more transistors into the same area. If you have a 140nm x 140nm square to fill, you can fit twenty-five 28nm transistors in a 5x5 grid, or forty-nine 20nm transistors in a 7x7 grid, which is almost exactly twice as many.)
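Here's that scaling argument as a quick back-of-the-envelope sketch (idealized area scaling only; real processes don't shrink this cleanly, and the function name is just for illustration):

```python
# Idealized scaling: transistor density grows with the square of the
# feature-size ratio, so 28nm -> 20nm gives (28/20)^2 ~= 1.96x the density.
def density_gain(old_nm, new_nm):
    return (old_nm / new_nm) ** 2

print(round(density_gain(28, 20), 2))  # ~1.96, i.e. roughly "twice the density"

# The 140nm x 140nm thought experiment from above, counted explicitly:
side = 140
print((side // 28) ** 2)  # 25 transistors at 28nm (a 5x5 grid)
print((side // 20) ** 2)  # 49 transistors at 20nm (a 7x7 grid)
```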

Having smaller transistors means they can be spaced further apart for the same transistor count, which increases energy efficiency (less electron leakage, and leakage means heat). Smaller transistors also allow for lower voltages, which further helps minimize leakage and heat output. And, of course, higher density allows for more CUDA cores (or Stream Processors, in AMD's case), more TMUs (texture mapping units), more ROPs, and so on.

And lower temperatures have a big impact on how high you can clock your GPU, as well as how much airflow you need to cool it (which largely determines how quiet or loud the card will be under load, though it isn't the only factor). It matters even more in SFF builds, laptops, and mobile devices (smartphones, tablets, etc.), where heat output can make or break the design.

Given all of this, if Nvidia launches a 4GB model as their launch flagship (like the GK104-based GTX 680 was, until the GK110-based GTX TITAN came along to take the crown), with a 512-bit memory bus, we can assume it's going to be a beast of a card once 20nm is released.

There's reason to have a lot of hope for Nvidia right now, assuming all these rumors about 20nm power consumption and the specs/release/performance of the GTX 750 Ti (on 28nm) prove true.

On a side note, I find the GTX 750 Ti launch rumors to be very plausible. Why? Because the GTX 650 Ti Boost and GTX 650 Ti non-Boost are virtually out of stock almost everywhere I've looked, and many listings aren't being re-stocked. That can mean Nvidia is going out of business, they don't want to make money selling people graphics cards, or they're preparing a new GPU for launch. And unless you've been living under a rock for the past 10 years (Patrick?), you'll know the former two aren't realistic in the slightest, making the latter the most plausible explanation.

And that's why AMD should be hoping Nvidia doesn't come out with guns blazing at launch, offering a huge GPU die, a wide memory bus, lots of high-clocked VRAM, and more. Because if they do, AMD is going to be in trouble unless they've gotten around to fixing their heat output. Obviously, AMD has long been aware of its heat woes and terrible reference cooler design, so I'm hopeful AMD will talk to one of its partners about designing a reference cooler that uses multiple fans (like the HD 7990 did). That could help alleviate some of the noise and heat issues.

When 20nm comes out, it's going to be a very interesting launch. We might only see 20nm in Q3 (July 'til September), but when it comes out, we'll probably be seeing some very impressive stuff.

On a side note, TSMC (Taiwan Semiconductor Manufacturing Co.) has announced they're in volume production of 20nm chips, meaning we might just see 20nm parts coming out soon, or at least leaks coming out soon (I wonder if the leaks will appear on Chinese forums first? Hmmm). You can check that article here:

http://www.xbitlabs.com/news/other/display/20140116220015_TSMC_Begins_Volume_Production_of_Chips_Using_20nm_Process_Technology.html

You can also check out WCCF Tech's slightly older post (about 3 months old) here, describing AMD trying to come out with their new 14nm GPUs in H1 2014 (meaning before June is over... oh please, by the Bethesda Game Studios gods of gaming, let this not be like the launch of the Mantle-enabled drivers):

http://wccftech.com/tsmc-begins-volume-production-20nm-chips-q1-2014-16nm-finfet-chips-q1-2015/

Anyways guys, I hope this blog was a good read. And I don't want to sound like an Nvidia fanboy (I do love AMD, and I always like to cheer for them, but from the leaks, I have to say there's more reason to be optimistic about Nvidia). So I'd love feedback on this. (Did I leave out anything important? Should I include any other links? Should I have added more historical context for die-shrink launches, such as 55nm to 40nm, and 40nm to 28nm? And should I have mentioned the performance difference at launch versus a few years later, once the drivers had been optimized for those GPUs and their architectures, and the bugs had been ironed out?)

Interesting... If the 750 Ti rumors are true, it sure might pave the way for the higher-end cards. But right now, Nvidia also seems a bit like Intel: dragging out new products and overpricing them (slightly, and not because of cryptocurrency; that's the actual retail price). It might be quite a while before we can find a true mainstream 20nm card that draws low power, puts out little heat, etc. They've finally managed to master the 28nm process, as shown here, but will that carry over if they suddenly switch to 20nm?

I also find that Nvidia isn't really going after the high-resolution market, as narrow memory buses (such as 192-bit or 128-bit) aren't adequate for higher resolutions. This often shows up in 4K tests between the R9 290X (more VRAM, a 512-bit bus) and the 780 Ti (slightly less VRAM, only 384-bit), and mainly for bandwidth reasons. Bandwidth doesn't matter that much at lower resolutions, which is why the 780 (Ti) often destroyed the R9 290(X) at 1080p (and sometimes 1440p).
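As a very rough illustration of why resolution pushes bandwidth (this only counts raw framebuffer writes; real bandwidth demand is far higher once you add textures, overdraw, AA, etc., and the function below is just a hypothetical sketch):

```python
# Raw framebuffer traffic scales linearly with pixel count:
# 4K has 4x the pixels of 1080p, so it needs roughly 4x the memory traffic
# just to write each frame (the real bandwidth need is much larger).
def framebuffer_gb_per_s(width, height, fps, bytes_per_pixel=4):
    return width * height * bytes_per_pixel * fps / 1e9

print(framebuffer_gb_per_s(1920, 1080, 60))  # ~0.5 GB/s at 1080p60
print(framebuffer_gb_per_s(3840, 2160, 60))  # ~2.0 GB/s at 4K60
```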

Lastly, there is one important factor that could change all of this: Mantle. Mantle seems to be helping game performance, and in a good way. This could be AMD's hidden blade, and it could easily upset Nvidia's plans. If they can also reduce the heat, power, noise, etc., then they could potentially match Nvidia at a much lower price. Mantle would most likely be the thing that holds AMD together in the market, and even early 20nm cards might not be able to match the GCN GPUs. Now all AMD has to do is play its cards right. Don't we all remember the early hype when the PS4 and Mantle announcements first appeared? Sigh... that could have been a true killer, but the actual performance isn't as impressive as what was first advertised. It's come a long way from almost a year ago, but it still needs polishing. Definitely. And if AMD actually listens to what we want (e.g., better CPU performance on our APUs? An 8-core APU? etc.), then they can easily control the market.

Just my 2 cents.

You did not mention the ARM processor.

? GPUs don't use ARM, bro.

Nvm, I'm so uneducated. Thanks xD

Agreed, in part.

Some things I'd like to clarify: Mantle is not going to optimize the graphics side of things; it's mainly there to distribute the CPU load onto more cores. If game developers code their applications to use Mantle, they don't need to worry so much about multi-threading.

Nvidia did take a beating due to lower bandwidth at higher resolutions. But they also did a lot better at avoiding low minimum framerates, meaning there's a lot less stutter on the Nvidia side. The CrossFire and SLI woes have fortunately been dealt with, and given how important that turned out to be, it's likely both teams are working to keep the issue from cropping up again.

Nvidia does have G-Sync, but it's not going to be a compelling reason to buy a GPU unless you have a monitor that supports it - and at launch, those monitors are going to be rare and overpriced for what you're getting. That means G-Sync doesn't so much add value to the GPU as take it away, since you could hit 60fps with the same (or a similar) GPU in a setup that costs 100$ less.

Although more bandwidth and more VRAM help when running at higher resolutions, we still have to wait. 4K isn't going to be anything more than marketing until 4K monitors hit the 300$~400$ range, and then it'll take off. However, what the Oculus Rift will offer is going to be very compelling as well, possibly more so than 4K monitors, and it'll arrive at a time when 4K monitors might cost as much as an Oculus Rift. So that's a curveball as to whether higher bandwidth and more VRAM will be as important by late 2014 and early 2015.

We do need to wait and see if TrueAudio is going to be so amazing that it's worth skipping an Intel CPU just for that feature. But then again, if Kaveri already supports it, you don't need an AMD GPU to get it: you could pair an AMD APU with an Nvidia GPU, or an Intel CPU with an AMD GPU. Either way you'd still have TrueAudio, unless you leave AMD out of both the CPU and the GPU.

And yes, Nvidia does seem to have (finally!) mastered the 28nm process with this, if the rumors prove true. They might take longer to master 20nm, given rising costs and diminishing returns, but we might see Nvidia launching a revamped Maxwell in mid-2015 or even later, with slight tweaks to improve clockspeeds and lower temperatures, plus driver optimizations to improve SLI and single-card gaming performance, and more.

We live in interesting times, my friend. And although that may be an old Chinese curse, it's a modern-day blessing for gamers. =)

http://www.xbitlabs.com/news/cpu/display/20110119204601_Nvidia_Maxwell_Graphics_Processors_to_Have_Integrated_ARM_General_Purpose_Cores.html

I believe this is the article you're referring to. The ARM processor included is supposed to be used not for video performance, but for managing memory and for general processing/compute applications. This was touched upon by Jen-Hsun Huang, Nvidia's chief executive, at GTC 2010 in this quote (from the article above): "Between now and Maxwell, we will introduce virtual memory, pre-emption, enhance the ability of GPU to autonomously process, so that it's non-blocking of the CPU, not waiting for the CPU, relies less on the transfer overheads that we see today."

This article also specified that ARM cores aren't going to be used in lower-end GPUs (like the GTX 750 Ti, for example). The exact quote from said article is: "General-purpose processing cores will bring mosts benefits for compute applications and therefore Nvidia may omit ARM from low-cost designs." - X-bit labs (written by Anton Shilov)

Although this does tie in nicely with Maxwell, it doesn't offer any benefits we can measure until we get more information about GPUs with ARM cores inside, and we don't have that information yet. Although they did say they'd introduce ARM cores between now and Maxwell, it's also been suggested they won't be in lower-end GPUs. Also, it's still unclear (at least to me) whether this is something we'll see on the desktop side of things, or whether it's meant for their Tegra mobile line of products (like the Nvidia Shield, for example, and the several high-end and mid-range smartphones that have launched with Tegra ARM processors in the past). EDIT: It could also be that Nvidia is planning to release this in Quadro professional cards, or maybe Tesla compute cards, so it might not arrive on the GeForce desktop side of things for some time.

I might add that in, but it's more of a side note than a major topic. If I do see evidence of ARM coming to the desktop, along with performance numbers and leaks, it'll be worth adding for sure. But right now, I haven't seen any leaks about ARM being included in Maxwell. If those leaks do pop up, I'll add it for sure.

One thing I find very important, and that many people don't realize until the end: what do we actually get a GPU for? To play games. How does this affect opinion? Well, for starters, we seem all too focused on specs and other nerdy stuff. But what I think is very important is the experience, what's being offered to consumers right now, and how well games are optimized for particular drivers. Nvidia has optimized this very deeply and heavily, for example with GeForce Experience. For most of us nerds who like to tweak our games, change our settings, etc., it isn't such a big deal. But some people are switching over from consoles, and they don't like messing with settings, so this matters a lot. And GeForce Experience doesn't end there: OEMs ship it too, so people who don't build their own PCs also get an easier experience. (I didn't bring up AMD's Raptr here because, from personal testing, it doesn't seem mature enough to compete with GeForce Experience; for example, it has low support for many games.)

What I'm trying to say is that rumors, next-gen 20nm hardware, and new advancements don't seem to matter much to certain consumers, and that group isn't small. There are many people (not me) who don't care about settings and are fine with medium-to-high at 1080p. If APUs, or Intel HD Graphics, could give a little more oomph, the average consumer would be plenty happy with integrated graphics and wouldn't even need a dedicated graphics card.

But moving away from that topic: for us actual nerds on Tek Syndicate, I see that, after integrated graphics, the next biggest group of gamers uses dedicated hardware in the 200-dollar zone, which for most people is the golden spot for graphics cards. A while ago there was strong competition in that range. But now, underpowered cards like the R9 270X and the upcoming 750 Ti are coming in with weaker value than their predecessors, notably the 7950 and the 660/Ti. And when I say weaker, I mean similar performance at the exact same price, even after almost half a year. Think about it: the 7950 was 200 bucks, and was easily the best card ever. But with the rebadges, the R9 270X is also 200 bucks with 1 GB less VRAM and a 256-bit memory bus instead of 384-bit, and it's also a tad weaker, which will start to show soon. What does this mean for us? That we're getting ripped off. The only thing keeping people from ripping their hair out is that there's no cryptocurrency markup on these cards, since most people don't mine with an R9 270X or below: the performance is so low that they'd probably never break even, if not lose money outright.

Just 2 more cents :)

1. Google: maxwell arm

2. Read articles.

3. Get educated. 

I think it might be used for CUDA applications or PhysX. Actually, this ARM processor confuses me. I have no idea what to think about it.

Edit:
You said "article specified that ARM cores aren't going to be used in lower-end GPUs" but author did not say that. He only said it might not be and it is only his opinion.

Well, the thing is that the ARM processor is mainly there for the things CUDA cores aren't good at. CUDA was designed for gaming and 3D, which is nice and dandy, but it doesn't have very good OpenCL performance - at least compared to AMD, that is.

It's mainly going to be used when you need single-threaded performance but don't need CISC (complex instruction set computing), and RISC (reduced instruction set computing) is all you need. It can still be parallelized; it's just better at using less energy and delivering more performance per watt, which is what supercomputers are looking for. That's one application.

CUDA does a lot of things with PhysX, and it works with CUDA-enabled applications and languages. But that's not why an ARM core on a GPU is interesting. It's the fact that the GPU can use an ARM processor to gain some independence from CPU lag and hiccups, meaning it can do its own thing more easily. Basically, it allows the GPU to keep spitting out frames even if the CPU is already overloaded, since the ARM core can manage the memory without worrying about whether the CPU is busy.

That means your CPU can be running at high usage without decreasing graphics performance as much as it otherwise would. So it's kind of like Mantle in that sense, except it doesn't spread the load over all the cores; it just means that if the CPU is lagging or being used too heavily, the GPU can keep working regardless. So it's mostly going to be for memory management. (Was I clear enough, or are there other questions about this?)

I'm tired of the term "game changer" lol

G-Sync was supposed to be a game changer, but it's incredibly cost prohibitive. And it turns out the VBLANK approach is going to be an open standard anyway, so Nvidia is charging people for something they'll get on new monitors for free (maybe even on current monitors with a firmware update).

Well, agreed in part. The HD 7950 was 200$ only because there was a fire sale (everything must go!) on those cards as the new R9 280X was coming out. Before that sale began, it was selling for 250$ to 350$, depending on where you looked. But still, the R9 280X is better than the HD 7970 GHz, for around the same price.

We are getting better performance per dollar at the high end. But the 120$ to 250$ range has been dominated by low-value cards. The R9 270X is a good example of something that's overpriced, same with the GTX 660 and (eventually) the GTX 750 Ti.

And yes, the average consumer won't tell the difference, and integrated graphics might end up performing better and looking better than console games, even in AAA titles. (Imagine when a NUC can play games as well as any console, for a fraction of the price! I think I'll be rolling on the floor for days, laughing until I'm blue in the face.)

GeForce Experience is great. I love it. It works wonderfully, and it's never failed me thus far.

I think we'll have to wait until 20nm comes out. But hopefully the graphics companies will show us real value, because as of right now, there's no real reason to upgrade your GPU. You're paying more for the same thing, and thus getting less for your dollar. So it's not an upgrade, it's a price hike disguised as a new product. I vote with my dollar, and I'm not going to hand over my hard-earned cash to indulge the greed of corporations and the big suits at the top. So I'm holding out until I see a deal or a game-changing product. And right now, there isn't one. My money is being saved until something better is out.

Yep. G-Sync is going to be a big deal, but AMD came out with something free... sort of. It works well on notebook screens because of eDP (embedded DisplayPort): notebooks don't use scaler chips, which is why FreeSync works there. Desktop monitors do need scaler chips, though, so Nvidia came out with a scaler module that has variable VBLANK support enabled, but only when used with their GPUs.

What I'm curious about is whether a GPU could have something similar to a scaler chip built in, and then just send the already-scaled image to the monitor. I'm sure the information the GPU needs to do proper scaling could be stored in the monitor as read-only memory, the same way my monitor and TV already store their model name and manufacturer even though I'm connected via VGA (D-Sub) to my monitor and via HDMI to my TV. So I'd like to know: could GPUs serve as the scaler for the monitors they support, so the barrier to using variable VBLANK goes away and the GPU takes on more of the load?

"Mantle is not going to optimize the graphics side of things"

I smell bullshit.

StarSwarm 1080p @ Medium settings with an i7-4960X and Radeon R7 260X:

RTS test DirectX: 13.95 FPS (Unplayable!)

RTS test Mantle: 31.69 FPS (126% faster)


Thanks for the article, rsilverblood.

Wait one second... isn't Star Swarm a demo that particularly focuses on showcasing the performance of Mantle? And if that's the case, isn't it rather logical to assume that a demo for Mantle performance might not be optimized for DirectX out of the box, and that said performance might not be what we'd expect to see in real-world gaming (like Battlefield 4, Star Citizen, and others)?...

Thanks. =)

With AMD's current heat issue...

It's the 290 and 290X cards, or the Hawaii chip as we know it.

The problem I see with the card is not the heat output per se. We have coolers that can dissipate this heat and keep things operational, but they're bottlenecked by thermodynamics. The Hawaii chip has a lot more transistors per mm^2, and that means more heat in one area.

Hawaii (R9 290, R9 290X):         6.2B transistors  / 438 mm^2 = 14.155M transistors per mm^2
Tahiti (HD 7970, R9 280X):        4.3B transistors  / 352 mm^2 = 12.216M transistors per mm^2
GK104 (GTX 680, GTX 770):         3.54B transistors / 294 mm^2 = 12.04M transistors per mm^2
GK110 (GTX 780, 780 Ti, TITAN):   7.08B transistors / 561 mm^2 = 12.62M transistors per mm^2

From what I can reckon, most high-end cards have a little over 12M transistors per mm^2, and assuming the heat output of each one is about the same (clocks and voltages matter, but they're all very close), I'd say Hawaii was bound to be a hot card.
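For reference, here's the same division written out, using the die sizes and transistor counts listed above:

```python
# Transistor density = transistor count / die area, for the chips listed above.
chips = {
    "Hawaii (R9 290/290X)":          (6.20e9, 438),
    "Tahiti (HD 7970/R9 280X)":      (4.30e9, 352),
    "GK104 (GTX 680/770)":           (3.54e9, 294),
    "GK110 (GTX 780/780 Ti/TITAN)":  (7.08e9, 561),
}

for name, (transistors, area_mm2) in chips.items():
    density_m = transistors / area_mm2 / 1e6   # millions of transistors per mm^2
    print(f"{name}: {density_m:.2f}M transistors per mm^2")
```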

Heat can only transfer through a given area/surface so fast, and modern copper coolers with thermal grease don't cut it; there's just too much heat in one area.

Good read rsilverblood  | well written :D +1

EDIT: I messed up some numbers, no biggy just a notation error.

My opinion of V-Sync basically amounts to WHY THE FUCK.....

Why don't we have variable framerate technology as a standard? All I've heard about it is good things.

  • lower power consumption from the panel (potentially fewer frames/less work per second)
  • better viewing experience for obvious reasons
  • something else I'm forgetting atm

Why Nvidia is making a proprietary system is beyond me (wait, no it's not; they're a greedy company). I don't care what form it comes in, I just want VFT, and not at a dramatic cost, because it should just be 'a thing'.

Now let's look at the GPU still spitting out frames while the main CPU is busy, from a developer's standpoint. It would be a mess to implement such a thing, and I don't think it (the ARM CPU) will ever be used that way. There must be a good reason why Nvidia is planning to put a weak CPU on the GPU. Maybe some visual effects can be enhanced by just a little touch from the ARM CPU.