nVidia Redemption

So nVidia has a bit of a problem with its hatred of open source, but it has found a new partner in the linux world, and what a partner: Red Hat, the biggest linux kernel contributor, maker of the most widely used full-featured computing operating system in the world (RHEL for most enterprises, RHEL/CentOS for most servers, RHEL/Fedora/Scientific for most scientific research computing systems, Pidora for the RaspPi, etc...), and also the largest open-source-based IT company in the world, has struck a deal with nVidia to help it develop a technology similar to AMD's Mantle API.

The good news is that it's going to be based on IOMMU, just like the AMD Mantle solution. The bad news is that AMD's Mantle solution provides its own dedicated IOMMU pipeline and tables on its 7/8/9 series GPUs, which makes them compatible with any motherboard, chipset and CPU that has a normal PCIe controller (so basically only Asus Z77/Z87 and RoG motherboards are incompatible with Mantle, everything else works). nVidia can't implement such a solution because their hardware doesn't have this feature, so they will use the system's IOMMU. That means the enhanced low-level API Red Hat is developing for nVidia will only run on AMD systems from AM3/Phenom II onwards, and on Intel systems with a compatible motherboard (normal PCIe controller, no Asus PCIe controller), compatible chipset (B75/87, H77/87, X79), compatible BIOS (no UEFI, only legacy BIOS systems), and a compatible Intel CPU (only non-K Intel Core models).

Source:

http://www.x.org/wiki/Events/XDC2013/XDC2013JeromeGlisseUsingProcessAddressSpaceGPU/xdc2013-glisse.pdf

Interesting...

What this will boil down to is who can attract more developers to their platform. I predict AMD winning this, however, given their domination of the current gaming landscape. Unfortunately, I'm not sure how well Nvidia will be faring in the not-too-distant future.

Everyone was all "oh yah, CUDA!" and I kept saying that a closed standard is never good... Well, look where that's gotten them now eh?

Very interesting, indeed.

This is apparently a linux-only thing, and it looks like NVidia is going to use it with their proprietary driver, which means absolutely no improvement for the open source driver. I simply won't buy hardware that lacks free drivers.

Man, this is going to be a brutal war.

I think that it's poetic justice lol: nVidia has been hating on linux for so long, and ironically, linux will be the only platform nVidia will still be competitive on.

Good move by Red Hat though, they hit the jackpot:
1. They get to learn about graphics cards, so they can improve the compute performance of linux and provide a good solution for their customers that use nVidia cards in workstations.
2. They prevent AMD from abusing its technological advantage to lock down software in linux, and prevent nVidia from doing something dumb with its drivers in linux.
3. Direct access from the GPU to storage and system memory (IOMMU) is a general-purpose solution that could be the last hurdle to finally assimilating dedicated graphics cards into the linux system, instead of the special status they have now. That would benefit compatibility, avoid driver problems, and improve scalable performance; it would make SLI/Crossfire useless because graphics cards would be infinitely scalable anyway, being mainly just multi-purpose mathematical coprocessors and I/O cards.

Yeah, but nVidia is publishing hardware specs, we'll see... Anyway, this is a good move by Red Hat, because it prevents AMD from monopolising graphics on linux, keeps all options open, keeps linux free, etc... Red Hat also really needed a solution for its customers with nVidia cards: unlike Microsoft or nVidia or AMD or Intel, they have contractual 24-hour solution deadlines, so they need to offer solutions, and what better solution than to start using IOMMU the same way it's used in supercomputers, leaving behind the most important bottleneck of the personal computer platform.

It's good for people with AMD systems and Socket 2011 systems, but very bad news for people who have bought premium "gamer" hardware, because most typical "gaming" systems don't have IOMMU (VT-d) and won't be able to use this solution. The good news is that this means a new lease of life for old Intel DualCore and Core2Duo/Core2Quad systems, which don't have VT-x but do have VT-d, and the Red Hat solution means the CPU's strength suddenly becomes a hell of a lot less important for most consumers and private PC users. For games, this means that a Core2Duo system with VT-d and an nVidia card in linux will outperform a 3770k or 4770k system, because most of the performance comes from the graphics card. It also means that a 400 USD laptop with an Intel i3 or Pentium or Core2 CPU and a PCIe expansion slot will be able to outperform a 4770k system on an expensive Asus RoG or Deluxe board with multiple graphics cards and a Z-chipset, because the k-CPU, Z-chipset, and Asus PCIe controller all make IOMMU impossible, whereas the cheap laptop has IOMMU on the PCIe expansion slot, and letting the graphics card access storage and memory directly at a low level gives a factor 10 to factor 100 framerate benefit.

I'd say Intel has a serious problem now; they have been cocky - just like nVidia - and have lost sight of the big picture. With Intel being close friends with Red Hat in the Linux Foundation, Intel is probably scrambling right now, and in a year or so, I'm sure they'll produce something totally cool.

Moral is: if this comes through, it means a huge performance benefit in linux for all graphics cards on VT-d/AMD-Vi/IOMMU enabled systems.
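
For anyone who wants to check whether their own box even exposes an IOMMU before getting excited, the kernel makes this visible in sysfs once VT-d/AMD-Vi is enabled in the firmware and on the kernel command line (intel_iommu=on / amd_iommu=on). Here's a rough Python sketch of that check, purely illustrative, not any kind of official tool:

#!/usr/bin/env python
# Rough check: does this linux system expose an IOMMU, and which PCI
# devices sit in which IOMMU group? /sys/kernel/iommu_groups only has
# entries when the IOMMU is actually active.
import os

IOMMU_GROUPS = "/sys/kernel/iommu_groups"

def list_iommu_groups():
    groups = os.listdir(IOMMU_GROUPS) if os.path.isdir(IOMMU_GROUPS) else []
    if not groups:
        print("No IOMMU groups found: VT-d/AMD-Vi is off, unsupported, or not "
              "enabled on the kernel command line (intel_iommu=on / amd_iommu=on).")
        return
    for group in sorted(groups, key=int):
        devices = os.listdir(os.path.join(IOMMU_GROUPS, group, "devices"))
        print("IOMMU group {}: {}".format(group, ", ".join(devices)))

if __name__ == "__main__":
    list_iommu_groups()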

Who do you think will come out on top, Zoltan? Because as was mentioned before, this looks like it may be a brutal war if implemented right.

To be honest, I don't care. I think this new Red Hat technology for nVidia cards is not backward compatible with existing nVidia cards, and if it is, it will only be on Quadro and maybe Titan cards, in my opinion: Red Hat has an obligation towards its customers to provide a working solution with nVidia cards, and there will be very few RHEL customers using consumer nVidia cards, so I don't think they will even bother. It's also a fact that IOMMU is required for this, so most Intel-based "gamer" systems are not compatible. That pretty much makes this technology something for the future, not something current users will get any benefit out of.

On the other hand, there is little to no official information about AMD Mantle, except that it is compatible with the 7/8/9 series, so it is backward compatible with the current consumer series cards, but there is no information on whether there are other system requirements, the main question being whether this will work without IOMMU, and on Intel CPU and chipset systems.

The present reality is that linux graphics driver development is about a year behind windows graphics driver development, and although they're working on bridging that gap, it could have been done already, which suggests that they're really focusing on next-gen technology like Mantle and the Red Hat system on the linux platform. I've heard from linux devs that the performance with the new Red Hat system is 10 to 100 times faster than anything on Windows, so it's impossible to blame the devs for leaving the old ways behind and focusing on the new ones.

So who comes out on top: not the consumer, because chances are really high that new hardware will have to be bought and that expensive hardware people already own will not support these new technologies. My guess is that AMD will have a better chance of a flying start over nVidia, because their technology is both Windows and linux compatible and probably has the best compatibility with existing hardware, but in the end the Red Hat technology for nVidia will bring better performance, and after a few years people will migrate anyway. Of course, if the AMD Mantle API is really open and can be universally combined with direct IOMMU access the way Red Hat does it for nVidia, there might still be a long-term performance benefit for AMD. AMD does have one disadvantage, though: VRAM addressing in compute applications. This is an issue that seems impossible to solve, and I don't know whether the Mantle API will make it solvable, although it might.

One thing is for sure though: there will be a radical shift in the way GPUs are perceived. They will be seen more as compute plug-ins, as FPU coprocessors, as a way to expand a system. With the whole gaming and productivity software world moving to linux as the preferred platform, scalability will become very important: a linux user that buys a graphics card sees it as a math coprocessor, a soundcard and a graphics adapter in one, and system integration and full scalability of graphics hardware has been hindered by the driver situation in linux. That's going to change radically with these new technologies. It will also mean that everyone with a new PC will by definition have a supercomputer in their hands, and that might change the software industry radically, because people will want to use that power. It will have serious consequences for communication and networking technology, it will make the cloud even more of a bottleneck than it is now, and it will mean that the devices industry substitutes the present PC industry completely while the new PC industry moves up to a new, unseen level, with fewer users but much more power and more local applications. In the end, it's all about what users can do with the technology. Technology is great, but it's the applications that bring the added value.

I've made the analogy with the early 90's before, and I think it's very much like that. I think the IT industry will be revolutionized and a new IT industry will emerge, completely different from everything that has set the trends in the last 20 years. Technology always has a love-hate relationship with society. In the early 90's, the typewriter was replaced by the PC at a time when it was still the norm to send a handwritten job application letter by snail mail. The same thing is happening now: technology is moving on to private supercomputing at a time when there is no legal structure or social acceptance for such a thing. So the existing PC realm will be completely usurped by the devices market, whether mobile devices or TV set-top boxes, and the PC world will reinvent itself. In the early 90's, the GUI did the trick; now, compute power will probably make the difference.

The question is what applications will make the new PC technology interesting enough for users. Gaming is one thing, but it's only a small portion of users with little market influence compared to the enterprise world. You can't make graphics cards for gamers if you don't develop the technology for enterprises; the reason the Titan exists is that the Quadro exists. The reason the new PC exists may be that people have devices that need an application server, so users need their own "mainframe" supercomputer at home or in their office to drive their devices.

I also think that people will get used to real-time video transcoding. Don't forget that Youtube doesn't play fluidly on 7-year-old systems, and rendering video is still very time consuming nowadays, even on very fast systems. The reality is that video card and CPU manufacturers advertise acceleration technology for video rendering, but there is no really interesting technology out there: if you use Intel's QuickSync, it renders faster but the pixel quality sucks balls; if you use nVidia CUDA, the static quality is OK but frame refreshes are met by a lot of artefacts; and if you use AMD OpenCL, it's slower than nVidia CUDA (but still faster than Intel QuickSync) because the focus is on image quality. In the end, none of the existing technologies is really that sexy for a videographer, and for serious productions everyone ends up rendering on the CPU anyway, so investing extra in CUDA cards is just throwing away money. I like Wendell's idea of expanding a dual Xeon with a couple of Phi cards for maximum CPU rendering power; that actually makes a lot of sense, it's a valid substitute for a cluster or an extensive AMD-GPU-array-based rendering system, and it provides image quality and speed at the same time. If it runs on a really bleeding-edge fedora install, it might even already be able to render in real time.

On the new systems, with the new graphics card technology, video rendering and transcoding could be possible in real time and with good quality. That is going to make a lot of people make their own videos. On linux, there are video formats that use less than 1/10 of the storage of compressed video formats on closed platforms and offer better quality, but processing power is needed to make efficient use of high-compression algorithms.

In the next decade, it will all be about knowing linux and computer internals. People with linux skills, and the logic and mathematical skills to deal with advanced computing, will come out on top in the industry. Right now, "computer geek" is a marketing thing, mostly about commercial consumer hardware, but that has nothing to do with the reality of technological evolution. I foresee a radical move away from the "computer geek" marketing strategy, a demise or serious step back of some major players in the hardware manufacturing world, and a new "elite" of computer users that really know what it's all about, just like in the late 80's/early 90's. The new PCs will have much greater power but will be much harder to operate; they will require much more advanced computer skills. So to answer your question: people that invest in profound knowledge will come out on top. Lame answer, but that's what I think.

"Asus uses a PCIe controller for it's Z-chipset boards that can't handle address translations, and therefore blocks IOMMU, and that's not the only problem, the PCIe controller Asus uses on those boards also causes a 10-30% drop in framerate because it bottlenecks, and that's without virtualization or low level access functions, just in current games. An Asus B75 chipset board for instance will perform up to 30% better in game framerate than an Asus Z77/Z87 board."

So in combination with this snippet of yours that I'm quoting from another thread, you're basically saying that I should stay away from Asus Z motherboards in general?

Not all of them, but certainly the more expensive ones. The Intel Z-chipset is not such a good deal because it doesn't support IOMMU, but in general, gaming framerate benchmarks (also in Windows) show 10-30% more performance on B/H-chipset motherboards from Asus than on Z-chipset motherboards from Asus, just because of the PLX PCIe controller Asus uses on its "premium" motherboards.
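
By the way, if you'd rather not take anyone's word for what is actually sitting on your board, the kernel exposes every PCI device's vendor ID in sysfs, and PLX Technology's PCI vendor ID is 0x10b5, so you can look for PLX bridges yourself. A quick illustrative sketch in Python, a starting point rather than a definitive diagnostic:

#!/usr/bin/env python
# Illustrative only: scan sysfs for PCI devices whose vendor ID is 0x10b5
# (PLX Technology), which is how the PLX PCIe switches on some "premium"
# boards show up to the OS.
import glob
import os

PLX_VENDOR_ID = "0x10b5"

def find_plx_devices():
    hits = []
    for vendor_file in glob.glob("/sys/bus/pci/devices/*/vendor"):
        with open(vendor_file) as f:
            if f.read().strip() == PLX_VENDOR_ID:
                hits.append(os.path.basename(os.path.dirname(vendor_file)))
    return hits

if __name__ == "__main__":
    plx = find_plx_devices()
    if plx:
        print("PLX bridge(s) found at: " + ", ".join(plx))
    else:
        print("No PLX devices found on this system.")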

Okay, I've been reading up a bit on this, so scratch my previous questions.

But I really wanna know how you know that the Asus PCIe controllers have some kind of problem that reduces performance and prevents Mantle. Don't you think Asus would have noticed something as major as a 10-30% difference? How could that even happen?

Nvidia running to linux to stay competitive is basically one big popcorn moment for me;

I can't wait to get to the funny moments.

It's well documented that H87 and B85 chipsets provide faster OpenGL/graphics performance than Z87 mobos running the same CPU and GPU, while Z87 chipset mobos deliver more CPU performance. In games, CPU performance is seldom important, but GPU performance is. On top of that, Asus has made a marketing decision to use its PLX chips for XFire/SLI support, since 1150-socket mobos don't have multiple high-speed PCIe lanes like a 2011-socket mobo with the X-chipsets, and that PLX chip reduces graphics performance even further. So yeah, if it's for gaming, you'll get better performance out of a B85/H87 chipset motherboard than out of a Z87 motherboard.

As far as Mantle is concerned, it is a lower-level API, which means that some of the functions that can be controlled through the API might include power management through direct access to the video card, and direct system memory access by the video card. These functions will only work if the system allows the video card to access the system memory (where the application lives) directly, which requires address translation between the GPU resources and the system resources. Asus PCIe controllers are documented to block this table translation, making direct access impossible. The new acceleration technology developed by Red Hat for nVidia cards requires this direct hardware access; whether or not Mantle requires it is still unclear, but it makes sense.
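
To make the address translation point a bit more concrete: the IOMMU is basically a page table for devices, translating the bus addresses a card emits into host physical addresses, the same way the MMU does it for processes. Here's a toy model in Python just to illustrate the concept; none of this corresponds to any real driver API, and the numbers are made up:

# Toy illustration of IOMMU-style address translation: a device-visible
# I/O virtual address is split into a page number and an offset, the page
# number is looked up in a per-device table, and the result is a host
# physical address. Real hardware does this in silicon for every PCIe request.

PAGE_SIZE = 4096  # 4 KiB pages, as on x86

# Hypothetical mapping set up by the OS for one device:
# I/O virtual page number -> host physical page number
io_page_table = {
    0x0: 0x7f200,
    0x1: 0x7f201,
    0x2: 0x13b70,  # pages don't need to be physically contiguous
}

def translate(iova):
    """Translate a device I/O virtual address to a host physical address."""
    page, offset = divmod(iova, PAGE_SIZE)
    if page not in io_page_table:
        raise RuntimeError("IOMMU fault: device touched an unmapped page")
    return io_page_table[page] * PAGE_SIZE + offset

# A GPU DMA access to "address 0x2010" in its mapped window actually lands
# on whatever scattered physical page the OS chose for it:
print(hex(translate(0x2010)))  # -> 0x13b70010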

Do these reductions in speed affect multi-GPU configurations only, or is it across the board?

Could you also link to a few sources? Because I'd rather not trust something on face-value alone.

Agreed, how cool of a show is this going to be...

Lolz, don't you have access to ddg.gg? There are plenty of benchmarks all over the place that confirm this.

And it's not only true for Asus boards (even though those generally score worse because of the PCIe controllers), it's true for any Z87 board: in games, where graphics performance is important and CPU performance is not, H87 and B85 boards score 10-30% faster than Z87 boards, in linux and in windows, with all kinds of dedicated graphics cards.

This is one example of a direct comparison between an Intel (reference, so very vanilla, not optimized for gaming by any stretch of the imagination) H87 board and an ECS Extreme series (gaming optimized) Z87 board, using non-commercial open source benchmarks (just because you're such a skeptic):

http://openbenchmarking.org/result/1308151-SO-ECSZ87MOB87

You'll find the same with any other brand of boards. The Z87 chipset is marketed as the preferred chipset for "gaming" boards, but the "business" boards perform a lot better in games. Not in CPU performance - for rendering and such the Z87 will perform better than the H87 - but in games, spending more money on a Z87 board is the same as downgrading your GPU a full generation, or two tiers within the same generation, so it's a complete waste of money for gaming systems.

Also, the newer technologies (links are available in other threads on this forum) will use hardware virtualization to speed up graphics performance. This is a confirmed fact as far as the new Red Hat-developed technology for nVidia cards is concerned, and a high probability as far as AMD Mantle is concerned, although nothing has been confirmed on that end yet. Very few Z87 boards actually work for hardware virtualization. Is it a strategy by Intel to promote their own graphics solutions? Is it just a big joke, with Intel devs laughing their asses off at the expense of "gaming" consumers? Who knows; either way, it's obviously not something that sponsored sites are advertising, but it's a fact that has been confirmed over and over again using objective benchmarks.

EDIT: looking at the benchmarks linked above, you'll also notice that framerate performance actually drops with Haswell overclocking as opposed to running the CPU at stock clock. This is partially solved in linux with kernel 3.12, but it will probably never be solved in windows. You'll also notice that if you compare stock-clock CPU performance on H87 and Z87 boards, there isn't that much of a difference; most of the extra CPU performance on Z87 comes from the overclock. As far as graphics performance is concerned, the framerates on the vanilla H87 board are sometimes almost twice as high as on the Z87 board, so even with linux kernel 3.12, which offers a boost of up to 40% in those extreme cases, that only makes up for half the performance loss. So for gamers: buy cheaper, benefit more.

Dude.. The graphics cards used in the tests you linked are "_________" and "Intel Haswell Desktop 1250MHz". They're testing an unknown graphics card against integrated graphics; they might not even be using the PCIe slots...

Dude... then click through to a benchmark on that same site, or another site, with nVidia or AMD graphics - I'm not a wireless mouse or a google remote control... Or check the anandtech forum or any other major forum about this; there is so much documentation of this, I can't believe you've actually missed all of it. If you want to know, do some work yourself, or read marketing slogans on boxes, see if I care...

That's the thing. I have been looking and asking around.

Nobody on techpowerup knew anything about such things. I can't find a single review that compares the GPU performance of several different motherboards (only CPU performance). I've also been googling for problems with Asus' PCIe controller.

I've come up with nothing.

So no offense, but I'm starting to doubt your statements. Please prove my doubts wrong though, if you can.

OK, I'll make it easy for you: just search for some comparative benchmarks between Z87-based boards and the H87-based AsRock Fatality board... and not just the ones that only test Metro Last Light or some obscure game with non-typical results; compare the "normal" game benchmarks. You could also take a look at the many comparative benchmarks published this week documenting the impact of kernel 3.12, there's plenty of that around. You could also use ddg.gg, with or without !g, instead of google to get more relevant results, just don't expect me to type it in for you. I've been more than courteous in this matter by continuously replying and by ignoring comments from you in other threads referring to me as "that guy from the nvidia thread" instead of showing equal courtesy...

Whether or not you doubt my statements is something I don't care about at all. You do what you have to do, mate; doubt my statements if that makes you happy... as I said, see if I care...

I'm sorry if I've done something to offend you. That has never been my intention.

Now, I'm clearly doing something wrong when I'm searching, because the only comparative benchmark I've found is one where the gaming performance is exactly the same on both chipsets. You clearly know of some benchmarks that show these things, so please just take 5 minutes of your time to link them.

Are these problems specific to Linux? Just asking since you're bringing up the Kernel. I thought they were hardware limitations/problems.