If you pass the GPU card through to a guest, it's no longer available to the host system.
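To see that in practice: a minimal sketch in plain Python over the standard Linux sysfs paths (no passthrough-specific tooling assumed) that reports which kernel driver each PCI device is bound to. A passed-through GPU shows up bound to vfio-pci (or pci-stub) instead of its normal driver, which is exactly why the host can't use it anymore.

```python
# Sketch: list PCI devices and the kernel driver each one is bound to.
# A passed-through GPU will show "vfio-pci" (or "pci-stub") here instead
# of amdgpu/radeon/nouveau/etc., meaning the host driver no longer owns it.
from pathlib import Path

def pci_driver_bindings(sysfs="/sys/bus/pci/devices"):
    bindings = {}
    for dev in sorted(Path(sysfs).glob("*")):
        drv = dev / "driver"
        # "driver" is a symlink to the bound driver; absent if unbound
        bindings[dev.name] = drv.resolve().name if drv.exists() else None
    return bindings

if __name__ == "__main__":
    for addr, drv in pci_driver_bindings().items():
        print(addr, "->", drv or "(no driver)")
```

On a system with passthrough configured you'd grep the output for your GPU's PCI address and check the driver column.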
OpenCL doesn't benefit games. Windows is too slow for heavy compute applications... I don't see where you're going with your question, I'm sorry.
HSA is a thing. A university research project for which I have a server running has bought small AMD APU systems with a couple of AMD GP-GPUs; those get really good OpenCL performance in Linux, and the researchers can use them on the go for simulations and advanced computational models. They have two flying brigades, each consisting of two students with such a machine, that go out in the field and analyse and model stuff very quickly. The systems are small and cheap, and they save a lot of mainframe time and project time, because data can be processed almost in real time in the field, which has never been possible before. But outside of that type of application, there isn't that much benefit to HSA in the consumer realm, with the exception of small accelerations in applications like Darktable or LibreOffice. HSA has a long way to go before consumers can benefit from it. One of the biggest problems is that more than half of consumers and enterprise users don't have hardware that can even run HSA-optimized applications, for any number of reasons.
IOMMU is basically two things:
- address translation, so that parts of the system have direct access to system memory, avoiding excursions through the CPU for loads the CPU can do nothing about except reroute them, which costs valuable clock cycles;
- instruction translation, so that parts of the system can interpret and autonomously execute instructions that then no longer have to be executed by the CPU.
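To make the first point concrete: on Linux, an active IOMMU shows up as device groups under sysfs. A small sketch (standard sysfs path, nothing vendor-specific assumed) to check whether the kernel actually has the IOMMU enabled:

```python
# Sketch: count the IOMMU groups the kernel exposes. Zero groups means
# the IOMMU is disabled or unsupported; enable it with intel_iommu=on or
# amd_iommu=on on the kernel command line (plus VT-d/AMD-Vi in firmware).
from pathlib import Path

def iommu_group_count(sysfs="/sys/kernel/iommu_groups"):
    root = Path(sysfs)
    if not root.is_dir():
        return 0
    return sum(1 for entry in root.iterdir() if entry.is_dir())

if __name__ == "__main__":
    n = iommu_group_count()
    print(f"IOMMU groups: {n}" if n else "IOMMU inactive or unsupported")
```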
The link with HSA is that GP-GPUs can be "seen" by an HSA-optimized Linux system as autonomous compute devices that receive direct instructions, so the CPU is taxed only the bare minimum, basically just to run the application that issues the direct instructions to the autonomous compute devices. Linux - by design, just like any UNIX-like system - is ideal for that, for a number of reasons, but it still has to be optimized for this stuff. It's the same thing that HPC many-core computers like Watson etc. use, except that the instruction node is the CPU and the compute nodes are not peers but GP-GPUs... or autonomous processing devices like the Phi. The Phi, theoretically, can run Linux all by itself; at least, that's what Intel is trying to accomplish, and that's the next step, if it ever comes: instruction translation is no longer necessary, because the compute card can handle the same instructions as the CPU. That's true many-core computing with one memory pool, and in that constellation the Phi would be a peer to the CPU, but with another focus. The Phi has a lot of compute cores, but also application processor cores, so a system with a 6- or 8-core Intel CPU with built-in iGPU and a number of Phis would be like a system with a single many-thousands-of-cores CPU and a scalable iGP-GPU. But Intel isn't quite done with it yet; they don't have the whole thing working. AMD has HSA and the cooperation, through the HSA Foundation (which is now also a member of the Linux Foundation), of TI, Samsung, Qualcomm, etc... so that'll be interesting soon.
The Intel approach is more ambitious, but the hardware is also much more expensive (a Phi costs about 6 times as much as a compute-performance-equivalent AMD FirePro), and as many applications have shown in the past, there isn't that much of a performance loss in Linux when a hardware abstraction layer or instruction translation layer is added. Also, GP-GPU memory is generally faster than system RAM, and it'll take a while before system RAM catches up, so at least for the next couple of years, I think the CPU+GP-GPU hybrid solution will be more efficient than the orthodox many-core solutions. The limiting factor right now is system bandwidth, and GP-GPUs, with their high-bandwidth local memory, can store a lot of compute workload locally. (AMD has always prioritized bus width, and it seriously pays off, even though it's quite an engineering feat to make cards like they do; the power requirements for full 512-bit high-speed memory bandwidth are huge, and the top AMD cards have near-stupid power requirements.) AMD cards are conceived like CPUs, which fits the concept: they go through a workload in a serial fashion, just like CPUs do. nVidia doesn't follow that path; they focus on parallel execution, which is why they can't support many standard compute features, but have their own set of instructions that are not used much by compute applications. nVidia also has smaller memory buses on their cards and uses faster-clocking memory chips, which is a bad thing for compute applications, because it dramatically increases the error rate; it's great for pushing pixels to a monitor as fast as possible, but for compute, it's severely counterproductive. To bridge the gap, RedHat engineers have devised a system whereby the IOMMU functionality is used to translate the standard compute loads into adapted workloads that can be handled by nVidia cards.
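To put the bus-width point in numbers: peak memory bandwidth is roughly bus width (in bytes) times effective memory transfer rate. A quick sketch with illustrative numbers, not the specs of any actual card:

```python
# Rough peak-bandwidth arithmetic: GB/s = (bus bits / 8) * effective GT/s.
# The figures below are illustrative only, not taken from real cards.
def peak_bandwidth_gbs(bus_width_bits, effective_gtps):
    return bus_width_bits / 8 * effective_gtps

wide_slow   = peak_bandwidth_gbs(512, 5.0)  # wide bus, modest memory clock
narrow_fast = peak_bandwidth_gbs(256, 7.0)  # narrow bus, fast memory clock

# The wide bus delivers more bandwidth at a much lower memory clock,
# which is the margin-for-error argument: you don't have to push the
# memory chips to their limits to get the throughput.
print(wide_slow, narrow_fast)  # 320.0 224.0
```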
This makes up for some of the compute performance handicap of nVidia cards, but it can't solve it, and because Intel wasn't born yesterday, they make sure that a lot of their CPU and/or chipset products don't support IOMMU, so that nVidia is kept out of the HSA race.
For Intel, this is a win-win situation: they have an agreement with AMD, nobody but Intel and AMD can make x86 products, and AMD can't make third-party solutions like Intel can. So even if AMD has the practical edge now with their technology based on the IP from ATI, something Intel doesn't have, Intel has time to develop further until Intel and AMD see fit to open up the market to HSA. A big factor in that is the Intel-Microsoft alliance. Obviously, Microsoft is Intel's ball and chain, but earlier attempts by Intel to develop non-Windows products have failed because of Microsoft boycotts (e.g. the Microsoft/Asus "Runs better on Windows" deal that killed the Intel Atom CPUs for netbooks), so Intel had to make sure that Microsoft was the one to blow up the alliance, and that's exactly what Microsoft did by becoming the "XBone" company. So Intel now has a clear lane to move forward in the direction that AMD has been moving in for about two years. They have some catching up to do, but they are frantically working on it. AMD has made sure to cooperate with Intel on this; they know they have nothing to fear from nVidia, which is losing ever more ground in the ARM space, whereas AMD is winning in the embedded space. nVidia is completely tied to the Windows realm, doesn't have any realistic HSA technology, and doesn't even seem able to show off a working GK118 product, whereas AMD and Intel are moving toward open source/Linux, and Intel is making sure that the nVidia compute instructions (an nVidia version of OpenACC) are not coming through. It's clear that Intel and AMD have found a balance in their competition: Intel has the better technology in terms of litho, instruction optimization, and IPC; AMD has more tools for HSA, working hardware, very flexible management, and a focus on price/value products that just work.
Intel can make third-party solutions, AMD can't, and that's OK for AMD: they let third-party engineers squeeze out the extra performance that costs them nothing, at the expense of fewer SKUs in the marketplace, which means they have greater manufacturing flexibility, which means less overhead, which means better-value products. Intel has a lot of SKUs, a US shareholder body that requires more dividends and thus much higher profit margins, and a less flexible manufacturing process, because they have to do it themselves and have to produce many more products; but in return, they also spend more on R&D and production lines, and can offer more IPC performance and smaller litho designs, which cuts the raw material cost down hugely. So AMD and Intel pretty much stay out of each other's wake, and everybody's happy.
The actual weak point is x86. With HSA also comes competition from ARM, because when the traditional CISC CPU becomes less important for overall system performance, ARM starts to have a chance to break the x86 monopoly. That will be interesting. AMD has a foot in the door there, because they control the HSA technology through the HSA Foundation, and they have the tools and the ASIC/embedded business to actually make big bucks on ARM manufacturers that want to venture into many-core hardware. Intel doesn't have anything in the ARM world, but they have very small litho x86 hardware with the new Atom series, which everybody however seems to want to stay away from, forcing Intel to make a new deal with Microsoft on Atom, which is ironic and amusing, but hey, sins of the past... A determining factor will be what Google wants to do. They have all the choice in the world: they have Motorola and might be going after Acer. Intel sits with Asus, which is starting to dismantle its production. A lot is moving, the pillars that have been carrying the weight of the industry are crumbling, and 2014-2015 will be a time of great changes. Microsoft itself is bound to the deal with Novell that keeps it from entering the Linux market in any significant fashion until 2016, unless it blows up that deal, which is liable to cost it dearly, as it would also blow up its license claims on the FAT filesystem, which happens to be the single most expensive part of any Android device. So Microsoft has very difficult choices ahead, and it is standing with its back against the wall.
The big winner could be Samsung, which has an alliance with both Intel (Tizen) and AMD (HSA), and has been undermining Google for quite some time now, offering SELinux on their Android phones and allowing users to circumvent Google Apps by using Samsung Apps, without rooting phones or taking crap. That is winning them a lot of corporate users, because it just works: users can use Google services without the Google crap by using Samsung equivalents and open source applications, can have corporate management tools for mobile devices, and can have the added security of SELinux. Most important of all, by disabling Google Apps on Samsung phones and using Samsung apps for access to Google services with feature bonuses (online phone management via Samsung servers, also without the Google crap, etc...), the battery life of Samsung phones is at least two days with intensive use, which is a great plus for corporate customers. Samsung is gradually eating away Google's base, and Google can't do anything about it. Samsung has the fabs, the technology, and the alliances to take over a lot of business and offer customers valuable benefits going forward. If Samsung succeeds, Intel and even AMD will be eating out of Samsung's hand, and all Samsung will have to do to finish off Google is set up Microsoft some more to bring the fight to Google.