Opinion on AMD's OpenCL Drivers

Hey there, this is sort of an unknown issue, for some unknown (hah!) reason: AMD's OpenCL drivers are not actually very good.

Specifically, I'm referring to how they handle large kernels in terms of compilation and execution speed. I'm a Blender user (the 3D modelling tool), and it has a GPU-accelerated renderer, Cycles, that runs entirely on the GPU when put in GPU mode (not hybrid like some other GPU-accelerated renderers), using either CUDA or OpenCL.

Here's the interesting thing: the Cycles kernel compiles quickly and executes quickly on Intel OpenCL and Nvidia OpenCL, which indicates the kernel is written properly. But as soon as you try it on a Radeon graphics card, things turn ugly.
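If anyone wants to reproduce the compile-time gap outside of Blender, timing clBuildProgram on each installed OpenCL platform is enough to see it. Below is a rough, untested sketch of that; the trivial kernel string is only a stand-in for the multi-megabyte Cycles kernel, and error handling is mostly skipped.

```c
/* Rough sketch: time how long each OpenCL platform takes to build a kernel.
 * The tiny kernel below stands in for the (much larger) Cycles kernel. */
#include <stdio.h>
#include <time.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void fill(__global float *out) {"
    "    out[get_global_id(0)] = 1.0f;"
    "}";

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(8, platforms, &num_platforms);

    for (cl_uint i = 0; i < num_platforms; ++i) {
        char name[256];
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);

        cl_device_id dev;
        if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, 1, &dev, NULL) != CL_SUCCESS)
            continue;

        cl_int err;
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);

        clock_t t0 = clock();
        err = clBuildProgram(prog, 1, &dev, "", NULL, NULL);
        clock_t t1 = clock();

        printf("%-40s build %s in %.2f s\n", name,
               err == CL_SUCCESS ? "succeeded" : "FAILED",
               (double)(t1 - t0) / CLOCKS_PER_SEC);

        clReleaseProgram(prog);
        clReleaseContext(ctx);
    }
    return 0;
}
```

With a real production-sized kernel in place of the stand-in string, the per-platform difference in build time is the interesting number.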

Here's a link to a thread on the official AMD dev forums about the issue: http://devgurus.amd.com/thread/160162

But to make things easier on you, the short of it is that an Nvidia GTX 570 renders a certain test scene (http://mikepan.com/files/BMW1M-MikePan.blend) in 53 seconds in both CUDA and OpenCL mode, while a typical 7970 renders the same scene in 4 minutes and 12 seconds. On top of that, the output looks wrong.

So my question is: What's your opinion on this situation? This problem has been around for years and AMD was only alerted to it in December of 2012. My 7970 is essentially a brick when it comes to anything OpenCL that doesn't involve mining bitcoin.

Looks like it affects LuxRender and Blender Cycles at the compiler level. The support person is trying to help, but the two major posters in the thread don't speak English and post straight Google Translate output instead. I can't even make much sense of most of the posts...

The support team needs to understand the problem as well as possible so they can pass as much information as possible to the engineers. I may try out Blender Cycles to see if I can replicate the problem on my 7970...

Interesting stuff, Iron... I just posted about a similar subject in their inbox.

https://teksyndicate.com/forum/inboxexe/ts-team-gpu-render-tests/159243

Let's hope they can look into it, because with the new R9 290/290X series it would be a shame if the Radeon cards were such a mess in this type of application.

Thanks for letting other people know how to test GPU compute on AMD cards. I made a batch file that gets me most of the way there (it doesn't actually launch Blender). Keep in mind it will only work at all with sufficiently new drivers; it's safe to say that 13.10 beta and up work (as in, they don't crash while compiling).

I've posted in that thread! I was kind of disappointed when I realized AMD cards didn't work well with Blender's Cycles.

Here is how you enable AMD OpenCL, via CMD command lines.

I've just run the scene through each setting on my PC. Benchmarks!

I guess I should mention, I am rendering the common BMW scene. If you couldn't infer, the 7970 was running with OpenCL. I might have seen some artifacts in the 7970+8350 render, but I can't be sure. Obviously, for me, OpenCL seems to run the fastest.

From what I hear, the 6xx series has had its GPGPU compute severely crippled, so you see 580s beating 680s and 780s in some tests.

I have had no problems with OpenCL in Vegas, so it seems to be a problem with Blender and OpenCL. In fact, it could be a lack of GCN optimization in Blender's code.

My guess is that Vegas is using a hybrid solution with a tiny kernel, meaning the Sony Vegas developers worked around the issues rather than being unaffected by them. That works, but it's not exactly ideal, since it ultimately holds back the fixing of the real problems.
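To make the idea concrete, here is roughly what that looks like in OpenCL C. This is purely illustrative, not actual Vegas or Cycles code; the kernel names and the work they do are invented.

```c
/* Illustrative OpenCL C only. Instead of one mega-kernel that does everything,
 * the work is broken into small stages that communicate through a buffer,
 * so each kernel stays easy for the driver's compiler to handle. */

/* Hypothetical mega-kernel: one huge body the compiler has to chew through. */
__kernel void process_everything(__global const float *in, __global float *out)
{
    size_t i = get_global_id(0);
    float v = in[i];
    /* ...hundreds of thousands of lines of inlined logic would live here... */
    out[i] = v;
}

/* Split alternative: several tiny kernels, enqueued one after another. */
__kernel void stage_decode(__global const float *in, __global float *tmp)
{
    size_t i = get_global_id(0);
    tmp[i] = in[i] * 0.5f;           /* stand-in for the real first stage */
}

__kernel void stage_compose(__global const float *tmp, __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = tmp[i] + 1.0f;          /* stand-in for the real second stage */
}
```

The point is just that each small stage is trivial to compile and register-allocate, whereas the mega-kernel forces the compiler to deal with everything at once.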

I'm assuming this is what you are talking about. It really does suck seeing all the compute power AMD GPUs have go to waste.

http://wiki.blender.org/index.php/Dev:2.6/Source/Render/Cycles/OpenCL

This is an issue that should have been addressed a long time ago, but AMD has been focusing on a much larger framework, and it's not an easy thing to solve for several reasons:

http://developer.amd.com/developer-preview/

For a quick fix, a path tracer like SLG should work without problems regardless of driver and operating system. I would suggest trying that.

AMD is moving towards HSA altogether; that is their main focus. Mac Pros now have AMD GPUs too, so there will undoubtedly be a solution for this soon.

But it's just not fair to criticize AMD for this: it's a minor thing in the big picture of the technology shift AMD is pioneering.

Part of the problem is that projects like Blender, and also projects like Darktable, are developed on nVidia hardware and don't really care about AMD GPUs. When they started, they thought CUDA would be the best thing ever and focused on that. That is specific to very small open source projects: Blender has made design choices that don't take all hardware into account. That is unfortunate and not the way it should be in open source, but it's impossible to blame them for it; they have made something fantastic with such a small team and such limited resources, and those resources happened to include nVidia CUDA hardware.

However, it's not up to AMD to adapt to small open source projects that are developed mainly for nVidia's proprietary systems. Because Blender is open source, it's technically up to Blender to primarily assure compatibility with open source APIs instead of proprietary ones, and unfortunately Blender chose an OpenCL implementation that was "ported" from CUDA, instead of implementing OpenCL in a way that would work on every GPU manufacturer's implementation. AMD hasn't changed anything lately that broke compatibility, and they have a lot more going on than just one project.

Such things are unfortunate, but not unsolvable (SLG, for instance), and they show that even in open source projects, developers can be blinded and ignore open source APIs in favour of proprietary ones, knowing full well that their project will never work on open source graphics drivers. To blame AMD for not being compatible with Blender's CUDA-like OpenCL implementation is out of line, in my opinion.

The solution here would be for Blender to rewrite, because there is no "bug" or "fault" in AMD's implementation of the OpenCL standards. Blender just pushes massive monolithic chunks of code through the driver because that was a convenient solution that was easier and faster to implement, it worked on their CUDA cards with proprietary drivers, and they didn't take everything else into account. That happens; Darktable does the same, and they feel compelled to criticize AMD too instead of fixing their own lack of an objective open source API implementation. Nothing anyone can do about it, shit happens. Blender is open source; anyone can change the code to make it work with AMD's implementation of an open standard API...

So I strongly object to the OP's statement that "the OpenCL drivers of AMD are not very good". They implement the OpenCL API according to the OpenCL standards; Blender just didn't optimize for an objective OpenCL implementation.

Hi Zoltan, the problem with your wall of text is the fact that Cycles OpenCL works completely fine on non-Nvidia hardware, such as Intel's CPUs (I'm referring to OpenCL, not x86). It may very well be the case that Cycles is somehow using non-standard OpenCL, but considering my previous statement, I find non-standard compliance by the Cycles team hard to believe.

Well, it works on AMD CPUs too; it doesn't work on Intel GPUs, as Intel iGPUs don't even have OpenCL support. So the only GPU acceleration it works with is nVidia's. Why? Because Cycles pushes giant monolithic chunks of code to the GPU, using the same calls for OpenCL that were used for CUDA. In other words, Cycles is not very optimized for OpenCL: because nVidia doesn't perform well in OpenCL, Cycles implements the least interesting form of OpenCL GPU acceleration, and not the much stronger OpenCL technologies. OK...

Second thing: show me where AMD doesn't implement the OpenCL API exactly according to the specs. Well, you can't, because AMD has implemented everything just right, and they have free toolkits available that enable application devs to easily implement the full spectrum of OpenCL acceleration. The Cycles devs only use a small and very particular set of OpenCL calls that happen to work on nVidia cards because they are just a translation of the CUDA calls.
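For what it's worth, anyone can check what a driver actually claims to support by querying the version strings. Here is a quick, untested sketch in plain OpenCL host code, nothing Blender-specific:

```c
/* Sketch: print what each OpenCL driver reports about itself, so claims about
 * standards support can at least be checked against the version strings. */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint np = 0;
    clGetPlatformIDs(8, platforms, &np);

    for (cl_uint p = 0; p < np; ++p) {
        char pver[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_VERSION, sizeof(pver), pver, NULL);

        cl_device_id devs[8];
        cl_uint nd = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 8, devs, &nd) != CL_SUCCESS)
            continue;

        for (cl_uint d = 0; d < nd; ++d) {
            char dname[256], dver[256], cver[256], drv[256];
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
            clGetDeviceInfo(devs[d], CL_DEVICE_VERSION, sizeof(dver), dver, NULL);
            clGetDeviceInfo(devs[d], CL_DEVICE_OPENCL_C_VERSION, sizeof(cver), cver, NULL);
            clGetDeviceInfo(devs[d], CL_DRIVER_VERSION, sizeof(drv), drv, NULL);
            printf("%s | %s | %s | driver %s | platform %s\n",
                   dname, dver, cver, drv, pver);
        }
    }
    return 0;
}
```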

The only long-term solution that would really benefit GPU acceleration in Cycles is for Cycles to recode its system to split the kernel code into smaller chunks that are optimized for OpenCL. They're going to have to do it anyway to keep up with evolving technology, because nVidia is going to implement OpenCL to the full extent, like AMD, eventually. Proof of this is that they've contributed code to the OpenMP/OpenACC projects, following AMD and Samsung in that direction, in order to upgrade their OpenCL capabilities, which are, quite frankly, hugely sub-par for the moment and are holding them back from adopting GP-GPU scaling.
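For illustration, here is a loose sketch of what driving such a split pipeline from the host side could look like. The kernel and buffer names match the invented split-kernel example earlier in the thread, context/program setup and error handling are omitted, and none of this is actual Cycles code.

```c
/* Sketch of driving a split pipeline: small kernels enqueued back to back on
 * one in-order queue, passing data through an intermediate buffer, instead of
 * a single monolithic launch. Setup (context, program, buffers) is omitted. */
#include <CL/cl.h>

/* Hypothetical helper: the queue, kernels and buffers are assumed to have
 * been created from a program containing the split kernels shown earlier. */
void render_split(cl_command_queue queue,
                  cl_kernel stage_decode, cl_kernel stage_compose,
                  cl_mem in_buf, cl_mem tmp_buf, cl_mem out_buf,
                  size_t n)
{
    clSetKernelArg(stage_decode, 0, sizeof(cl_mem), &in_buf);
    clSetKernelArg(stage_decode, 1, sizeof(cl_mem), &tmp_buf);
    clEnqueueNDRangeKernel(queue, stage_decode, 1, NULL, &n, NULL, 0, NULL, NULL);

    clSetKernelArg(stage_compose, 0, sizeof(cl_mem), &tmp_buf);
    clSetKernelArg(stage_compose, 1, sizeof(cl_mem), &out_buf);
    clEnqueueNDRangeKernel(queue, stage_compose, 1, NULL, &n, NULL, 0, NULL, NULL);

    clFinish(queue);  /* wait for both stages before reading results back */
}
```

An in-order queue keeps the stages implicitly ordered; with an out-of-order queue you would need events between the two enqueues. Whether a path tracer's state can actually be carved up this cleanly is a separate question.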

Cycles is a small open source project; it may well be that they don't have the energy to rewrite all of their stuff, and then another renderer will become more popular, just like it always goes in open source: the best solution wins. To be very honest, I think that's exactly what's going to happen. Mac Pros now have AMD solutions because of the much better OpenCL performance, and they are used a lot in the creative sector, so Apple is probably just going to issue a recommendation to use another renderer, and possibly help development of a more suitable solution. It's going to be sorted out fast, but not by AMD, because AMD has done nothing wrong. I don't want to criticize the Cycles devs; they made a choice based on a driver situation from the past, and they always wanted to support the commercial closed source platforms, which is part of why they have become so popular. But they have to recognize that some of the choices they've made don't stand the test of time, nor the test of open source programming standards, and they either adapt or become less relevant.

So the real problem with your earlier wall of text was that the premise was wrong: the AMD drivers are in fact better for OpenCL than nVidia's drivers, as they provide better OpenCL support. The real problem is the implementation of Cycles, which is not optimized for full OpenCL acceleration. Instead of claiming that the AMD OpenCL drivers aren't any good, which is just wrong because they are by far the best OpenCL implementation of any GP-GPU solution out there at the moment, you could have posted that Cycles doesn't do well with OpenCL, which would be a more correct statement, and asked the forum if they knew of a renderer that implements OpenCL acceleration better...

If Zoltan hasn't convinced you,

Crytek's R&D technical director, Michael Glueck, said 'yes, that [Mantle] would appeal to us,' when he heard about Huddy's claims. However, he also seemed surprised, pointing out that 'AMD is the biggest force driving the OpenCL standard; it even released drivers to run OpenCL on SSE2 and that's kind of a way to remove low-level hardware access for simple compute tasks.'

Link. I think Glueck was surprised that AMD would push OpenCL and then release Mantle, a non-direct competitor. But that doesn't change the fact that AMD is the biggest force driving OpenCL.

Hell, V-Ray is already available for use with Blender, albeit as an exporter for standalone V-Ray, but still. Why use a horribly optimized kludge like Cycles when you could use V-Ray?