Video rendering via CUDA vs. CPU

When I first tried video converting using CUDA via Badaboom on a GeForce 9600 GT (I know, it was a long time ago), I was amazed by the speed.

Somehow I missed some big discussion online (I feel really, really upset that no one notified me. Not fair, really not cool. :( ;) ). Recently I noticed that some YouTubers do not use it and prefer to use the CPU, so they are buying multiple Xeon processors.

Even with multiple Xeon processors, their rendering frame rates are lower than what I was getting on my GTX 480 (I think. I am not 100% sure on this.)

So my question would be: What are the disadvantages of using CUDA instead of the CPU for video rendering?

Is there a disadvantage? Maybe, if they are doing 4K and don't have modern GPUs.

I haven't been paying attention either, but new versions of Premiere offer CUDA or OpenCL rendering. If you are running old GPUs, I could see CUDA being worse than CPU rendering.

My only experience with it is in Blender using the Cycles render engine, where my GTX 970 is about 10-15% faster than my i7-4790K @ 4.7 GHz. I have wondered how it would fare in other software, since Blender doesn't strike me as being a well-optimized editor.

I used to do all of my YouTube transcodes in Xilisoft with a GTX 670.
Then Nvidia magically decided to remove CUDA encoding from their drivers,
so the community hacked it back in.
Then I got a new GPU, a 970.
They removed CUDA and replaced it with HEVC.
I still haven't found a program like Xilisoft converter that will actually use GPU acceleration anymore.

My friends at yifitorrents
CLAIM that CUDA, although faster, was not quite as accurate at rendering a transcode/compression,
but for my use of random YouTube crap and cutting raw file sizes down to H.264 and whatnot it was perfect.
I wrote them a letter, and they said, essentially: you want CUDA? Use the old ass driver, you sap, we already have your money.

But that was before I got the new card.

So, as I understand it, Nvidia simply removed support for CUDA encoding.

What is going on with OpenCL? Which GPUs support it, and what software supports it for video encoding?

Premiere, from my understanding, supports OpenCL. I haven't tried it though.

I'm curious as to what he's doing.
What was his output format? He said 1080p, but that tells me nothing.
Was it QuickTime? H.264? MPEG-4?

The answer is that there isn't much benefit at all (to my knowledge), unless you're willing to hack your card or invest in a "professional" card. NVidia has been gimping their gaming cards for ages when it comes to productivity. It's not that the card can't perform, it's that NVidia won't let it. People were hacking 600 series cards to be recognized as Quadro or Tesla GPUs, so that they could run productivity stuff like a "professional" card. E.g., a 660 Ti could be modded to be recognized as a K5000 to gain the Quadro features, only with the slightly gimped core/memory specs of the 660 Ti. I don't know if AMD has similar practices.

That being said, the higher cost for professional cards is what funds the R&D that makes the enthusiast cards possible, because their assumption is that people who need that processing power can afford to pay for it.

The benefit is all dependent on the settings you use when exporting your videos.

Using Premiere Pro CC: I just tried to export a 4K video at a CBR of 50 Mbps, and the CPU was faster than GPU acceleration by a few seconds. When I cut the bitrate in half, down to a CBR of 25 Mbps, and left the video at 4K, I got a similar result.
When I cut the resolution from 4K to 1080p with a CBR of 50 Mbps, again similar results.
However, when I cut the resolution down to 1080p and the bitrate to 25 Mbps, the GPU acceleration was almost 85% faster.
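For anyone repeating this with a stopwatch: "X% faster" can mean two different things, so it helps to write down which one you are quoting. Below is a minimal sketch in Python; the function name `speedup` and the example timings are made up for illustration and are not measurements from the exports above.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> dict:
    """Compare two export times for the same clip and the same settings."""
    if cpu_seconds <= 0 or gpu_seconds <= 0:
        raise ValueError("export times must be positive")
    return {
        # How much more work per second the GPU run got through.
        "throughput_gain_pct": (cpu_seconds / gpu_seconds - 1.0) * 100.0,
        # How much less wall-clock time the GPU run took.
        "time_saved_pct": (1.0 - gpu_seconds / cpu_seconds) * 100.0,
    }


# Hypothetical timings, chosen only to show how far apart the two numbers can be.
result = speedup(cpu_seconds=100.0, gpu_seconds=54.0)
for name, value in result.items():
    print(f"{name}: {value:.1f}%")
```

With these made-up numbers the same pair of runs reads as roughly an 85% throughput gain or a 46% time saving, so two people can quote very different percentages for the same test.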

So, the question then comes up, which piece of the puzzle is choking it up?

I have a 7900 series AMD GPU from Gigabyte and an Intel Xeon E3-1230 v3, with 16GB of 1333 RAM.

I'll have 32GB of 1600 RAM next week, I'll see if that makes any difference.

I would guess that it is a problem on the software side of things. Maybe the engine isn't able to handle those settings effectively. I saw similarly strange results when looking into OpenCL-enabled performance enhancements in Adobe (I don't remember the exact software). Even with OpenCL enabled, an i5 outperformed the A10-7850K. That is absurd. I am guessing that the implementation isn't completely up to par. That said, maybe you are low on RAM for anything over 1080p at 25 Mbps. What kind of RAM utilization were you getting? Maxed throughout? If so, then adding more RAM would make things interesting, as it would alleviate a potential bottleneck and would help us see if it really is the software that is lacking here. My curiosity really appreciates you taking the time to run these tests.

I'm curious as well.
If you were hitting your memory limits, then adding more memory would of course make sense.
However, I don't expect that the memory speed will make a big difference.

Still, CUDA vs. OpenCL vs. CPU rendering is a very interesting test.
In my opinion, this would be an interesting item for a video,
because there is not much direct information to be found on this subject.

I was curious, and I'm beginning to think it's the CPU and not the GPU. Consider, in a nutshell: the CPU by itself takes a frame, renders it, and spits it out. When using the GPU, the CPU takes that same frame, hands it to the GPU, receives a rendered frame back from the GPU, and then hands it off to the disk. (If I understand the whole process correctly.)

If the CPU can handle the process for a single frame in the same amount of time it takes to hand the frame off to the GPU, the advantage of the GPU rendering the frame is lost, because the CPU cannot give it frames fast enough.
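To put rough numbers on that idea, here is a toy model in Python. It assumes the hand-off and the GPU render overlap like an assembly line, so the slowest stage sets the pace. Every per-frame cost below is invented for illustration, and the function names are mine, not anything from Premiere.

```python
def pipelined_fps(stage_seconds):
    """Stages overlap, so the slowest stage limits how fast frames come out."""
    return 1.0 / max(stage_seconds)


def serial_fps(stage_seconds):
    """No overlap: every frame pays for each stage one after another."""
    return 1.0 / sum(stage_seconds)


# Hypothetical per-frame costs in seconds (assumptions, not measurements).
CPU_RENDER = 0.040    # CPU renders the frame entirely by itself
CPU_HANDOFF = 0.038   # CPU prepares the frame and moves it to/from the GPU
GPU_RENDER = 0.005    # GPU does the actual rendering work

cpu_only = serial_fps([CPU_RENDER])
gpu_path = pipelined_fps([CPU_HANDOFF, GPU_RENDER])

print(f"CPU only: {cpu_only:.1f} fps")
print(f"GPU path: {gpu_path:.1f} fps (limited by the CPU hand-off)")
```

In this made-up case the GPU is eight times quicker than the CPU at the render itself, yet the export only goes a few percent faster, because the GPU spends most of its time waiting for the next frame.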

I need more test benches.

I think you might be right.
It looks like your CPU is still the limiting factor in this whole process.
That's why I personally think that, for decent video rendering, it still makes more sense to have a decent CPU, even if you want to use a CPU + GPU together for rendering.

If it were the way that you described, then a larger frame rate should result in an advantage for GPU use since, theoretically, it would be harder for the CPU to render the frame by itself in the same time that it would take to hand it off and sort what it is given back. However, your tests showed the opposite. Maybe it would be different if you were using a "workstation" GPU. You wouldn't happen to have any of those lying around, would you?

Unfortunately not.

Remember, a 25Mbps 1080p worked far better than a 25Mbps 4K, and I saw a mild increase in CPU usage in both software rendering, and GPU accelerated.

I'll see what I can do about getting a hold of a Workstation GPU.

If I am understanding this correctly, we are saying that the CPU does much the same task in these types of applications as it does in-game. That is to say that it delegates work to the GPU. The differences being (presumably) that the work necessary for any part, GPU or CPU, to render one of these frames is higher here than in-game, and that in-game, the CPU is tasked with doing other things as well, i.e. calculating physics for whatever is going on, etc. With all of those assumptions, it should be that the GPU is more effectively utilized here than in-game because of the presumably decreased workload on the CPU, freeing it up to delegate to the GPU more effectively. Meanwhile, if the work necessary to render is actually higher here, then we should see more of an improvement going from CPU to GPU here than we see going from CPU to GPU in-game. No one will argue that the CPU can effectively do the GPU's job in-game.

All of this rambling leads me back to the original, confused state in which I found myself. That is to say that GPU acceleration here SHOULD (if I am not misunderstanding things) be a massive improvement over using the CPU for rendering videos. But it isn't. This makes me assume that the OpenCL implementation here is subpar.

EDIT: I also made the assumption here that the GPU has more horsepower for this kind of situation than the CPU does, which I think is a fairly safe assumption given the difference in raw compute between the two as well as the differences in nature between them (generic vs. specialized).

The difference between Game rendering and Video-Edit rendering, I would say, on the surface anyway, is I/O.

With a simple video render, you take input from a file on disk, pass it through the rendering engine, then write the new file to disk.

I found that the GPU rendering was nearly 45% faster in a 25Mbps/1080p output. And only 2% faster with a 25Mbps 4K render.

So the questions to be answered are:
"What is it about high resolution (4K) video that causes this to choke?"
"At what bitrate with 4K does performance improve?"
"At what bitrate with 1080p do performance improvements diminish?"
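One way to chase those questions would be to export the same clip across a grid of settings and note the time for each. Here is a small Python sketch that just builds the list of runs. The 10 Mbps step and the labels are my own assumptions (only 25 and 50 Mbps were tried so far), and the exports themselves would still be done by hand in Premiere.

```python
from itertools import product

# Settings to sweep. Only 25 and 50 Mbps come from the earlier exports;
# the extra bitrate step is a hypothetical value added for the sweep.
RESOLUTIONS = ["1080p", "4K"]
BITRATES_MBPS = [10, 25, 50]
RENDERERS = ["software (CPU)", "GPU accelerated"]


def build_test_plan():
    """Return one entry per export to run, with a slot for the measured time."""
    return [
        {"resolution": res, "bitrate_mbps": rate, "renderer": renderer,
         "seconds": None}  # fill in by hand after each export
        for res, rate, renderer in product(RESOLUTIONS, BITRATES_MBPS, RENDERERS)
    ]


plan = build_test_plan()
for run in plan:
    print(f'{run["resolution"]:>5} @ {run["bitrate_mbps"]:>2} Mbps, {run["renderer"]}')
```

Keeping CPU and GPU runs paired at every resolution and bitrate makes it easier to spot where the GPU advantage appears or disappears.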


I am also interested in knowing why. What is it about the implementation that causes the differences between GPU acceleration at various bitrates and various resolutions? Why does the resolution matter more than the bitrate? That doesn't seem to make too much sense.
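One thing that can be checked with plain arithmetic: resolution changes how many pixels have to be processed and moved around per frame, while bitrate only changes how big the encoded output is. The Python sketch below assumes "4K" means 3840x2160, 30 frames per second, and uncompressed 8-bit 4:2:0 frames at 1.5 bytes per pixel. Those are all my assumptions, since the actual source format wasn't stated, and this is not a diagnosis of what Premiere is doing.

```python
# Assumed frame sizes; "4K" is taken to mean UHD here.
RESOLUTIONS = {"1080p": (1920, 1080), "4K": (3840, 2160)}


def pixels(name: str) -> int:
    """Pixels in one frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def raw_frame_mib(name: str, bytes_per_pixel: float = 1.5) -> float:
    """Size of one uncompressed frame in MiB (1.5 B/pixel assumes 8-bit 4:2:0)."""
    return pixels(name) * bytes_per_pixel / 2**20


def bits_per_pixel(name: str, bitrate_mbps: float, fps: float = 30.0) -> float:
    """Encoded bits available per pixel at a constant bitrate."""
    return bitrate_mbps * 1_000_000 / (pixels(name) * fps)


for name in RESOLUTIONS:
    print(f"{name}: {pixels(name):,} pixels, "
          f"{raw_frame_mib(name):.2f} MiB raw per frame, "
          f"{bits_per_pixel(name, 25):.3f} bits/pixel at 25 Mbps")
```

Under these assumptions a 4K frame carries four times the pixels of a 1080p frame, so the per-frame work and the data moved between CPU and GPU go up fourfold, while halving the bitrate leaves that amount unchanged.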


If I were a cartoon character, I would have had a billion "?" over my head when I found this out last night.