Continuing the deep dive into 2990WX perf regressions

Attn programmers: Handbrake is what I would like to investigate. From using bitd to dump the process, it appears as though it sets its own affinity masks for its own threads.

Dump from Handbrake transcoding Big Buck Bunny at 1080p to the HEVC 4K Amazon preset… 185291_handbrake2.txt (809.1 KB)

Handbrake is kinda open source, but it looks like it loads modules to do stuff like x265.

It’s explicitly bad past 8 cores, at least they say that in the docs.

On Epyc it’s nearly the same performance as on a 2950X and a 2990WX.

From the affinity dump, it looks to me like something in Handbrake is explicitly setting CPU affinity.

My question is: is it? Could it be a library they build in, too? It’s creating like 600 threads. Why?
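
For anyone else reading the dump: an affinity mask is just a bitmask, where bit n being set means the thread is allowed to run on logical processor n. Here is a minimal Python sketch for decoding one. The function name `mask_to_cpus` and the example masks are made up for illustration; they are not values from the dump.

```python
def mask_to_cpus(mask):
    """Return the logical CPU indices whose bits are set in an affinity mask."""
    cpus = []
    index = 0
    while mask:
        if mask & 1:          # lowest bit set: this CPU is allowed
            cpus.append(index)
        mask >>= 1            # move on to the next CPU's bit
        index += 1
    return cpus


# Made-up example masks, not values taken from the dump.
print(mask_to_cpus(0xFF))
print(mask_to_cpus(0xF0F0))
```

If the per-thread masks in the dump decode to small, shifting subsets of CPUs rather than all 64 logical processors, that would support the idea that something is setting affinity on purpose.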

If you are a dev and have time to help disassemble I could use the help. Is this fixable?

I tested on WSL and it didn’t seem much better. I still have to test on bare-metal Linux.

The Handbrake CLI averaged 8 fps to complete the transcode.

If you enable extended logging, you’ll see heaps of threads created for various tasks.
In the Windows version I have (1.0.7), the frontend is a .NET program, but most of the heavy lifting is done in the C++ hb.dll.
It imports SetThreadAffinityMask and SetProcessAffinityMask from kernel32 but, as far as I can tell, doesn’t call them. You’d have to check the source to be sure; I was looking at it in IDA.
It calls CreateThread a bunch in the hb_thread_init function, but I don’t believe you can specify affinity there.

Are you manually setting thread affinity after starting an encoding task? That is when the bulk of the threads are created, and I assume they are copying the process’ affinity or just using the default mask? I am not in any way proficient at any of this stuff.
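
On the inheritance question: as far as I know, on Windows a new thread can run on anything in the process affinity mask unless something calls SetThreadAffinityMask on it afterwards. Below is a rough sketch of the Linux analog, where a new thread inherits a copy of its creator’s affinity mask. It assumes Python 3.8+ on Linux; `child_thread_affinity` is a hypothetical helper name, and none of this is taken from the Handbrake source.

```python
import os
import threading


def child_thread_affinity():
    """Return (creator_affinity, child_affinity), or None where unsupported."""
    if not hasattr(os, "sched_getaffinity"):
        return None  # this API is Linux-specific

    seen = {}

    def worker():
        # On Linux, a native thread ID can be passed where a PID is expected.
        seen["child"] = os.sched_getaffinity(threading.get_native_id())

    creator = os.sched_getaffinity(0)  # 0 means the calling thread
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return creator, seen["child"]


print(child_thread_affinity())
```

Running it under something like `taskset -c 0-3` should show both sets restricted to the same CPUs. If the threads in the dump have masks that differ from the process mask, something set them after creation.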

Unless there was a lurking Process Lasso process I forgot about or something, the initial dump shows really weird (and changing) affinity for those processes.

I’ll do a few more test runs.

This has more to do with how x264 and x265 massively multithread: how they utilize each thread and how they divide up the work.

FFmpeg itself can properly use a ton of threads if you use something like the DNxHD/DNxHR encoder, but x264 and x265 scale poorly the more threads you throw at them. This is what I alluded to in the other thread.

Linux x264 and x265 will be no different. This is not an OS issue, but an x264/x265 issue. It also isn’t an FFmpeg issue, because if you capture 4K 60fps from your Magewell using FFmpeg (Windows or Linux), recording straight to DNxHR, it multithreads properly.
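
To put rough numbers on why extra threads stop helping, here is a back-of-envelope Amdahl’s law sketch in Python. The 90% parallel fraction is a made-up assumption for illustration, not a measurement of x264 or x265, and `amdahl_speedup` is just a name I picked.

```python
def amdahl_speedup(parallel_fraction, threads):
    """Ideal speedup when only part of the work can be spread across threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Assumed parallel fraction of 0.90; purely illustrative.
for threads in (8, 16, 32, 64):
    print(threads, round(amdahl_speedup(0.90, threads), 2))
```

With 10% of the work serial, the speedup can never exceed 10x no matter how many threads you add. This only models the plateau; it does not account for any extra overhead that could make things slower with more threads.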