FFMpeg AV1 Encoding Using Intel Arc GPU tips?

programster · December 29, 2023, 4:14pm

Hey there,
I recently got an Intel Arc A380 graphics card for AV1 encoding and I’ve had some success getting it running on Xubuntu 22.04 (although I’m not sure which combination of steps did the trick, and I ended up having to re-install my desktop packages and Nvidia drivers in the process).

I can successfully encode a video using my arc GPU by running:

ffmpeg \
  -i input.mp4 \
  -init_hw_device vaapi=va:/dev/dri/renderD128 \
  -c:v av1_qsv \
  -preset veryslow \
  -look_ahead_depth 99 \
  -b:v 1M \
  -low_power 0 \
  output.av1.1000kbps.veryslow.mp4

Even at the veryslow preset, it whips through a video at lightning speed (and quietly) compared to a CPU-based multi-pass encode with HandBrake. However, it is very obvious that the output is nowhere near as good in quality for a given file size as a CPU-based encode, which takes waaaay longer.

Does anybody know whether it is possible to run a multipass encode using the Arc GPU and, if so, what the command would be?

In case it comes up, the GPU is in the secondary PCIe x16 slot, with the primary allocated to my Nvidia GPU that the displays run through. These are paired with an AMD Ryzen 5 7600.

sirn · December 29, 2023, 5:11pm

I don’t use multipass, so I’ll leave that to someone else, but here’s what I used to optimize AV1 encoding quality on my A380 (on Jellyfin) according to Intel’s quality optimization guide:

https://github.com/intel/media-delivery/blob/master/doc/quality.rst#av1

-look_ahead_depth 99 \

-look_ahead_depth (LAD) requires -extbrc 1, and the recommended value for quality is 40. It also requires -extra_hw_frames to be set to the same value:

-extbrc 1 \
-look_ahead_depth 40 \
-extra_hw_frames 40 \

  -b:v 1M \

For CBR, Intel also recommends setting -bufsize to at least 2 times the bitrate to maximize quality. In this case, 2M should be used. -rc_init_occupancy should also be set for the initial buffer delay. The recommended value is half of the bitrate:

-b:v 1M \
-bufsize 2M \
-rc_init_occupancy 512K \
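As a rough sketch, the two ratios described above can be derived from the target bitrate in a small shell script. The variable names are made up for illustration, and an exact half of 1M comes out as 500k rather than the rounded 512K shown above.

```shell
#!/bin/sh
# Sketch only: derive the CBR buffer options from a target bitrate in kbit/s,
# using the ratios quoted in this post (bufsize = 2x, initial occupancy = 1/2).
bitrate_k=1000                     # hypothetical target, same as -b:v 1M
bufsize_k=$((bitrate_k * 2))       # at least twice the bitrate
init_occ_k=$((bitrate_k / 2))      # half of the bitrate

# Print the option string so it can be pasted into an ffmpeg command line.
echo "-b:v ${bitrate_k}k -bufsize ${bufsize_k}k -rc_init_occupancy ${init_occ_k}k"
```

Changing bitrate_k is then the only edit needed when trying other bitrates.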

Other optimizations:

-adaptive_i 1 for adaptive scene change detection
-adaptive_b 1 for adaptive miniGOP
-b_strategy 1 -bf 7 to activate the full 3-level B-pyramid. Note that -bf should be set to a value lower than LAD. (If LAD is 40, then the valid values go up to 39.)

So final command (untested):

ffmpeg \
  -i input.mp4 \
  -init_hw_device vaapi=va:/dev/dri/renderD128 \
  -c:v av1_qsv \
  -preset veryslow \
  -extbrc 1 \
  -look_ahead_depth 40 \
  -extra_hw_frames 40 \
  -b:v 1M \
  -bufsize 2M \
  -rc_init_occupancy 512K \
  -low_power 0 \
  -adaptive_i 1 \
  -adaptive_b 1 \
  -b_strategy 1 -bf 7 \
  output.mp4

Typically I run VBR, but these settings halved the encoding speed while giving much better quality. SVT-AV1 still produces a better result at a smaller filesize, however (but it’s wayyyy too slow compared to QSV).

Forcing ExtBRC to use the new EncTools by setting -extbrc 1 -look_ahead_depth 40 (or any value above 1) seems to be what helps with quality on QSV AV1.

programster · December 29, 2023, 5:49pm

Thanks for that.

I tried running the command as you posted it, but got the following error:

So I ran it again without the -extra_hw_frames 40 line and it worked. I’ve been running a variety of bitrates, and I swear this has “fixed” the quality of the 1000kbps version so that I don’t have to run at 2000kbps to get the same level of perceptual quality, which is fantastic (a major improvement). The full command that is currently working for me is:

ffmpeg \
  -i input.mp4 \
  -init_hw_device vaapi=va:/dev/dri/renderD128 \
  -c:v av1_qsv \
  -preset veryslow \
  -extbrc 1 \
  -look_ahead_depth 40 \
  -b:v 1M \
  -bufsize 2M \
  -rc_init_occupancy 512K \
  -low_power 0 \
  -adaptive_i 1 \
  -adaptive_b 1 \
  -b_strategy 1 -bf 7 \
  output.mp4

sirn · December 29, 2023, 9:13pm

Can you try putting -extra_hw_frames before -i? I remember the order of arguments mattering quite a bit for this one.
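A sketch of what that reordering might look like, based on the command posted earlier in the thread. It is untested, and it is only an assumption that moving the option is enough to avoid the error. The script prints the command instead of running it.

```shell
#!/bin/sh
# Untested sketch: same options as the earlier command, with
# -extra_hw_frames moved ahead of -i as suggested.
set -- \
  -extra_hw_frames 40 \
  -i input.mp4 \
  -init_hw_device vaapi=va:/dev/dri/renderD128 \
  -c:v av1_qsv \
  -preset veryslow \
  -extbrc 1 \
  -look_ahead_depth 40 \
  -b:v 1M \
  -bufsize 2M \
  -rc_init_occupancy 512K \
  -low_power 0 \
  -adaptive_i 1 \
  -adaptive_b 1 \
  -b_strategy 1 -bf 7 \
  output.mp4

# Print the full command; drop "echo" to actually start the encode.
echo ffmpeg "$@"
```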

The_Riddick · January 16, 2024, 10:17pm

Was thinking of getting an Intel Arc GPU for Linux OBS encoding, but it seems it’s STILL not there yet. Only the Windows drivers function atm?!?

Setting up a Windows server for OBS streaming is depressing! But it seems that may be the only option atm for Arc until the Linux drivers catch up (could take a while).

wallacebw · March 27, 2025, 11:29pm

I’m using an Arc B580 on Linux (6.13.x branch) with the Xe driver for Jellyfin transcodes and it works fine.

Not an OBS user, but the underlying GPU encode/decode is working (I think it was 6.5+? for A series and 6.12+ for B series). Here are a couple of resources that may get you going down the right path:

https://wiki.archlinux.org/title/Intel_graphics#Testing_the_new_experimental_Xe_driver

As with all things ffmpeg, you may need to compile a version with the options you want (--enable-libfdk-aac, anyone?).

gc71 · March 28, 2025, 3:49am

I’m in the middle of putting together a Dockerfile right now with ffmpeg compiled with AV1 support for CPU, Nvidia, and Intel encoding.

I have not actually been able to test it yet because I need to finish upgrading and updating my server to be able to do AV1 at the hardware and kernel level, but the Docker image builds successfully so far. I’ll be using it myself for various AV1 tests with the Intel Arc A310 and Nvidia RTX 2000 Ada, and then compare with CPU-based transcoding as well.

Kingdud · March 28, 2025, 4:13am

A few things:

1. Hardware encoding is more for streaming / on-demand conversion than for archival quality. The tradeoff that makes GPUs so fast at encoding, apart from massively parallel hardware, is far, far less precision in the floating point units, which are vital to getting high quality encodes. This also happens with h264/h265 with NVENC. CPU encodes always look better and are smaller when using the same settings.
2. AV1, as far as I’ve read from everyone who has ever tried with any medium, is for streaming, not for archival video storage. Meaning, even if you set it to be lossless, it will look worse than h265 and produce a bigger file. The entire algorithm is rigged to be ‘good enough, quickly, with low bandwidth’ rather than ‘high quality, ever’.

wallacebw · March 28, 2025, 1:10pm

Agreed. My use case is streaming (real-time transcoding), not archival. My source library is mainly a mix of h264 and HEVC, which were transcoded mostly on CPU, or with NVENC (at a higher quality level) if I was willing to sacrifice size for time.

Honestly, with the cost of spinning rust as low as it is and an ARC-B GPU being <$300, I’m moving to higher quality / bitrate storage, and when I’m ‘off-lan’ the ARC can transcode to AV1 in real time for 4-6 concurrent sessions (4k source).

My $0.02 on quality: I have a calibrated monitor and I’ll determine ffmpeg settings by:

1. Encoding the source to a quality where I do not believe I can distinguish it from the original.
2. Creating a split-screen encode using the source for the left half and the recode for the right, using ungodly high quality settings to minimize impact to the sources.
   Ex: ffmpeg -i video1.mkv -i video2.mkv -filter_complex "[0:V:0]crop=960:1080:0:0[v1];[1:V:0]crop=960:1080:960:0[v2];[v1][v2]hstack=2[out]" -map "[out]" ...__ENCODER_SETTINGS__... output.mkv
3. Watching the split screen to see if I can spot the differences from left to right.
4. Repeating until I find the most efficient settings where I can’t distinguish the difference, testing with the usual suspects (fast motion, diagonal lines, fog/mist, overly complex scenes like confetti or similar).
5. Using the results for archival.
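The crop values in that example assume a 1920x1080 source. A minimal sketch of building the same filter string for other resolutions follows; the variable names are hypothetical and the dimensions have to be filled in by hand.

```shell
#!/bin/sh
# Sketch only: build the split-screen filter for a given frame size.
width=1920     # assumed source width
height=1080    # assumed source height
half=$((width / 2))

# Left half comes from input 0, right half from input 1, then stacked side by side.
filter="[0:V:0]crop=${half}:${height}:0:0[v1];[1:V:0]crop=${half}:${height}:${half}:0[v2];[v1][v2]hstack=2[out]"

echo "$filter"
```

The printed string can be passed to -filter_complex together with -map "[out]".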

Kingdud · March 28, 2025, 7:27pm

Yeah, I did the same thing with my h265 encodes. Except instead of split screen I would find scenes with a lot of detail (the pores on Laurence Fishburne’s face in The Matrix, foggy scenes, cloud scenes, stuff like that) and then try different encode settings until I couldn’t pixel peep a difference. Oh, right, I’d use ffmpeg to pull frames from the exact same time in the videos for both source and encode.
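A minimal sketch of that frame-pulling step, assuming hypothetical file names and timestamp. It prints one ffmpeg command per file, using -ss to seek and -frames:v 1 to write a single frame.

```shell
#!/bin/sh
# Sketch only: grab one frame at the same timestamp from source and encode.
ts="00:42:10"                      # hypothetical timestamp to compare
for f in source.mkv encode.mkv; do # hypothetical file names
  out="${f%.*}_frame.png"          # e.g. source.mkv -> source_frame.png
  # Drop "echo" to actually run the extraction.
  echo ffmpeg -ss "$ts" -i "$f" -frames:v 1 "$out"
done
```

The two PNG files can then be flipped between in any image viewer.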

Once I had my settings I made a spreadsheet with power cost vs storage cost. Power cost mattered because I did my quality tests for all presets from veryslow to faster. I realized that, even using SSDs to store my movies, veryslow took so long to encode that I was spending more on power than I was saving in storage space.

But our use cases are different: I’m going for archival. Saving the raw 25 GB movies is just a PITA. I can reliably get 7-13 GB h265 encodes that are visually indistinguishable (in pixel peeping) from the source videos. I keep the raw rips on spinning rust (set on a shelf, not powered up) just to save me the time of ripping them from disc again.
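The comparison can be sketched as below. Every number is a made-up placeholder for illustration, not a figure from my spreadsheet.

```shell
#!/bin/sh
# Sketch only: compare the power cost of a slow encode with the storage it saves.
encode_hours=20      # hypothetical extra encode time for a slower preset
watts=150            # hypothetical average draw while encoding
price_per_kwh=0.30   # hypothetical electricity price
saved_gb=2           # hypothetical space saved by the slower preset
price_per_gb=0.05    # hypothetical storage price per GB

# awk handles the floating point arithmetic that plain sh lacks.
power_cost=$(awk -v h="$encode_hours" -v w="$watts" -v p="$price_per_kwh" \
  'BEGIN { printf "%.2f", h * w / 1000 * p }')
storage_saving=$(awk -v g="$saved_gb" -v p="$price_per_gb" \
  'BEGIN { printf "%.2f", g * p }')

echo "power cost: $power_cost, storage saved: $storage_saving"
```

With these placeholder inputs the power cost exceeds the storage saving, which is the situation described for veryslow.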

gc71 · March 28, 2025, 7:40pm

Worth mentioning that “storage cost” stops being a relevant factor in the equation once you can no longer increase storage. My case has 11 of 13 HDD bays filled and the last two slots are not ideal to use, so my only real option would be larger drives, which is also not feasible since I am already using 20TB HDDs.

Kingdud · March 28, 2025, 8:03pm

If you have that many drives, just buy a JBOD chassis (60-120 drive capacity) and an HBA. Swap out the fans for something with FAR lower RPM to keep the noise down.

gc71 · March 28, 2025, 8:08pm

thanks but you are assuming i have the space and home infra to support something like that lol

Kingdud · March 28, 2025, 8:11pm

I’m…kinda not. A JBOD shelf is about the size of a full tower desktop, and doesn’t ‘need’ a rack. It just needs the front and back to be open to the air. It can sit on its side/upside down just fine. Power might be an issue though. 60 drives do take a lot of current. And the only two cables coming out of a JBOD shelf are the HBA cable and the power cable(s). Your PC/desktop/nas (whatever the HBA was in) would handle the networking and everything else. I get it though, not an option. You do you.

The_Riddick · March 29, 2025, 1:19am

Oh yeah, I got an Arc A380 like a year ago and did a BUNCH of 4k+1080p streaming (AV1+264) to test it out in multi-stream.
Worked like a boss, and according to the GPU encode usage under Linux I could have run at least another 1x 1080p stream; the 4k encodes are still up on YouTube, I think, as Gerarderloper for cyberpunk/anomaly/dragonsdogma2, and I think they turned out good.
Twitch sadly deletes all VODs after 60 days or something.

I did have a couple early stream issues which I sorted out.
I think it’s a COMPLETE waste of money buying an NVIDIA card for OBS server encoding only these days, iyam. But if you’re very well off then sure, no doubt NV encode still does a bit better.

NightMoose · August 6, 2025, 5:13pm

Sorry to necro an old thread.

Does anyone have information on whether the B580 has better encoding capabilities than the A750? I’m debating swapping the one I have, or just saving more money for an AM5 setup, since those seem to handle AVX-512 better, and using software encoding instead.