I mean widely used in post-production. FFV1 is only widely used in archival, and NOBODY uses FFV1 for capture, even in open-source cinema cameras (Apertus, for example).
At this point I think it's better for the FPGA to have direct access to a piece of storage like a PCIe x16 Intel Optane. The data from the Optane would then be encoded out, with the Optane acting as a buffer.
FPGA + ARM ASIC for encoding is the route I'd go down, but to get more storage, you kinda have to go with Optane. It gives more buffer in case the ASIC has to slow down for complex scenes.
Excuse my ignorance here. I am only slowly grasping what you two are discussing here. Lots more reading to do.
The big PCIe FPGAs (U2x0 line) from Xilinx can handle 64GB onboard RAM (at 70GB/s bandwidth). Would that be a sufficient buffer, or is that cutting it too close? (Rough math at the end of this post.)
Edit:
This is what you intend to make happen, right?
Amazon EC2 F1 instance (which offers a Xilinx UltraScale+ VU9P FPGA)
That's the second FPGA, the one that reads from a large cache like an Intel Optane.
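To get a rough sense of scale on the 64GB question, here's some back-of-envelope Python. The 10-bit 4:2:2 frame format and the ~10:1 JPEG XS ratio are assumptions for illustration, not measured figures:

```python
# How many seconds of 4K60 video fit in a 64 GB FPGA buffer?
# All rates are illustrative assumptions, not measured figures.

BUFFER_BYTES = 64e9  # 64 GB, decimal

# Uncompressed 4K60, 10-bit 4:2:2 = 20 bits per pixel (assumed format)
uncompressed_bps = 3840 * 2160 * 60 * 20      # ~10 Gbit/s
# JPEG XS at an assumed ~10:1 visually lossless ratio
jpegxs_bps = uncompressed_bps / 10            # ~1 Gbit/s

for name, bps in [("uncompressed 4K60", uncompressed_bps),
                  ("JPEG XS 4K60 (~10:1)", jpegxs_bps)]:
    seconds = BUFFER_BYTES / (bps / 8)
    print(f"{name}: ~{seconds:,.0f} s of buffer in 64 GB")

# uncompressed 4K60: ~51 s; JPEG XS 4K60: ~514 s (about 8.5 minutes)
```

So 64 GB holds very roughly a minute of raw 4K60, or several minutes of JPEG XS; whether that's enough depends on how far ahead the rate control wants to look.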
We're seeing if we can compress to JPEG XS first, using a GPUDirect or DirectDMA connection that captures NVFBC (or similar) of the desktop. That uncompressed feed goes to a JPEG XS FPGA, which compresses to JPEG XS as a cache on an Intel Optane. Then another FPGA reads from the Optane and does the final compression to AV1, where it can tweak its speed to always keep pace with the cache, with a very big lookahead buffer to optimize bitrate using more available video to sample from.
It can allow multiple passes over the cache if the cache is big enough and the encoder is fast enough. JPEG XS can fit in the bandwidth of an Optane, but it's the out-of-sequence reads that are the important part for multiple passes.
The end result would be a bitrate low enough that a spinning HDD would be fast enough to write to, if it's AV1. If it were the professional profiles, a typical NVMe drive like the SK Hynix Gold P31 would be fast enough.
This is less about real-time streaming than about recording with multiple passes using a cache.
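For anyone trying to picture the shape of that pipeline, here's a toy Python sketch. Every name and number in it is made up, and real FPGA plumbing looks nothing like this; it only shows how the cache in the middle decouples a fixed-rate capture from a variable-rate encoder:

```python
import queue
import threading

# Toy model of capture -> JPEG XS -> Optane cache -> AV1 encode.
# The queue stands in for the Optane cache: the capture side never has
# to block on encoder speed as long as the cache has headroom.

cache = queue.Queue(maxsize=10_000)  # hypothetical cache capacity in frames

def capture_and_jpegxs(total_frames):
    """Fixed-rate front end: grab a frame, lightly compress, push to cache."""
    for n in range(total_frames):
        frame = f"frame-{n}"          # stands in for an NVFBC desktop grab
        cache.put(("jpegxs", frame))  # lightweight intra-only compression
    cache.put(None)                   # end-of-stream marker

def av1_encode():
    """Variable-rate back end: drains the cache as fast as it can.
    On complex scenes it slows down and the cache simply fills up."""
    while (item := cache.get()) is not None:
        pass                          # the expensive AV1 encode happens here

producer = threading.Thread(target=capture_and_jpegxs, args=(600,))
consumer = threading.Thread(target=av1_encode)
producer.start(); consumer.start()
producer.join(); consumer.join()
```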
You don't need to make it sound like it's uber-elite talk, although it kinda is
Depends on what codec you use and on what kind of hardware. I don't know every detail about every codec, and there is nothing fundamentally new about using a large framebuffer for encoding. The newish part would be doing it on a RAM or memory disc, thus having the frame buffer in a zone on the disc, if not even using the entire disc as a buffer as well. Which is probably something similar to what jpeg-turbo does.
It's what my friend David's high-speed camera does. Bayer data is stored in RAM, then the data is played out at 60fps to the ARM H.264 encoder.
Last I heard, he hasn't yet figured out faster-than-realtime playback from RAM to the encoder.
That's the point of jpeg-turbo. The loss of frames is way too high, but even then it can provide a steady 30fps. There are probably codecs like it, although there should be many more, because the technical side of using memory or a hard drive allows for so much more than simply pushing frames through CPU cores, which is too linear a way to do such a task. It's actually borderline linear, to be a bit more honest.
If anyone looks at the spec sheets of what RAM and hard-disc tech can provide, and then looks at CPU cores afterwards: using CPU cores for encoding would be the most stupid of the options, if the codec being used is x264 or similar codecs…
Yes, that is similar to codecs such as jpeg-turbo: encoding in RAM first, then using the CPU on the stored/compressed frames. Something like an ARM CPU would be an amazing choice for that. Although, again, the issue with storing in RAM is the number of frames that are lost when compressing with the jpeg-turbo codec.
So jpeg-turbo, yes, is basically taking JPEGs in rapid order. Like super-fast screenshots that are then compressed before being processed by a CPU-based codec like x264 (as an example; others are also available).
Not sure where the loss of frames occurs; perhaps when compressing frames? It happens on another level of code, where a CPU isn't used to encode the frames, which could alone be the reason for the massive loss of frames. Very likely, as using the CPU is a very basic part of how codecs are normally used.
It has to be admitted, though, that codecs that don't use the CPU to compress frames work on another level of code. It's pretty amazing stuff, and a much, much smarter way to do what a CPU-only codec would normally do.
Although it sounds more likely that the losses occur when the frames are being processed after the compression. Perhaps even both, if not before processing (at the encoding stage). Not 100% sure about that; it just sounds more likely that the losses come from the encoding part of the compression, as there isn't a CPU to save the frames being compressed; instead, the CPU works on already-compressed frames that it can then process.
So you're trying to find which framebuffer is dropping frames. This is the same issue DXGI and NVFBC face when passing through FFmpeg or OBS: they pretty much all have to be in Vsync to avoid dropping frames.
In fact, internal triple-buffered Vsync, then capturing externally with perfect Vsync, is the best way to capture: you're allowing triple buffering to happen, then capturing the buffered output to a compressed format. It's why Dual System is so popular.
When grabbing an internal framebuffer, there is the likelihood it's before any buffering has occurred. NVFBC is likely the closest to capturing a Vsynced single buffer, but even it has a sync offset if your monitor and capture framerates don't precisely match. This is why people love dual system.
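The sync-offset problem is easy to put numbers on. Assuming, just as an example, a 59.94 Hz display against a free-running 60.00 Hz capture clock:

```python
# How often do a 59.94 Hz display and a 60.00 Hz capture clock slip?
# With unlocked clocks, the capture slowly walks across the display's
# frame period and must periodically duplicate or drop a frame.

display_hz = 60000 / 1001  # ~59.94 Hz (NTSC-style rate)
capture_hz = 60.0

drift = abs(capture_hz - display_hz)  # frames per second of drift
print(f"drift: {drift:.4f} frames/s")
print(f"one duplicated or dropped frame every {1 / drift:.1f} s")
# -> roughly one slip every ~16.7 seconds unless the clocks are genlocked
```

Genlocking (or otherwise matching the capture clock to the display) is the only real fix, which is exactly what a dual-system setup buys you.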
I think there is confusion in how you are wildly mixing terminology between codecs (encoder/decoders built in software, FPGA, or ASIC hardware) and the compression standards that those codecs implement.
H.265 == HEVC == MPEG-H Part 2
The compression standard that is being used for 4K+ video, as well as for lower resolution video since it is an improvement on the previous standard compression format, H.264/AVC/MPEG-4 Part 10.
x265
An open-source software encoder implementing the H.265 compression standard/format. As far as I can tell, x265 runs only on the CPU, but in theory, one might be able to use OpenCL/CUDA/Vulkan to offload some software calculations to the GPU.
NVENC (Nvidia Encoder)
How Nvidia refers to the portion of its GPUs that contains one or more encoding ASICs (the decoding ASICs are separately grouped as NVDEC); from Nvidia's comparison table, it looks like the 5th gen of NVENC was the first to include an H.265/HEVC encoder, though it required 4:2:0 chroma subsampling.
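To make the standard-vs-codec distinction concrete: the same H.265/HEVC bitstream format can come out of the x265 software encoder or the NVENC ASIC. A minimal sketch using FFmpeg's standard encoder names, assuming a build with both libx265 and NVENC support:

```python
import subprocess

# One compression standard (H.265/HEVC), two different codec implementations.
# Assumes an FFmpeg build with libx265 and NVENC enabled.

src = "input.mp4"  # placeholder input

# x265: software encoder, runs on the CPU
subprocess.run(["ffmpeg", "-y", "-i", src,
                "-c:v", "libx265", "-crf", "23",
                "x265_out.mp4"], check=True)

# NVENC: fixed-function encoder ASIC on an Nvidia GPU
subprocess.run(["ffmpeg", "-y", "-i", src,
                "-c:v", "hevc_nvenc", "-preset", "slow",
                "nvenc_out.mp4"], check=True)

# Both outputs are valid HEVC; any conformant decoder can play either.
```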
Additionally, when you talk about codecs using memory in a special way, it almost sounds as though you are referring to Processing In Memory (PIM), where the RAM DIMMs themselves do some pre-processing (alternatively called C-RAM for Computational RAM); however, PIM is not currently available in any meaningful way, certainly not for end users.
Maybe you are merely referring to using more memory to cache frames in some way before the compression is run on them, but that does not seem remarkable at all; I think on most CPUs in normal operation, one or two 4K frames will not fit entirely in L3, so you will be falling back to RAM whether you want to or not.
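The arithmetic on that is straightforward (the frame formats and the 32 MiB L3 below are assumptions for illustration):

```python
# Will a single 4K frame fit in a CPU's L3 cache?
W, H = 3840, 2160
MiB = 1024 * 1024
L3 = 32 * MiB  # a typical desktop CPU's L3, assumed for illustration

frames = {
    "YUV 4:2:0, 8-bit": W * H * 1.5,    # 12 bits/pixel
    "RGB24":            W * H * 3,      # 24 bits/pixel
    "RGBA float32":     W * H * 4 * 4,  # e.g. an intermediate processing buffer
}

for name, size in frames.items():
    print(f"{name}: {size / MiB:.1f} MiB, fits in 32 MiB L3: {size < L3}")
# YUV 4:2:0: ~11.9 MiB (one fits, two leave almost no room for anything else);
# RGB24: ~23.7 MiB; RGBA float32: ~126.6 MiB
```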
While I am mentioning somewhat confusing things, @FurryJackman seems to be talking about some kind of double-compression approach where the input device does a round of JPEG-XS encoding before handing the stream off to the HEVC codec.
Lossless input → JPEG-XS → [PCIe] → HEVC
I guess the intent is to reduce PCIe bandwidth requirements, but I do not see how this would be especially beneficial, since the JPEG-XS data would first need to be decompressed back to raw frames, then re-compressed to HEVC. Is the additional latency and quality loss from this approach really outweighed by the reduced PCIe bandwidth? I would suspect that this requires more memory as well, since you need a JPEG-XS decoder in front of the HEVC encoder, no?
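Putting rough numbers on the bandwidth argument (the link rates and the ~10:1 JPEG-XS ratio are assumptions):

```python
# Does 4K60 even need JPEG-XS to cross PCIe? Rough, assumed numbers.

uncompressed_4k60 = 3840 * 2160 * 60 * 20 / 8 / 1e9  # GB/s, 10-bit 4:2:2
jpegxs_4k60 = uncompressed_4k60 / 10                 # assumed ~10:1 ratio

pcie3_x4 = 3.94    # GB/s, approximate usable PCIe 3.0 x4 bandwidth
pcie3_x16 = 15.75  # GB/s, approximate usable PCIe 3.0 x16 bandwidth

print(f"uncompressed 4K60: {uncompressed_4k60:.2f} GB/s")
print(f"JPEG-XS 4K60:      {jpegxs_4k60:.2f} GB/s")
print(f"x4 headroom, even uncompressed: {pcie3_x4 / uncompressed_4k60:.1f}x")
# Uncompressed 4K60 (~1.24 GB/s) fits in a x4 link several times over, so
# the JPEG-XS hop only pays off at much higher resolutions/framerates or
# with many streams sharing the link.
```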
Well, using a larger memory pool would possibly allow lessening the loss of frames in a cheap way, as the number of frames already lost from compression is very, very high compared to a CPU doing the compression.
Personally, I don't get the part where it's "not remarkable" that it's even possible to compress frames without using a CPU for the compression part of it, let alone getting ±30 fps from it.
Even with a massive loss of frames during, after, or before the compression, the amount of memory it needs versus sheer CPU power should be enough to impress anyone. They are literally worlds apart in results, especially at super-high resolutions, where jpeg-turbo memory compression (even with massive losses) outsmarts every CPU there is at the same task.
It all happens in a secondary GPU, which takes care of the decode and can take its time doing rate control with a massive read-ahead buffer. JPEG XS barely requires any resources since it's a lightweight compression; most of the resources would be dedicated to the multiple-pass encoding to a lossy format.
It doesn't have to be HEVC; it could be any number of other codecs that CUDA could accelerate, decoding and encoding at the same time using CUDA cores. (Not NVENC; that's an ASIC portion of a GPU.)
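If the "multiple passes over the cache" idea were prototyped in software today, it would look a lot like FFmpeg's classic two-pass rate control. A sketch, assuming a build with libaom-av1; the FPGA/Optane version would replay the cache instead of re-reading a file:

```python
import subprocess

# Two-pass AV1 as a software stand-in for "multiple passes over the cache":
# pass 1 analyzes the whole recording, pass 2 spends bits using that analysis.
# Assumes an FFmpeg build with libaom-av1 and a Unix-style /dev/null.

src = "cached_recording.mkv"  # placeholder for the cached capture

common = ["-c:v", "libaom-av1", "-b:v", "8M", "-cpu-used", "4"]

# Pass 1: analysis only, output discarded
subprocess.run(["ffmpeg", "-y", "-i", src, *common,
                "-pass", "1", "-an", "-f", "null", "/dev/null"], check=True)

# Pass 2: the actual encode, informed by the pass-1 stats file
subprocess.run(["ffmpeg", "-y", "-i", src, *common,
                "-pass", "2", "out_av1.mkv"], check=True)
```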