A little teaser of what is to come :)

Why capture it again? The client will be open source, so you could literally just take the frames from there as they come in.

2 Likes

obs plugin for that should be easy to write
never looked at their plugin system
but how hard can it be…

By the way, I'm offering myself for beta testing: I have a 1950X and two Vega 64s,
and run Manjaro / Windows in dual boot or virtualized.

1 Like

Yes, if you want to stream straight into OBS it would be possible without too much work. Their code base is very modular, so it should be possible to write a capture plugin that implements the same interface as the client program I have written here.

Great, the offer is appreciated, but at this point we have to keep the number of testers low. There is code in the tree that could cause legal issues if it were to be released (blame NVIDIA). This is the “unholy” stuff Wendell mentioned.

3 Likes

This would be too confusing: DWARF is a commonly known and widely used Linux format for debugging information.

I like the name “Reflection” at this point :slight_smile:

1 Like

Very, very interested in this project. I’ll look at the gofundme tomorrow to see if I can help monetarily. Thank you for your work on this. If you need more testers I might be able to help out.

I’ve been running my passthrough setup for a year and a half or so now. This looks revolutionary. I was already planning a Threadripper system in the coming months, but this project makes me want to get it done faster. Darn money.

1 Like

You’re welcome, every little bit helps! A huge thanks to everyone who has donated so far. I am speechless at the generosity of this community; it really is appreciated.

2 Likes

Get hyped! Awesome work, can’t wait to see where it goes.

Out of curiosity: I’ve been playing with some Mellanox fiber cards, and since this is DMA based, would it be possible to adapt the same methodology to RDMA for passing the frame buffer over a network? That said, keep up the amazing work, this brings a whole new ballgame to VFIO!

1 Like

I have no experience with that technology. This isn’t specifically DMA based, Wendell is stating that these transfers are nearly cost free because the CPU is using DMA to perform the transfers.

There is actually a single CPU-based memory copy in the pipeline at the moment, so it’s not 100% offloaded to DMA at present.

was also wondering about this :slight_smile:

No… there are already LAN-based solutions such as “Steam In-Home Streaming”, which are specifically designed to balance the latency penalty against the available network bandwidth by using compression.

This project is targeting same host only at this point in time. It would be possible to implement something similar to what Steam does, but it is not a goal of this project.

Fastest general Ethernet a home user has access to, say 10Gbps = 1.25GB/s
Local SSE3 memory copy (on my Ryzen system) = 108Gbps = 13.5GB/s
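
For anyone who wants to check the units, here is a quick sketch of the conversion (decimal units, ignoring Ethernet and protocol overhead; the function name is just made up for illustration). Dividing a bit rate by eight gives bytes per second, so 10 Gbps comes to 1250 MB/s and 108 Gbps to 13500 MB/s.

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    bits_per_second = gbps * 1_000_000_000
    bytes_per_second = bits_per_second / 8
    return bytes_per_second / 1_000_000


# Raw line rate of 10 Gbps Ethernet, before any protocol overhead.
ethernet = gbps_to_mb_per_s(10)
# The local memory copy figure quoted above.
local_copy = gbps_to_mb_per_s(108)

print(f"10 Gbps  = {ethernet:.0f} MB/s")
print(f"108 Gbps = {local_copy:.0f} MB/s")
print(f"local copy is {local_copy / ethernet:.1f}x the network link")
```

Real-world network throughput will land below the line rate, so the gap in practice is somewhat wider than this suggests.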

4 Likes

Ah OK, that makes sense. Do you plan on moving to using DMA directly, or mainly leaving it up to the CPU to manage? Also, RDMA is essentially just DMA over the wire, skipping any encoding/compression or standard TCP overheads, though it’s generally only supported over fiber lines (RoCE is a whole other can of worms).

It depends on hardware :)… It may be possible to hand a buffer to the guest to DMA directly into; we have yet to experiment with this. The priority at the moment is making it work in its current incarnation. It is certainly on the list of things I want to do, though.

Can it achieve roughly 530MiB/s (about 553MB/s)? That is the minimum required for uncompressed 32-bit RGBA 1920x1200 at 60Hz. If so, then it would be a very interesting thing to play with :slight_smile:

And before anyone says, why RGBA and not RGB… it’s faster even though there is more data, the video card’s native internal format is RGBA.
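
A rough sketch of where that bandwidth figure comes from, assuming tightly packed pixels with no row padding (the helper names are made up for illustration). At 4 bytes per pixel, 1920x1200 at 60Hz works out to 552,960,000 bytes per second, which is about 527 MiB/s.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame, assuming no padding between rows."""
    return width * height * bytes_per_pixel


def stream_bytes_per_second(width: int, height: int,
                            bytes_per_pixel: int, refresh_hz: int) -> int:
    """Sustained throughput needed to move every frame uncompressed."""
    return frame_bytes(width, height, bytes_per_pixel) * refresh_hz


rgba = stream_bytes_per_second(1920, 1200, 4, 60)  # 32-bit RGBA
rgb = stream_bytes_per_second(1920, 1200, 3, 60)   # 24-bit RGB, for comparison

print(f"RGBA: {rgba / 1e6:.1f} MB/s ({rgba / 2**20:.1f} MiB/s)")
print(f"RGB:  {rgb / 1e6:.1f} MB/s ({rgb / 2**20:.1f} MiB/s)")
```

RGB needs a quarter less data on paper, but as noted above that does not make it the faster option when the card works in RGBA internally.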

1 Like

Actually, quite easily. I currently use a Mellanox ConnectX-2, and I can stream data to and from my NVMe-based NAS at around 900MB/s to 1GB/s. In ideal circumstances it should be able to reach 10Gb/s; sadly, mine is sitting in a PCI-E 8x slot that can only hit 6x because I ran out of available lanes :frowning: .

1 Like

You’re making me lose focus :laughing: this sounds very cool! Definitely something to look into in the future.

1 Like

It’s only about 20-30MB of data per frame at 4K, and modern CPUs are very effective at copying; it shouldn’t take more than 10/20/30 µs with just regular memcpy
(SSE might actually make things slower given how Intel does loop stream detection and I’m guessing so does AMD).

A good pattern to follow to minimize this cost later down the road is to treat buffers as immutable once populated. That costs some RAM, but it minimizes locking and therefore the cross-core chatter that has to happen every time you lock/unlock something. Think buffer pool.
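
To make the buffer pool idea concrete, here is a minimal sketch in Python. It is only an illustration of the pattern, not how the project implements it: `FramePool` and its methods are hypothetical names, and the immutability after publishing is enforced for readers only (through a read-only `memoryview`), so the writer still has to agree not to touch the buffer again until it is released.

```python
import queue


class FramePool:
    """Fixed set of reusable frame buffers (hypothetical helper)."""

    def __init__(self, count: int, frame_size: int) -> None:
        # queue.Queue is thread-safe, so it is the only synchronization point.
        self._free = queue.Queue()
        for _ in range(count):
            self._free.put(bytearray(frame_size))

    def acquire(self) -> bytearray:
        """Take a free buffer to fill; blocks while all buffers are in use."""
        return self._free.get()

    @staticmethod
    def publish(buf: bytearray) -> memoryview:
        """Hand readers a read-only view; nobody modifies the frame after this."""
        return memoryview(buf).toreadonly()

    def release(self, view: memoryview) -> None:
        """Return a published frame's buffer to the pool for reuse."""
        buf = view.obj   # the underlying bytearray; must be read before release()
        view.release()
        self._free.put(buf)


pool = FramePool(count=3, frame_size=16)

buf = pool.acquire()
buf[:4] = b"\x10\x20\x30\xff"      # writer populates the frame once
frame = FramePool.publish(buf)     # from here on the frame is treated as immutable

pixel = bytes(frame[:4])           # readers need no lock to read
pool.release(frame)                # buffer goes back into the pool
```

The trade-off is the one described above: the pool holds several full frames in memory at once, and in exchange readers never take a lock on frame data.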

Haha :rofl:
Definitely something for the future. Mainly occurred to me as it would be very compelling in serving thin clients in the enterprise space (make VNC look like a relic…oh wait) as well as running circles around any “in home streaming” service.

2 Likes

Not really. I have performed extensive benchmarking and rolled an SSE3 memcpy that performs prefetching. On average it is 4-8% faster than the standard system memcpy.

We already do this :slight_smile:

1 Like

There’s been plenty of research into VNC-like streaming over the years; I’d be happy to see the VM framegrabber thing working first.

For sure, though I feel it’s a bit stalled out at the moment, with the current glut of H.264-based streamers being thrown out into the wild… As such, I will say the idea of using direct memory maps was a total stroke of brilliance.