Why capture it again? The client will be open source, you could literally just take the frames from there as they come in.
obs plugin for that should be easy to write
never looked at their plugin system
but how hard can it be...
by the way, i offer myself for betatesting, i have a 1950x and 2 vega 64
and run manjaro / windows in dualboot or virtualized
Yes, if you want to stream straight into OBS it would be possible without too much work. Their code base is very modular, it should be possible to write a capture interface that implements the same interface the client program I have written here does.
Great, the offer is appreciated, but at this point we have to keep the number of testers low: there is code in the tree that could cause legal issues if it were to be released (Blame NVidia). This is the "unholy" stuff Wendell mentioned.
This would be too confusing, DWARF is a commonly known and used linux binary format for debugging information.
I like the name "Reflection" at this point
Very, very interested in this project. I'll look at the gofundme tomorrow to see if I can help monetarily. Thank you for your work on this. If you need more testers I might be able to help out.
I've been running my passthrough setup for a year and a half or so now. This looks revolutionary. I was already planning a Threadripper system in the coming months, but this project makes me want to get it done faster. Darn money.
You're welcome, every little bit helps! A huge thanks to everyone so far that has donated, I am speechless at the generosity of those in this community, it really is appreciated.
Get hyped! Awesome work, can't wait to see where it goes.
Out of curiosity, I've been playing with some Mellanox fiber cards. Since this is DMA based, would it be possible to adapt the same methodology to RDMA for passing the frame buffer over a network? That said, keep up the amazing work, this brings a whole new ballgame to VFIO!
I have no experience with that technology. This isn't specifically DMA based; Wendell is stating that these transfers are nearly cost free because the CPU is using DMA to perform the transfers.
There is actually a single CPU based memory copy in the pipeline at the moment, so it's not 100% offloaded to DMA at present.
was also wondering about this
No... there are already LAN based solutions such as "Steam In-Home Streaming", which are specifically designed to attempt to balance the latency penalty incurred against the available network bandwidth by use of compression.
This project is targeting same host only at this point in time. It would be possible to implement something similar to what Steam does, but it is not a goal of this project.
Fastest general ethernet a home user has access to, say 10Gbps = 1.25GB/s
Local SSE3 memory copy (on my Ryzen system) = 108Gbps = 13.5GB/s
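For anyone checking the units, going from gigabits per second to gigabytes per second is just a divide by eight. A minimal sketch in C, assuming decimal units and ignoring protocol overhead (the function names are made up for illustration, they are not from the project):

```c
#include <assert.h>
#include <stdio.h>

/* Convert a raw rate in gigabits per second to gigabytes per second.
 * Assumes decimal units and 8 bits per byte, with no protocol overhead. */
double gbps_to_gbytes_per_s(double gbps) {
    assert(gbps >= 0.0);
    return gbps / 8.0;
}

/* Print the two rates discussed above side by side. */
void print_rates(void) {
    printf("10 Gbps ethernet    : %.2f GB/s\n", gbps_to_gbytes_per_s(10.0));
    printf("108 Gbps memory copy: %.2f GB/s\n", gbps_to_gbytes_per_s(108.0));
}
```

Calling `print_rates()` shows the gap between a 10Gbps link and the local copy rate quoted above.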
Ah ok that makes sense. Do you plan on moving to using DMA directly, or mainly leaving it up to the CPU to manage? Also, RDMA is essentially just DMA over the wire, skipping any encoding/compression or standard TCP overheads, though it's generally only supported over fiber lines (RoCE is a whole other can of worms).
It depends on hardware :)... It may be possible to hand a buffer to the guest to DMA directly into; we are yet to experiment with this. The priority at the moment is making it work in its current incarnation. It is certainly on the list of things I want to do though.
Can it achieve 530MB/s? That is the minimum required for uncompressed 32bit RGBA 1920x1200 at 60Hz. If so then it would be a very interesting thing to play with
And before anyone says, why RGBA and not RGB... it's faster even though there is more data, the video card's native internal format is RGBA.
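The bandwidth figure is easy to reproduce: width x height x bytes per pixel x refresh rate. A small sketch in C (the function name is hypothetical, purely for illustration). For 1920x1200 RGBA at 60Hz it comes to 552,960,000 bytes per second, which is roughly 527 MiB/s, so the 530MB/s quoted above is a rounded figure in binary megabytes.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed to move uncompressed frames.
 * 64-bit arithmetic keeps 4K and higher resolutions from overflowing. */
uint64_t frame_bytes_per_second(uint32_t width, uint32_t height,
                                uint32_t bytes_per_pixel, uint32_t hz) {
    assert(width > 0 && height > 0 && bytes_per_pixel > 0 && hz > 0);
    return (uint64_t)width * height * bytes_per_pixel * hz;
}
```

For example, `frame_bytes_per_second(1920, 1200, 4, 60)` returns 552960000.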
Actually quite easily. I currently use the Mellanox ConnectX-2 and I can stream data to and from my NVMe based NAS at around 900MB/s to 1GB/s. It should in ideal circumstances be able to reach 10Gb/s; sadly mine is sitting in a PCI-E 8x slot that can only hit 6x due to me running out of available lanes.
You're making me lose focus, this sounds very cool! Definitely something to look into in the future.
It's only about 20-30MB of data per frame at 4k, and modern CPUs are very effective at copying, shouldn't take more than 10/20/30 µs with just regular memcpy
(SSE might actually make things slower given how Intel does loop stream detection and I'm guessing so does AMD).
A good pattern to follow to minimize this cost later down the road is to treat buffers as immutable once populated, at the cost of some RAM, in order to minimize locking and thus the cross core chatter that needs to happen every time you lock/unlock something. Think buffer pool.
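To make the buffer pool idea concrete, here is a rough sketch in C11. This is not the project's code: every name in it (`frame_pool`, `pool_begin_write`, `pool_publish` and so on) is invented for illustration, and it assumes a fixed pool of four buffers. Each buffer carries one atomic state word. A writer claims a free buffer, fills it, and publishes it; after that the contents are never modified until every reader is done and the buffer goes back to the free state. No mutex is taken anywhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

enum { POOL_SIZE = 4 };  /* assumed pool depth, pick to taste */
enum { BUF_FREE, BUF_WRITING, BUF_READY, BUF_READING };

struct frame_buf {
    _Atomic int state;    /* one of the BUF_* values above */
    size_t len;           /* size of data in bytes */
    unsigned char *data;  /* frame contents, immutable once BUF_READY */
};

struct frame_pool {
    struct frame_buf bufs[POOL_SIZE];
};

/* Allocate every buffer up front; returns 0 on success, -1 on failure. */
int pool_init(struct frame_pool *p, size_t frame_size) {
    assert(frame_size > 0);
    for (int i = 0; i < POOL_SIZE; i++) {
        p->bufs[i].data = malloc(frame_size);
        if (p->bufs[i].data == NULL) {
            while (i-- > 0)
                free(p->bufs[i].data);
            return -1;
        }
        p->bufs[i].len = frame_size;
        atomic_init(&p->bufs[i].state, BUF_FREE);
    }
    return 0;
}

void pool_destroy(struct frame_pool *p) {
    for (int i = 0; i < POOL_SIZE; i++)
        free(p->bufs[i].data);
}

/* Atomically move the first buffer found in state `from` to state `to`.
 * Returns NULL when no buffer is in state `from`. */
static struct frame_buf *pool_claim(struct frame_pool *p, int from, int to) {
    for (int i = 0; i < POOL_SIZE; i++) {
        int expected = from;
        if (atomic_compare_exchange_strong(&p->bufs[i].state, &expected, to))
            return &p->bufs[i];
    }
    return NULL;
}

/* Writer side: grab a free buffer, fill it, then publish it. */
struct frame_buf *pool_begin_write(struct frame_pool *p) {
    return pool_claim(p, BUF_FREE, BUF_WRITING);
}

void pool_publish(struct frame_buf *b) {
    atomic_store(&b->state, BUF_READY);
}

/* Reader side: take a published buffer, read it, then hand it back. */
struct frame_buf *pool_begin_read(struct frame_pool *p) {
    return pool_claim(p, BUF_READY, BUF_READING);
}

void pool_release(struct frame_buf *b) {
    atomic_store(&b->state, BUF_FREE);
}
```

Note this sketch hands out published buffers in slot order, not frame order, and supports one reader per buffer. A real implementation would want a sequence number or a queue of indices, and a reference count if several consumers need the same frame.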
Haha
Definitely something for the future. Mainly occurred to me as it would be very compelling in serving thin clients in the enterprise space (make VNC look like a relic...oh wait) as well as running circles around any "in home streaming" service.
Not really, I have performed extensive benchmarking and rolled an SSE3 memcpy that performs prefetching. On average it is 4-8% faster than the standard system memcpy.
We already do this
There's been plenty of research into VNC like streaming over the years, I'd be happy to see the VM framegrabber thing working first.
For sure, though I feel it's a bit stalled out at the moment with the current glut of h264 based streamers being thrown out into the wild... As such I will say the idea of using direct memory maps was a total stroke of brilliance.