Livestream: Headless PCIe Passthrough? Coming Very Soon | Level One Techs

This is just a teaser, but I wanted to introduce Geoff and what is being worked on.

Companion Post: https://forum.level1techs.com/t/a-little-teaser-of-what-is-to-come/12164...


This is a companion discussion topic for the original entry at https://level1techs.com/video/livestream-headless-pcie-passthrough-coming-very-soon
3 Likes

I wish I had the hardware to try this, but maybe I could purchase a secondary low-cost GPU for it?

This is pretty exciting stuff. I have generally been using Linux since 2007 (kernel 2.40ish?); it was my primary operating system for 5-6 years straight. But in recent years I had to switch back to Windows for various reasons. I would jump back to Linux permanently if I could run Windows (even if it is just Windows 7) in a VM with PCIe passthrough.

As for names…

Level1sPassthrough
EZPC-y (like Easy Peasy?.. Oh wait, there is a Linux Distro called Easy Peasy)

I have nothing!

Hey Wendell, I would love for you to test this against just using Parsec on the guest machine running headless. I have tested our latency and it adds only about 7 ms on a local network for our entire pipeline, which, judging by the video, is less than what is shown in your side-by-side comparison. Please note our client package is only available for Ubuntu (I’m trying to change that so we can get Arch and Fedora builds).

Here is a video where I do the testing: https://www.youtube.com/watch?v=dvGdNHx7alI (sorry, it’s quite obnoxious).

I really love your work, please let me know if you need any more information.

1 Like

@wendell For the time being, are you already letting him remote into the TR system? Or is that just not good enough?

I don’t suppose you could contact AMD to get him a loaner system to fix the issue? It would help AMD a ton.

Looks really interesting! I watched the entire video. How’s it work? Fedora builds would be much appreciated.

The CrossOver monitor is slower than the Dell monitor by about 2 frames, and we’re experimenting with double buffering as a way of dealing with the desync that occurs between the “vsync” of the two different displays, which doesn’t help things. Setting aside the visual aspect, the time delay from copying a frame from the guest’s GPU to the host’s CPU, just the memory copy operation, is less than 1 ms. It’s literally just the time C takes for about one memcpySSE() operation on a buffer that’s about 10 megabytes. Everything else is done over DMA anyway, so it technically doesn’t cost that much in terms of CPU cycles.
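For the curious, here is a rough sketch of what that kind of SSE copy boils down to (not the project’s actual memcpySSE(); the function name and details here are illustrative only):

```c
#include <stddef.h>
#include <emmintrin.h> /* SSE2 intrinsics */

/* Illustrative only -- streams 16-byte chunks with non-temporal stores
 * so a ~10 MB frame copy doesn't pollute the CPU cache. Assumes dst/src
 * are 16-byte aligned and len is a multiple of 16; a real implementation
 * also handles the unaligned head and tail. */
static void copy_frame_sse(void *dst, const void *src, size_t len)
{
    __m128i       *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < len / sizeof(__m128i); ++i)
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));

    _mm_sfence(); /* make the streaming stores visible to other readers */
}
```

At roughly 10 MB per 1920x1200 RGBA frame this is purely memory-bandwidth bound, which squares with the sub-millisecond figure above.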

Parsec looks really, really interesting though; I think I saw it once before. I need to look at it.

AMD is busy with other things. It’s on their radar, but they’ve got a lot of growth to deal with. MSI is putting in a lot of time here, and there is an AGESA update coming, but there are also other reasons the AMD platform needs a little TLC around the virtualization stuff which I can’t get into here. Soon™, haha.

You need to find some sponsors to help… you need to be Linus.

Looks interesting, but I can clearly see you are using YUV 4:2:2 to reduce the bandwidth (color accuracy suffers), and I would assume hardware compression; it looks like NvFBC is in use here :slight_smile:

If that is the case, the FPS of the host and client don’t relate the way you would think, as NvFBC will only present a frame to stream if a change has occurred: you could show the host running at 1000 FPS, but if nothing changed on screen, the actual network frame rate would be very, very low. The only thing it helps with is a drop in latency, as there are more opportunities to display a frame when there is a new frame to show (and let’s not kid anyone, the video card is very good at providing a difference map at no extra cost).

Edit: Actually, looking closer, you are streaming at 60 FPS, but the game is running with VSync disabled, pumping out 240+ FPS, which is also why there is tearing going on. We can do this too but opt not to in favour of video quality, as the tearing makes desktop usage horrendous. I’d love to see some footage of full-screen motion (pan left/right) at 240 FPS rather than just a partial screen update as shown in your demo.

The latency visible in Wendell’s video is due to several factors, one of which he has already stated: simple monitor lag. There is also some other issue with the code operating on the host AMD GPU, which is why I am trying to get some AMD hardware for testing.

On my test PC here the latency is extremely hard to see: around 90% of frames are delivered on time, giving an average of < 5 ms of latency; the other 10% are up to 16.7 ms out due to clock drift between the two cards. If we were to operate at 240 Hz, I would expect an average of 1-2 ms and a maximum of 4.2 ms.
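To put numbers on that (my own quick arithmetic, not from the measurements above): the worst-case slip is simply one refresh period at whatever rate the display is running.

```c
#include <stdio.h>

/* Quick sanity check of the frame periods quoted above: a frame that
 * misses its slot can be delayed by up to one full refresh cycle. */
int main(void)
{
    const double rates_hz[] = { 60.0, 120.0, 240.0 };
    for (int i = 0; i < 3; ++i)
        printf("%5.0f Hz -> worst-case slip %.2f ms\n",
               rates_hz[i], 1000.0 / rates_hz[i]);
    return 0; /* prints ~16.67, ~8.33 and ~4.17 ms */
}
```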

1 Like

Latency is the biggest issue; the video output needs to be recorded at high frame rates to determine whether there are problems during development. Since we are dealing with 32-bit RGBA at 1920x1200 at a minimum of 60 Hz, that’s 527.35 MB/s of data, which is impossible to stream.
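Spelling that figure out (the same arithmetic, just worked through):

```c
#include <stdio.h>

/* Back-of-the-envelope check of the 527.35 MB/s figure quoted above. */
int main(void)
{
    const double width = 1920, height = 1200;
    const double bytes_per_pixel = 4;   /* 32-bit RGBA */
    const double fps = 60;

    double bytes_per_sec = width * height * bytes_per_pixel * fps;
    printf("%.0f bytes/s = %.2f MiB/s\n",
           bytes_per_sec, bytes_per_sec / (1024.0 * 1024.0));
    return 0; /* 552,960,000 bytes/s, about 527.34 MiB/s */
}
```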

Make the machine stream the video output to Twitch and watch the gameplay there? You could use something like Dota 2 and just have a replay play out.

Hey!

Yeah, I believe with VSync enabled it would certainly not be as fast. I could enable VSync on the client, but I do believe we would run into the same slowdowns you and Wendell are getting, just due to the 16.7 ms frame time.

The tests we did went through 60 Hz, 120 Hz and 240 Hz; the yellow frame counter is FRAPS on the client machine, so that is the number of frames received by the client. If I were to move to the desktop, the frame rate to the client would drop to less than 20, because, as you said, we only send when there is new data.

I could rig up a lo-fi test panning left and right, but the monitors are at home, as well as the camera!

You are correct about NVENC, as well as VCE and Quick Sync. And yes, 4:2:0; our app is focused on WAN over every other use case, so keeping bandwidth in check is crucial. The benefit of that is that you can remotely access it, or let a friend remotely access it.
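For a sense of scale (my own back-of-the-envelope numbers, not Parsec’s figures): even before the hardware encoder touches it, going from 32-bit RGBA to planar 4:2:0 cuts the raw pixel data by more than half.

```c
#include <stdio.h>

/* Rough illustration of why 4:2:0 chroma subsampling matters before
 * the encoder even runs: RGBA is 32 bits per pixel, planar YUV 4:2:0
 * is 12 bits per pixel. */
int main(void)
{
    const double pixels_per_sec = 1920.0 * 1200.0 * 60.0;
    double rgba_mib   = pixels_per_sec * 4.0 / (1024 * 1024); /* 32 bpp */
    double yuv420_mib = pixels_per_sec * 1.5 / (1024 * 1024); /* 12 bpp */

    printf("raw RGBA  : %.1f MiB/s\n", rgba_mib);   /* ~527 */
    printf("raw 4:2:0 : %.1f MiB/s\n", yuv420_mib); /* ~198 */
    /* Hardware encoding (NVENC/VCE/Quick Sync) then compresses the
     * 4:2:0 stream down to a WAN-friendly bitrate. */
    return 0;
}
```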

I’m actually quite interested in how you are capturing the framebuffer, as we require at minimum a real display or an HDMI headless dongle. If there is some way to avoid that but still have full GPU acceleration, I’d be interested to know (on GTX/Radeon consumer cards).

Once again, really great work.

1 Like

Thanks for taking the time to watch!

I’m really excited to see how a high frame rate improves the user experience of remote connections; that 16.7 ms frame time is now the biggest issue for latency. Locally, 16.7 ms is fine, but it gets compounded when you add LAN, WAN, VSync, etc.

We’re a little different, as explained to @gnif, in that we’re focused on remote access (well, game streaming) as a priority, so our pipeline also adds encoding and NAT traversal to the mix (if required).

I’d love to get your feedback on our app when you do get a chance!

Our slowdowns are not due to VSync; we are having a redraw problem on AMD host hardware to do with persistent GL_ARB_buffer_storage. We are actually completely synchronized, just not drawing properly (likely a buffer flush issue). You can actually see this in the video, where the blue square seems to tear and jump backwards, and this is with VSync. Once I get an AMD GPU I will be able to diagnose this.
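For context, this is the kind of persistently mapped buffer that extension provides; a minimal sketch (assuming an extension loader like GLEW, and not the project’s actual code):

```c
#include <GL/glew.h>

/* Minimal sketch of a GL_ARB_buffer_storage persistent mapping: the CPU
 * writes frames into a buffer that stays mapped while the GPU reads it. */
GLuint pbo;
void  *mapped;

void create_persistent_upload_buffer(GLsizeiptr size)
{
    const GLbitfield flags = GL_MAP_WRITE_BIT |
                             GL_MAP_PERSISTENT_BIT |
                             GL_MAP_COHERENT_BIT;

    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);

    /* Immutable storage that remains mapped for the buffer's lifetime. */
    glBufferStorage(GL_PIXEL_UNPACK_BUFFER, size, NULL, flags);
    mapped = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, size, flags);

    /* Without GL_MAP_COHERENT_BIT the writer must flush explicitly
     * (glFlushMappedBufferRange) and fence before the GPU reads; a
     * missing flush/sync is exactly the kind of thing that produces
     * the stale, jumping frames described above. */
}
```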

Yup :slight_smile:, well done though; for network streaming this is the best I have seen. I would like to add network capability to this program in the future too, simply so our feature set is complete, but that will be a v2.0 thing if it does happen. Also, it is likely not to compete with your product, as it would be based on something like RDMA transfers over fibre interconnects to remove the compression requirement, so it would be very niche and expensive, but perfect for studio setups.

At this point we still require a monitor attached as well, but I do hardware too, so I am looking at building a software-programmable monitor emulator that allows the EDID to be changed on the fly; this would allow us (and you) to run at any resolution. The cost in both time and money of developing this, though, is prohibitive at this time.

Thanks, really appreciate it :slight_smile:

1 Like

@jamesstringerparsec Here is a demonstration of the setup when things are working properly :slight_smile:

3 Likes

@wendell You asked for name suggestions for this project in the video. So I just wanted to throw my hat in the ring.

“Virtigo”, for VIRTually Initialized Graphical Output. Kinda cheesy, but a fun play on words.