This is what I am talking about.
In theory… it would need to be evaluated.
Ok, let me explain a little of the basics. Let's say you want to send a buffer that is 1MB in size. This is larger than the maximum payload supported by all current consumer and even most enterprise-level equipment, so the TCP stack splits it up into packets of data limited by the MTU (maximum transmission unit) of your hardware.
Generally the default MTU is 1500 bytes unless you're using jumbo frames, in which case it's usually 9000 bytes. The MTU may be further limited by your PC or any equipment between you and the target PC (routers/switches/gateways, etc.).
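To put rough numbers on that, here's a minimal sketch (not the kernel's actual logic) of how many packets a buffer splits into at a given MTU. The 40-byte header assumption is 20 bytes of IPv4 plus 20 bytes of TCP with no options; `packets_needed` is a hypothetical helper:

```c
#include <stddef.h>

/* Hypothetical helper: how many packets a buffer needs at a given MTU.
 * Assumes 20-byte IPv4 + 20-byte TCP headers with no options, so the
 * usable payload per packet (the MSS) is mtu - 40 bytes. */
static unsigned packets_needed(unsigned buffer_bytes, unsigned mtu)
{
    unsigned mss = mtu - 40;               /* payload bytes per packet */
    return (buffer_bytes + mss - 1) / mss; /* round up */
}
```

For a 1MB (1,048,576 byte) buffer this gives 719 packets at an MTU of 1500, or 118 with jumbo frames at 9000.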
Each packet has IP and TCP headers containing, among other things, the source address, destination address, TTL, checksum, flags and sequence number. Each of these fields needs to be populated by the TCP stack in the kernel, though some work like checksum calculation can be offloaded to the ethernet hardware if it supports it (most do today).
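Those fields, sketched as C structs (illustrative names and layout only — the real on-wire headers are defined bit-precisely in RFC 791 and RFC 793):

```c
#include <stdint.h>

/* Illustrative only: a subset of the IPv4 and TCP header fields the
 * kernel fills in for every packet. Not the exact on-wire layout. */
struct ipv4_hdr {
    uint8_t  ttl;       /* time to live, decremented at each hop */
    uint8_t  protocol;  /* 6 = TCP */
    uint16_t checksum;  /* header checksum, offloadable to the NIC */
    uint32_t src_addr;  /* source IP address */
    uint32_t dst_addr;  /* destination IP address */
};

struct tcp_hdr {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t seq;       /* sequence number, used to restore ordering */
    uint32_t ack;       /* acknowledgement number */
    uint8_t  flags;     /* SYN, ACK, FIN, RST, ... */
};
```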
At the end of the day, your 1MB buffer, assuming jumbo frames, has to be split up into roughly 117 packets. Since your kernel and the network in between are doing lots of things at once, the packets may arrive out of order (worst case, a jumbled-up mess). As such, the remote end needs to inspect the sequence number of each packet to determine the original ordering. It also needs to verify the checksum of each packet, and if using TCP an acknowledgement packet needs to be sent back to the sender to confirm receipt. If the checksum doesn't verify, the packet is dropped and the remote host has to re-transmit it (triggered by a missing or duplicate acknowledgement).
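The reordering step can be illustrated with a toy sketch of sequence-number-based reassembly. The `packet` struct and byte-offset sequence numbers here are simplifications of mine, not real TCP, which tracks this with windows and far more bookkeeping:

```c
#include <stdint.h>
#include <string.h>

/* Toy model: each packet carries a sequence number (here simply the byte
 * offset of its payload in the stream), so the receiver can copy payloads
 * into place no matter what order they arrive in. */
struct packet {
    uint32_t    seq;      /* byte offset within the original buffer */
    uint32_t    len;      /* payload length */
    const char *payload;
};

static void reassemble(char *out, const struct packet *pkts, int count)
{
    for (int i = 0; i < count; ++i)
        memcpy(out + pkts[i].seq, pkts[i].payload, pkts[i].len);
}
```

Feeding it two packets that arrived in the wrong order — `{6, 6, "world!"}` then `{0, 6, "hello "}` — still reconstructs `hello world!`.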
Finally, the kernel needs to reassemble the packet payloads into a contiguous buffer to hand to the userspace application. The userspace application then needs to check whether it has the entire buffer yet (the kernel can't know this; it's protocol-specific), buffering each payload until it has all arrived.
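That userspace side is the classic receive loop. A minimal sketch, assuming a blocking TCP (`SOCK_STREAM`) socket and that the application already knows the protocol-defined message length — `recv_all` is a hypothetical helper, not an LG function:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Keep calling recv() until `want` bytes have accumulated; a single
 * recv() can legally return any number of bytes up to the amount asked
 * for, regardless of how the sender's writes were sized. */
static ssize_t recv_all(int fd, void *buf, size_t want)
{
    size_t got = 0;
    while (got < want) {
        ssize_t n = recv(fd, (char *)buf + got, want - got, 0);
        if (n <= 0)
            return n;  /* 0 = peer closed, -1 = error */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```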
As you can imagine, this is a very complex stack to push large amounts of data through, and we haven't even covered the fact that all of this also passes through firewall and routing layers, which add additional computational complexity.
This is why LG currently has zero support for this: while all of this is already written into the kernel, the overhead it adds goes against the project's goal of being as close to zero latency as possible.
Note: A technology like RDMA would make this feasible… but that is usually not within reach of even enthusiast users with massive budgets for high-end equipment (including me… hint hint, someone send me some gear so I can learn it and add LG support :P)