Thunderbolt/PCIe Latency Benchmarks/Analysis

Hi @wendell,

Have you ever looked into Thunderbolt/PCIe latency?

With desktops still using a dedicated TB controller behind the chipset, and Tiger Lake notebooks having the controller integrated into the CPU die (not the “chipset” die, so at least one hop fewer), I am guessing there should be a measurable difference.

I am just curious whether one could measure it in software, and whether the difference is large enough to affect, say, eGPU performance.
In the past, many reviewers have attributed the eGPU slowdown purely to the limited bandwidth, without discussing latency. But if there is a difference in latency, then I would expect latency to be part of the problem as well, if not the primary cause.
Cyberpunk, for example, uses only 5%-7% of the available PCIe bandwidth (x16 4.0) in game according to GPU-Z, and that is with hybrid graphics, where the rendered frames also have to be transferred back over PCIe. It is the same on my notebook, which has only an x8 3.0 connection to the GPU: there I see ~30% of the PCIe bandwidth with the same display, albeit at slightly lower FPS, so the readings seem roughly proportional and reliable.
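
To put those percentages into absolute numbers, here is a rough back-of-the-envelope sketch in Python. It only uses the theoretical per-lane rates (8 GT/s for PCIe 3.0, 16 GT/s for 4.0, both with 128b/130b encoding) and ignores packet overhead, so the results are upper bounds rather than measurements. The function names are made up for this example, and the 6% figure is simply the midpoint of the 5%-7% reading above.

```python
# Raw per-lane signalling rate in GT/s. PCIe 3.0 and 4.0 both use
# 128b/130b line encoding, so 128 of every 130 bits carry payload.
GT_PER_S = {"3.0": 8.0, "4.0": 16.0}


def link_bandwidth_gb_per_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * (128 / 130) / 8 * lanes


def traffic_gb_per_s(gen, lanes, utilization):
    """Absolute traffic implied by a utilization reading (0.0 to 1.0)."""
    return link_bandwidth_gb_per_s(gen, lanes) * utilization


# Utilization readings from the post; 0.06 is an assumed midpoint of 5%-7%.
desktop = traffic_gb_per_s("4.0", 16, 0.06)
notebook = traffic_gb_per_s("3.0", 8, 0.30)

print(f"x16 4.0 link: {link_bandwidth_gb_per_s('4.0', 16):.2f} GB/s")
print(f"x8 3.0 link:  {link_bandwidth_gb_per_s('3.0', 8):.2f} GB/s")
print(f"desktop traffic at 6%:   {desktop:.2f} GB/s")
print(f"notebook traffic at 30%: {notebook:.2f} GB/s")
```

The x16 4.0 link has four times the theoretical bandwidth of the x8 3.0 link, so a 5%-7% reading on the former corresponds to roughly 20%-28% on the latter, which is in the same ballpark as the ~30% I see.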

For benchmarking, I was thinking along the lines of CUDA, where some of the sample applications already do related measurements (memory-to-memory latencies) that one could maybe use as a basis. But I am not familiar enough with CUDA to come up with an actual benchmark, nor do I have the collection of hardware to actually test it…
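
Just to illustrate the kind of measurement I have in mind, here is a minimal Python sketch of the timing harness only. It is not a CUDA benchmark: `measure_latency_us` and `host_copy` are hypothetical names, and `host_copy` is a plain host-memory stand-in so the script runs anywhere. For a real test, the idea would be to swap in a tiny synchronous copy to the GPU and back (a few bytes, so bandwidth does not matter) and compare the numbers across systems.

```python
import statistics
import time


def measure_latency_us(op, iterations=10_000, warmup=1_000):
    """Time many calls of a tiny operation; return min/median/p99 in microseconds."""
    for _ in range(warmup):
        op()  # let caches, clocks and drivers settle before timing
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        op()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    samples.sort()
    return {
        "min": samples[0],
        "median": statistics.median(samples),
        "p99": samples[int(0.99 * (len(samples) - 1))],
    }


# Stand-in operation: a 4-byte copy in host memory. This is an assumption
# for illustration; a real run would replace it with a small blocking
# host -> GPU -> host round trip over the link under test.
src = bytearray(4)
dst = bytearray(4)


def host_copy():
    dst[:] = src


result = measure_latency_us(host_copy)
print(f"min {result['min']:.2f} us, median {result['median']:.2f} us, "
      f"p99 {result['p99']:.2f} us")
```

Reporting the minimum and median rather than the mean should help, because a handful of scheduler hiccups can otherwise swamp a difference of a few microseconds between an integrated and a discrete TB controller.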