I’m running into a rather perplexing issue with what I suspect is Windows.
Summary:
I’ve got two 25GbE NICs connected to a 25GbE-capable switch. Links negotiate at 25GbE, and uploads from the Windows machine to the remote servers run at line rate with iperf2 and iperf3 (about 24Gb/s with iperf2 and 21Gb/s with iperf3), but downloads from the remote server to the Windows machine are the problem. No matter what I try, I can’t break a mystical sub-13Gb/s download barrier, and I’m lost as to why.
What I’ve tried:
I’ve tried disabling all Windows security features (firewall, anti-virus, etc.) with no real change; I’m still capped at ~12.5Gb/s on downloads to the host.
Reinstalled the NIC drivers, and also fully uninstalled them, rebooted, and reinstalled; no change. Still ~12.5Gb/s downloads, but no issue hitting 25Gb/s uploads.
Tested across different VLANs and on the same VLAN as the VM; no change either.
Ran the same test between two Linux (Ubuntu) VMs on the same ESXi host but in different VLANs, so the traffic has to cross the switch; they hit 24.5Gb/s in both directions without issue, so something seems to be up with the Windows machine. I see no output drops on the switch and everything looks normal there, which is what has me suspecting a Windows issue.
No CPU bottleneck on either end; usage stays below 10% throughout testing.
I’ve not tried Safe Mode with Networking just yet, but that’s next on my list, along with spinning up a Windows 11 and/or Server 2019/2022 VM and testing there, just to see whether it comes down to the install on this machine.
Hardware:
NIC: Intel XXV710
Switch: Catalyst 9300X-48HX with 8x25GbE module.
MTU set to jumbo (9000) end to end.
Machines:
ESXi 8.0u2 with Ubuntu Server 22.04.3 VMs
Physical install of Windows 11 Pro
Tested with iperf2 and iperf3 using -P 4 and -P 6; the parallel stream count makes no difference.
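For clarity, this is the shape of the tests (hostname is a placeholder; -R makes the server send, i.e. the “download” direction):

```powershell
# Server side, on the Linux VM
iperf3 -s

# From the Windows box: upload (Windows sends) -> ~21-24 Gb/s
iperf3 -c linux-vm -P 4 -t 30

# From the Windows box: download (server sends, -R) -> capped ~12.5 Gb/s
iperf3 -c linux-vm -P 4 -t 30 -R
```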
Is this just iperf, or are we talking about a real-world workload? I’m a bit confused by your upload/download usage because you switch perspective multiple times…
Might be your storage. NVMe drives slow down after sustained writes; use a RAMdisk to rule this out.
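A related check with iperf3 itself, since plain iperf3 never touches disk: if I remember right, the -F flag uses a file as the data sink on the receiving side, so you can compare memory-to-memory against disk-backed runs (hostname and file path below are placeholders):

```powershell
# Memory-to-memory download (no disk in the path)
iperf3 -c linux-vm -R -t 30

# Same download, but writing the received stream to the NVMe drive
iperf3 -c linux-vm -R -t 30 -F D:\iperf-sink.bin

# If only the second run is slow, storage is a factor;
# if both sit at ~12.5Gb/s, the bottleneck is elsewhere.
```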
Or create a Linux VM with NIC and storage passthrough on the Windows machine; that rules out the network and other Windows-related things.
That would be my approach. More or less exactly half of 25Gbit smells fishy… but it could be a coincidence.
Most of the testing so far has been in iperf, but I did test some other workloads today and they land pretty close to what iperf seems to be stuck at.
Definitely going to look at a live Linux environment or a VM on the Windows machine as well; good idea.
I did check on that as well; both cards are running in x8 mode since they’re PCIe 3.0 x8 cards. I triple-checked that both machines have their card in a slot that’s electrically x8 or x16. I did have one card in an x4 slot originally, but changed the layout to rule that out.
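For anyone checking the same thing on the Windows side, the negotiated link is visible from PowerShell (the adapter name here is a placeholder):

```powershell
# Shows the negotiated PCIe speed and width for the NIC;
# a PCIe 3.0 x8 link has plenty of headroom for 25GbE.
Get-NetAdapterHardwareInfo -Name "Ethernet 2" |
    Format-List Name, PcieLinkSpeed, PcieLinkWidth
```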
Check from PowerShell / the terminal that the MTUs are actually matching. I ran into a similar issue with the same NIC where the Windows GUI setting for MTU didn’t apply correctly.
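Something like this (adapter name and target IP are placeholders; 8972 = 9000 minus the 28 bytes of IP + ICMP headers):

```powershell
# Effective MTU as the IP stack sees it
Get-NetIPInterface -AddressFamily IPv4 |
    Select-Object InterfaceAlias, NlMtu

# Jumbo packet setting at the driver level
Get-NetAdapterAdvancedProperty -Name "Ethernet 2" -RegistryKeyword "*JumboPacket"

# Verify jumbo frames actually pass end to end (don't-fragment bit set)
ping -f -l 8972 192.168.1.10
```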
So far I’ve narrowed it down to something in Windows. I rebooted into Safe Mode with Networking, ran iperf3 and iperf2, and got 24.5Gbit/s in each direction. So some non-critical service is still hanging around, because when I boot back into normal Windows the cap comes back, though now at about 14.5Gbit/s down, while uploads to the Linux box still hit 24Gbit/s.
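In case it helps anyone bisect the same way, a rough sketch for comparing what runs in a normal boot versus Safe Mode with Networking (run it in both boots, then diff the two CSVs):

```powershell
# List running services with their binaries and export for comparison
Get-CimInstance Win32_Service |
    Where-Object State -eq 'Running' |
    Sort-Object Name |
    Select-Object Name, DisplayName, PathName |
    Export-Csv "$env:USERPROFILE\services-normal.csv" -NoTypeInformation
```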
Turns out the pesky webthreatdefsvc and webthreatdefusersvc services (Web Threat Defense) were still running even with Defender disabled via the GUI, and they were the culprit. Stopped those two services and boom, no more issues.
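For reference, roughly what that looks like from an elevated PowerShell prompt. The per-user instance gets a random suffix (webthreatdefusersvc_xxxxx), so a wildcard helps; this assumes the services aren’t tamper-protected on your build:

```powershell
# Find the Web Threat Defense services, including the per-user instance
Get-Service -Name "webthreatdef*"

# Stop them for this session
Get-Service -Name "webthreatdef*" | Stop-Service -Force
```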