Windows 11 not properly handling tagged packets

bambinone · August 16, 2022, 4:35am

Hi folks,

I have a MikroTik CRS305-1G-4S+ that’s set up for a few tagged VLANs and a PC with an onboard Marvell FastLinQ Edge 10GbE NIC based on the AQC113C controller. The PC is set up to dual boot Linux and Windows 11. In Linux, I’m using a few of the tagged VLANs (1, 10, and 100); in Windows, I only care about one of them (100).

In Windows, I set the VLAN ID to 100 on the NIC in Device Manager. (Driver version 3.1.3.0.) I can run an iperf3 test sending data TO the server:

iperf-3.1.3-win64> .\iperf3.exe -c 10.10.10.4
Connecting to host 10.10.10.4, port 5201
[  4] local 10.10.10.9 port 49588 connected to 10.10.10.4 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   870 MBytes  7.29 Gbits/sec
[  4]   1.00-2.00   sec   842 MBytes  7.06 Gbits/sec
[  4]   2.00-3.00   sec   838 MBytes  7.03 Gbits/sec
[  4]   3.00-4.00   sec   841 MBytes  7.05 Gbits/sec
[  4]   4.00-5.00   sec   846 MBytes  7.09 Gbits/sec
[  4]   5.00-6.00   sec   847 MBytes  7.11 Gbits/sec
[  4]   6.00-7.00   sec   841 MBytes  7.05 Gbits/sec
[  4]   7.00-8.00   sec   847 MBytes  7.11 Gbits/sec
[  4]   8.00-9.00   sec   840 MBytes  7.05 Gbits/sec
[  4]   9.00-10.00  sec   837 MBytes  7.02 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  8.25 GBytes  7.09 Gbits/sec                  sender
[  4]   0.00-10.00  sec  8.25 GBytes  7.09 Gbits/sec                  receiver

iperf Done.

But I can’t seem to get data back FROM the server:

iperf-3.1.3-win64> .\iperf3.exe -c 10.10.10.4 --reverse
Connecting to host 10.10.10.4, port 5201
Reverse mode, remote host 10.10.10.4 is sending
[  4] local 10.10.10.9 port 49344 connected to 10.10.10.4 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.02   sec  0.00 Bytes  0.00 bits/sec
[  4]   1.02-2.01   sec  0.00 Bytes  0.00 bits/sec
[  4]   2.01-3.01   sec  0.00 Bytes  0.00 bits/sec
[  4]   3.01-4.00   sec  0.00 Bytes  0.00 bits/sec
[  4]   4.00-5.01   sec  0.00 Bytes  0.00 bits/sec
[  4]   5.01-6.00   sec  0.00 Bytes  0.00 bits/sec
[  4]   6.00-7.01   sec  0.00 Bytes  0.00 bits/sec
[  4]   7.01-8.00   sec  0.00 Bytes  0.00 bits/sec
[  4]   8.00-9.01   sec  0.00 Bytes  0.00 bits/sec
[  4]   9.01-10.01  sec  0.00 Bytes  0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.01  sec   140 KBytes   114 Kbits/sec    5             sender
[  4]   0.00-10.01  sec  0.00 Bytes  0.00 bits/sec                  receiver

iperf Done.

I feel reasonably confident that the switch and server are set up correctly because the same test works fine in Linux with the tagged interface:

$ iperf3 -c 10.10.10.4 --reverse
Connecting to host 10.10.10.4, port 5201
Reverse mode, remote host 10.10.10.4 is sending
[  5] local 10.10.10.9 port 32960 connected to 10.10.10.4 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   626 MBytes  5.25 Gbits/sec                  
[  5]   1.00-2.00   sec   616 MBytes  5.17 Gbits/sec                  
[  5]   2.00-3.00   sec  1.06 GBytes  9.07 Gbits/sec                  
[  5]   3.00-4.00   sec  1.03 GBytes  8.85 Gbits/sec                  
[  5]   4.00-5.00   sec  1.02 GBytes  8.77 Gbits/sec                  
[  5]   5.00-6.00   sec  1.03 GBytes  8.82 Gbits/sec                  
[  5]   6.00-7.00   sec  1.03 GBytes  8.82 Gbits/sec                  
[  5]   7.00-8.00   sec  1.03 GBytes  8.86 Gbits/sec                  
[  5]   8.00-9.00   sec  1.01 GBytes  8.66 Gbits/sec                  
[  5]   9.00-10.00  sec   819 MBytes  6.87 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  9.22 GBytes  7.92 Gbits/sec  9988             sender
[  5]   0.00-10.00  sec  9.21 GBytes  7.91 Gbits/sec                  receiver

iperf Done.

So I’m at a bit of a loss here. I can configure the switch to un-tag VLAN 100 on egress for this port but Linux gets a little manic when you try to mix tagged and un-tagged traffic on the same interface and start creating bridges, etc. on top of it. I’m hoping there’s some deep Windows magic that will make this Just Work™. Thanks in advance!

MadMatt · August 16, 2022, 12:59pm

Are you sure it’s not a problem with iperf?
Can you run iperf3 from the Windows machine to the Linux server?
What about pings and TCP packets? Are they going through properly?

bambinone · August 16, 2022, 3:22pm

Thank you for responding!

No, I’m not sure it isn’t iperf, but it’s not the only thing that’s broken. What I originally noticed was that I can connect to the SMB server on 10.10.10.4 but directory listings won’t load. It seems like data sent from the server back to the client is getting dropped somewhere along the line.

Yes, I can run iperf3 from the Windows machine to the Linux server. The first example in my OP shows this.

Ping works fine, both ways. (After enabling ICMP Echo in Windows Firewall.)

I think iperf3 is TCP by default. Switching it to UDP mode gives the same results (can transfer TO the server, can’t transfer FROM the server).

MadMatt · August 16, 2022, 3:59pm

Hmm, these three examples look to me like runs from the box where you are doing the VLAN tagging (client) to a Linux box (server) … i.e. the direction is the same, but the binary is different.
What I meant was to set up iperf3 on the Windows box (10.10.10.9) in server mode, and run iperf3 in client mode from the Linux machine (10.10.10.4) without the reverse option …
Have you tried disabling the Windows Firewall completely just to rule it out?

bambinone · August 16, 2022, 4:19pm

$ iperf3 -c 10.10.10.9
Connecting to host 10.10.10.9, port 5201
[  5] local 10.10.10.4 port 48072 connected to 10.10.10.9 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   107 KBytes   875 Kbits/sec    2   1.43 KBytes
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.43 KBytes
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.43 KBytes
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.43 KBytes
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   1.43 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   107 KBytes  87.6 Kbits/sec    5             sender
[  5]   0.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  receiver

iperf Done.
$ iperf3 -c 10.10.10.9 --reverse
Connecting to host 10.10.10.9, port 5201
Reverse mode, remote host 10.10.10.9 is sending
[  5] local 10.10.10.4 port 48076 connected to 10.10.10.9 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   856 MBytes  7.18 Gbits/sec
[  5]   1.00-2.00   sec   829 MBytes  6.95 Gbits/sec
[  5]   2.00-3.00   sec   825 MBytes  6.92 Gbits/sec
[  5]   3.00-4.00   sec   834 MBytes  6.99 Gbits/sec
[  5]   4.00-5.00   sec   828 MBytes  6.94 Gbits/sec
[  5]   5.00-6.00   sec   823 MBytes  6.90 Gbits/sec
[  5]   6.00-7.00   sec   829 MBytes  6.95 Gbits/sec
[  5]   7.00-8.00   sec   824 MBytes  6.91 Gbits/sec
[  5]   8.00-9.00   sec   838 MBytes  7.03 Gbits/sec
[  5]   9.00-10.00  sec   821 MBytes  6.89 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  8.11 GBytes  6.97 Gbits/sec                  sender
[  5]   0.00-10.00  sec  8.11 GBytes  6.97 Gbits/sec                  receiver

iperf Done.

No change with Windows Firewall disabled completely.

MadMatt · August 16, 2022, 6:33pm

Weird. I’m assuming it works as expected if you untag the VLAN and use the trunk VLAN, or configure the port on the switch as an access port?

bambinone · August 16, 2022, 7:08pm

Yes, works fine with untagged traffic to Windows, and works fine with tagged traffic in Linux.

MadMatt · August 16, 2022, 7:58pm

OK, does the NIC driver give you the option to disable hardware offloading (maybe TX, RX)?

bambinone · August 16, 2022, 8:22pm

No offload options related to VLAN tagging, but some of the usual suspects like ARP, checksum, large send, etc. I tried disabling them one at a time… no change.

bambinone · August 17, 2022, 3:27am

Welp, I still don’t really understand what’s going on, but here’s my interim solution.

First, I created a new dummy VLAN 1100 on the switch with the AQC113C port as its only member. I set the PVID of the AQC113C port to 1100; this causes the switch to strip the tag from VLAN 1100 packets and send them un-tagged to the port, which is what Windows apparently needs. In Windows, I set the VLAN ID of the interface to 1100 and set a different IP address than the one I plan to use in Linux. This causes packets originating from Windows to get tagged with VLAN 1100. These changes give me some unique conditions to work with.

Next, back on the switch, I created two ACL rules. Traffic coming from the other switch ports and matching Windows’ unique destination IP gets forced to VLAN 1100. Traffic coming from the AQC113C port and tagged with VLAN 1100 gets forced to VLAN 100. Et voilà.

Linux isn’t aware of the dummy VLAN 1100 and uses a different IP address on VLAN 100, so the ACL rules will never fire. Tagged traffic gets sent back and forth like normal on VLANs 1, 10, and 100. Untagged traffic and anything else coming down the wire gets ignored.
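Roughly, the logic of the two ACL rules and the PVID can be modelled like this. It’s only a Python sketch to show the idea, not RouterOS configuration: the port names and the Windows address 10.10.10.19 are made up for illustration, and the model assumes every other port carries its VLANs tagged.

```python
from typing import Optional

PC_PORT = "sfp-pc"          # hypothetical name for the AQC113C port
WINDOWS_IP = "10.10.10.19"  # hypothetical address used only when booted into Windows
REAL_VLAN = 100
DUMMY_VLAN = 1100           # also the PVID of PC_PORT


def apply_acl(in_port: str, vlan: int, dst_ip: str) -> int:
    """Return the VLAN a frame is forwarded in after the two ACL rules."""
    # Rule 1: traffic from the other ports addressed to Windows is forced to the dummy VLAN.
    if in_port != PC_PORT and dst_ip == WINDOWS_IP:
        return DUMMY_VLAN
    # Rule 2: traffic from the PC port tagged with the dummy VLAN is forced to the real VLAN.
    if in_port == PC_PORT and vlan == DUMMY_VLAN:
        return REAL_VLAN
    # Anything else (e.g. Linux traffic on VLANs 1, 10, 100) is left alone.
    return vlan


def egress_tag(out_port: str, vlan: int) -> Optional[int]:
    """Return the tag put on the wire, or None if the frame leaves untagged."""
    if out_port == PC_PORT and vlan == DUMMY_VLAN:
        return None  # VLAN equals the port's PVID, so the tag is stripped
    return vlan      # simplification: all other traffic stays tagged


if __name__ == "__main__":
    # Server -> Windows: forced to VLAN 1100, then delivered untagged to the PC port.
    vlan = apply_acl("sfp-server", REAL_VLAN, WINDOWS_IP)
    print("to Windows:", vlan, egress_tag(PC_PORT, vlan))
    # Windows -> server: arrives tagged 1100, forwarded in VLAN 100.
    print("from Windows:", apply_acl(PC_PORT, DUMMY_VLAN, "10.10.10.4"))
```

The point of the separate IP address is that rule 1 only matches when Windows is booted, so the Linux side never sees the dummy VLAN.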

I’m sure a proper network engineer would be able to figure out a more elegant solution but this works for the time being and it’s better (IMNSHO) than dealing with mixed tagged and untagged traffic in Linux.

scyto · October 29, 2024, 12:21am

Did you ever solve this?

I have Ubiquiti telling me the reason why my Windows machine is responding to traffic on the wrong VLAN is:

We suspect that traffic for VLAN 1 (default) is TAGGED off the switch port but your Windows NIC drivers just STRIP AWAY that tag, presenting this traffic as UNTAGGED (thus in the RX direction you have two VLANs, 4 and 1, as untagged), confusing the Windows OS…

The interface in question is indeed an AQC113-based card.
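
To illustrate what that explanation would mean on the wire, here is a small Python sketch. It is only a model of the 802.1Q frame layout, not a claim about what the AQC113 driver actually does. The tag is 4 bytes after the source MAC: the TPID 0x8100 followed by a 16-bit field whose low 12 bits are the VLAN ID. The MAC addresses and payload are placeholders.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def build_frame(vlan, payload: bytes) -> bytes:
    """Build a minimal Ethernet frame; vlan=None means untagged."""
    dst = bytes.fromhex("ffffffffffff")  # placeholder destination MAC
    src = bytes.fromhex("020000000001")  # placeholder source MAC
    tag = b"" if vlan is None else struct.pack("!HH", TPID_8021Q, vlan & 0x0FFF)
    return dst + src + tag + struct.pack("!H", 0x0800) + payload


def parse_vlan(frame: bytes):
    """Return (vlan_id, ethertype, payload); vlan_id is None for untagged frames."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == TPID_8021Q:
        tci, inner_type = struct.unpack_from("!HH", frame, 14)
        return tci & 0x0FFF, inner_type, frame[18:]
    return None, ethertype, frame[14:]


def strip_tag(frame: bytes) -> bytes:
    """Drop the 4-byte 802.1Q tag, discarding the VLAN ID along with it."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == TPID_8021Q:
        return frame[:12] + frame[16:]
    return frame


if __name__ == "__main__":
    vlan1 = build_frame(1, b"hello")
    vlan4 = build_frame(4, b"hello")
    print(parse_vlan(vlan1)[0], parse_vlan(vlan4)[0])
    # Once the tag is stripped without keeping the VLAN ID, the two frames are identical.
    print(strip_tag(vlan1) == strip_tag(vlan4))
```

If the tag is removed before the OS sees it and the VLAN ID isn't carried along, there is nothing left to tell the two VLANs apart, which matches the behaviour Ubiquiti describes.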