Quick summary of the problem:
Site A has subnets: 192.168.1.0/24 & 192.168.99.0/24
Site B has subnets: 192.168.4.0/24
Machines on the site A “99” subnet can ping/access site B “4” subnet.
Machines on the site A “1” subnet can NOT ping/access site B “4” subnet.
Details:
Both sites are pfSense routers on CE 2.7.0 or newer. Both have the Tailscale client installed and configured on them. I am using a self-hosted Headscale container as the coordination server.
Followed these guides to set up the Headscale server and the Tailscale pfSense clients:
I think something is going wrong with my Outbound NAT, but I am not sure how to fix it. These captures are from the site A router:
For outbound NAT you need an entry for each gateway interface, so WAN plus your VPN interface(s), with a source of each of your local networks. You can probably just copy those WAN rules you already have and change the interface to your VPN interface.
You will also need to go to System > Routing and add static routes for any networks which are behind the VPN.
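The outbound NAT advice above amounts to one entry per (gateway interface, local network) pair. A minimal sketch of that check, assuming two interfaces named WAN and TAILSCALE (hypothetical names) and an invented rule set that is not the poster's actual configuration:

```python
from ipaddress import ip_network
from itertools import product

# Local networks at site A, taken from the thread.
LOCAL_NETWORKS = [ip_network("192.168.1.0/24"), ip_network("192.168.99.0/24")]

# Interface names are assumptions for illustration only.
GATEWAY_INTERFACES = ["WAN", "TAILSCALE"]


def required_nat_entries(interfaces, networks):
    """Return one (interface, source network) pair per combination."""
    return list(product(interfaces, networks))


def missing_entries(required, configured):
    """Return the required pairs that are absent from the configured rules."""
    configured_set = set(configured)
    return [pair for pair in required if pair not in configured_set]


required = required_nat_entries(GATEWAY_INTERFACES, LOCAL_NETWORKS)

# Hypothetical rule set in which the "1" subnet lacks a VPN-interface entry.
configured = [
    ("WAN", ip_network("192.168.1.0/24")),
    ("WAN", ip_network("192.168.99.0/24")),
    ("TAILSCALE", ip_network("192.168.99.0/24")),
]

for iface, net in missing_entries(required, configured):
    print(f"missing outbound NAT entry: {iface} source {net}")
```

Comparing the real rule list against the required pairs this way makes it easy to see whether one subnet is covered on the VPN interface and the other is not.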
I already created individual outbound NAT rules previously and it did not work. Also, if that were required, why does this combined rule work for the “99” subnet?
As for routes, I don’t need to make static routes as tailscale is doing that already.
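Whether a route comes from a static entry or from Tailscale, what matters is that the router has a route for 192.168.4.0/24 pointing at the VPN interface. A rough sketch of a longest-prefix lookup, assuming a made-up routing table (only the subnets come from the thread; the interface names are hypothetical):

```python
from ipaddress import ip_address, ip_network

# Hypothetical routing table: (destination network, outgoing interface).
ROUTES = [
    (ip_network("0.0.0.0/0"), "WAN"),
    (ip_network("192.168.1.0/24"), "LAN"),
    (ip_network("192.168.99.0/24"), "LAN99"),
    (ip_network("192.168.4.0/24"), "TAILSCALE"),
]


def lookup(destination, routes):
    """Return the interface of the most specific route containing destination."""
    addr = ip_address(destination)
    matches = [(net, iface) for net, iface in routes if addr in net]
    if not matches:
        return None
    # Longest prefix wins, as in ordinary destination-based routing.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(lookup("192.168.4.10", ROUTES))  # host address is an example only
```

This lookup only considers the destination, so in this model it gives the same answer for traffic from either site A subnet.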
Focused on the issue subnet only.
If site A cannot reach site B BUT site B can reach site A, I would look at site B’s firewall rules.
With firewalls the general rule is:
Allow two-way traffic if the connection is initiated from the internal side.
This would explain why traffic works in both directions when site B initiates it.
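The general rule above can be pictured with a toy state table. This is only a sketch of the idea, not how pfSense implements it: it ignores explicit pass rules, ports, and protocols, and the host addresses are made up.

```python
class StatefulFirewall:
    """Toy model: outbound connections create state; inbound needs matching state."""

    def __init__(self):
        self.states = set()

    def outbound(self, src, dst):
        # Traffic initiated from the internal side is allowed and remembered.
        self.states.add((src, dst))
        return True

    def inbound(self, src, dst):
        # Inbound traffic passes only as a reply to a remembered connection.
        return (dst, src) in self.states


fw = StatefulFirewall()  # stands in for site B's firewall

# Unsolicited traffic from site A toward site B has no state yet.
unsolicited = fw.inbound("192.168.1.10", "192.168.4.20")

# Site B host initiates; the reply from site A now matches the state.
fw.outbound("192.168.4.20", "192.168.1.10")
reply = fw.inbound("192.168.1.10", "192.168.4.20")

print(unsolicited, reply)
```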
I don’t think it is a firewall rule issue on site B for these reasons:
Site A subnet “192.168.1.0” fails to reach site B, Site A subnet “192.168.99.0” succeeds.
If a ping is initiated by the Site A router itself, it can reach site B using either subnet's interface as the source.
There are other sites C and D. The problem is exactly the same in that Site A subnet “192.168.1.0” cannot reach remote sites, but Site A subnet “192.168.99.0” can.
I feel like I have identified the root of the problem, but people are asking me to look at everything but it:
My current theory: this router config is many years old and has gone through many upgrades and config changes, including various VPN solutions. Something has corrupted the configuration in a way that is not visible in the GUI but still changes behavior. I will likely have to try a wipe/restore this weekend and see if the problem persists.
Edit: Slidermike, with superb tact, informed me that I was rude in this message. I agree and have apologized for my tone. We should all strive to be as wise, tactful, and level-headed as they appear to be.
I spent the entire weekend redoing my entire router setup and the problem is exactly the same. I give up. I did think of a way for my server to be able to send backups again: