Dual-WAN/SD-WAN - What are you guys doing?

NicKF · February 4, 2022, 5:08am

Hi all,

I’ve been working on building out my homelab environment for a very long time. I have been looking to solve a very specific problem I have with my current configuration and I was wondering how you all are dealing with it if you are in a similar place as I am. My homelab was started for two reasons: learning and Plex. Plex is still a critical service I maintain and I consider myself a lifelong learner.

Earlier this year I was able to get a Starlink subscription which I use in conjunction with my cable modem. Most of my traffic is statically going out either one or the other by design…but I want to better leverage my resources.

Afterwards I got OSPF set up on my core switch between itself and two dedicated pfSense firewalls, one for each WAN circuit (they are also dynamically routed between each other, more on that later). This creates a dynamically routed WAN edge, designed so that if an internet connection was misbehaving, my network would simply route out the other connection automatically.

In my current configuration the Starlink path is statically set to have a higher path cost in OSPF, so my “normal” network traffic won’t egress out of Starlink unless the cable modem pfSense box is no longer advertising the default route. In theory, this should have been enough for failover, but I was having some crummy network performance from my cable modem side and a failover never occurred.

What I hadn’t considered is that since the L3 link between my switch and pfSense was fine and stable, OSPF had no idea there were issues and continued to route packets in that direction. In other words, pfSense was still advertising the default route because the WAN didn’t go down; it was just performing poorly, and my dynamic routing implementation didn’t have a mechanism to handle that.
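To make that failure mode concrete, here is a minimal Python sketch of the selection logic. It is only a model, not what OSPF actually runs: the `Advert` class, `pick_default`, the cost values, and the 35% loss figure are all made up for illustration. The point is that the choice depends only on whether a route is advertised and what it costs, never on WAN quality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Advert:
    """A default route as seen by the core switch (hypothetical model)."""
    next_hop: str          # which pfSense box advertised it
    cost: int              # OSPF path cost, lower wins
    advertised: bool       # False once the box withdraws the route
    loss_pct: float = 0.0  # WAN quality, which route selection never looks at


def pick_default(adverts: list[Advert]) -> Optional[Advert]:
    """Return the lowest-cost route that is still being advertised."""
    live = [a for a in adverts if a.advertised]
    return min(live, key=lambda a: a.cost) if live else None


cable = Advert("pfsense-cable", cost=10, advertised=True, loss_pct=35.0)
starlink = Advert("pfsense-starlink", cost=100, advertised=True)

# Cable is lossy but still advertised, so it keeps winning.
degraded_choice = pick_default([cable, starlink]).next_hop

# Only a withdrawn advertisement moves traffic to Starlink.
cable.advertised = False
failed_choice = pick_default([cable, starlink]).next_hop
```

In this model `loss_pct` is carried along but never read, which mirrors the situation described above: a degraded circuit looks identical to a healthy one until the advertisement disappears.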

My next phase was adding two additional networks between the pfSense firewalls. One network had a gateway on my cable modem pfSense and the other network had a gateway on the Starlink pfSense. That, coupled with OSPF, allowed me to create dynamic routing paths between my normal network and the WAN edge through each other. What this allowed me to do was leverage pfSense’s gateway groups and tiering functionality.

When packet loss or high latency hits a threshold, the cable modem pfSense will automatically change the default gateway from my cable ISP to Starlink. When that happens, my network traffic is automatically rerouted to Starlink, but through the cable modem pfSense box and then to the Starlink pfSense box. This works fairly well, and so far this is where my story ends.
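As a rough illustration of the tiering idea, here is a small Python sketch. It is an assumption-laden model rather than pfSense's actual implementation: the `Gateway` class, `active_gateways`, and the threshold values are hypothetical and chosen only to show the behaviour (lower tier number preferred, unhealthy gateways skipped).

```python
from dataclasses import dataclass


@dataclass
class Gateway:
    name: str
    tier: int          # lower tier number is preferred
    loss_pct: float    # as measured by the gateway monitor
    latency_ms: float


# Illustrative thresholds only, not necessarily what any real box uses.
MAX_LOSS_PCT = 20.0
MAX_LATENCY_MS = 500.0


def is_healthy(gw: Gateway) -> bool:
    """A gateway is usable while it stays under both thresholds."""
    return gw.loss_pct < MAX_LOSS_PCT and gw.latency_ms < MAX_LATENCY_MS


def active_gateways(group: list[Gateway]) -> list[str]:
    """Names of the healthy gateways in the best (lowest) tier that has any."""
    healthy = [gw for gw in group if is_healthy(gw)]
    if not healthy:
        return []
    best_tier = min(gw.tier for gw in healthy)
    return [gw.name for gw in healthy if gw.tier == best_tier]


normal = active_gateways([
    Gateway("CABLE", tier=1, loss_pct=0.5, latency_ms=18),
    Gateway("STARLINK", tier=2, loss_pct=1.0, latency_ms=45),
])
degraded = active_gateways([
    Gateway("CABLE", tier=1, loss_pct=32.0, latency_ms=240),
    Gateway("STARLINK", tier=2, loss_pct=1.0, latency_ms=45),
])
```

With the cable gateway healthy, only `CABLE` is active; once its loss crosses the threshold, `STARLINK` takes over. If both gateways were placed in the same tier, both names would come back while healthy, which corresponds to load balancing rather than failover.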

The problem is, when this happens my external IP address changes. This causes clients with active sessions for things like streaming video, gaming, video chats, etc. to freak out, and rightfully so! I’ve been trying to find a flexible and cost-efficient solution to that problem.

One thing I’ve considered is leaving my existing topology more or less exactly as it is and creating a WireGuard tunnel from each of my pfSense boxes to something like Linode. All traffic would egress my network through the tunnel to Linode, and I would use the IP address of my Linode instance for all of my services. Another thing I’ve considered is somehow using ZeroTier in a similar way, though I’m not sure how to implement it in my current topology.

The other option I have put serious thought into is co-locating a server (which I already own) somewhere, and then I can run off-site backups to it and have other services in addition to just being my public-facing IP address.

What are you all doing?

MadMatt · February 4, 2022, 6:53am

I use only one pfSense instance to do what you do (DSL + Starlink) and let pfSense load balance between the two (so no preferred gateway, but a gateway group with two egresses at the same tier), and use gateway monitoring on the pfSense to let it decide when a route needs to be taken out of service because of packet loss. No OSPF, no multiple gateways; it works very well and transitions between the two services are seamless.

For ingress, I have an OpenVPN link to a VM in Oracle Cloud (free tier, no cost) and use OpenVPN there to accept incoming encrypted traffic and send it over to my internal systems. As far as I know this is the only way to have ingress on Starlink, as it uses CGNAT and never exposes a public IP … I have been fiddling with IPv6, since from what I understand that works on Starlink, but have had no luck getting the renewals working: I get a public IPv6 address on Starlink at boot, but it goes away after 5 minutes, and every doc I could find on doing a renewal hasn’t worked for me …

risk · February 4, 2022, 9:04am

Hmmm,

…

yep

…

I used to (however not currently) use:

OpenWRT+mwan3 for ISP failover
keepalived + conntrackd for firewall syncing (I was NAT-ing my home network behind a VIP, and my local gateway was another VIP; these would migrate as a pair from one device to another when it was time to reboot the router).

Either ZeroTier or Tailscale, or static WireGuard … all are easy-peasy ways of making tunnels between your home gateway and cloud machines (e.g. load balance across two Linodes? why not). MTU might be a concern, but nothing that can’t be fixed with either MSS clamping or proper firewalling (allowing select ICMP traffic) so that PMTUD can serve its purpose.
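For the MTU point, the arithmetic is simple enough to sketch in Python. This assumes a WireGuard tunnel over a link with a 1500-byte MTU (the link MTU is an assumption; a given ISP may be lower) and TCP without options; the function names are just illustrative.

```python
# Header sizes in bytes (fixed by the protocols).
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
TCP_HEADER = 20          # without TCP options
WIREGUARD_OVERHEAD = 32  # WireGuard data message header plus auth tag


def tunnel_mtu(link_mtu: int, outer_ipv6: bool) -> int:
    """Largest inner packet that fits in one outer packet."""
    outer_ip = IPV6_HEADER if outer_ipv6 else IPV4_HEADER
    return link_mtu - outer_ip - UDP_HEADER - WIREGUARD_OVERHEAD


def clamped_mss(mtu: int, inner_ipv6: bool = False) -> int:
    """TCP MSS that keeps a full segment within the given MTU."""
    inner_ip = IPV6_HEADER if inner_ipv6 else IPV4_HEADER
    return mtu - inner_ip - TCP_HEADER


wg_mtu_v4 = tunnel_mtu(1500, outer_ipv6=False)  # 1440
wg_mtu_v6 = tunnel_mtu(1500, outer_ipv6=True)   # 1420
mss = clamped_mss(wg_mtu_v6)                    # 1380
```

Sizing for the IPv6 outer header gives a tunnel MTU of 1420, which is safe whichever IP version the tunnel ends up running over, and an MSS clamp of 1380 for IPv4 TCP inside it.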

… at the moment I have a more pedestrian setup … just a single ISP, a Debian box as a router. Although it’s running “testing” and auto-upgrading, and when there’s a new kernel it kexecs into the new kernel and initrd and is back in action in about 20s.

NicKF · February 5, 2022, 2:10am

I started messing around with doing something similar but following Ryan’s Guide for WireGuard on Linode.
Self-hosted VPN with wireguard - Wikis & How-to Guides - Level1Techs Forums

But I am having problems with NAT. I ended up making firewall/Outbound NAT rules to send some of my network segments directly out to the internet instead of through the WireGuard tunnel, because I couldn’t get port forwards to work properly.

Also, FWIW, I have a public IPv4 address on my Starlink now. I’m not sure when it happened, but I’m not getting a 100.xxx address anymore.
Any ideas?
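A quick way to sanity-check whether a WAN address is really public is to test it against the CGNAT shared address space (100.64.0.0/10, RFC 6598), since not every 100.x address falls in that range. A minimal Python sketch, where `classify_wan` and the sample addresses are made up for illustration:

```python
import ipaddress

CGNAT = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space


def classify_wan(addr: str) -> str:
    """Rough classification of a WAN-side address."""
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT:
        return "cgnat"
    if ip.is_private:
        return "private"
    return "public"


behind_cgnat = classify_wan("100.72.14.9")    # inside 100.64.0.0/10
outside_range = classify_wan("100.20.30.40")  # 100.x, but not CGNAT space
rfc1918 = classify_wan("192.168.1.10")
```

This only says what kind of address the WAN interface holds; confirming that inbound connections actually arrive still needs a test from outside.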

MadMatt · February 5, 2022, 12:25pm

It looks like it is not uncommon…

I am using OpenVPN exactly because I don’t want to set up port forwards, or am I missing something?

I am not sure how it works with WireGuard, but with OpenVPN I set up a LAN-to-LAN connection between my home LAN and the remote LAN that manages IP addresses for the OpenVPN clients … I don’t need NAT/port forwards between the two sides …



     LAN                                 PFSENSE                    CLOUD
 ┌──────────────────┐         ┌──────────────────────────┐       ┌───────────────────────────────┐   Cloudserver Routes
 │                  └─────────►                          ◄───────┘                               │
 │  172.30.2.0/24   ◄─────────┐   172.30.2.1     WANIP   ┌───────► WANIP          10.128.8.0/24  │   default via 10.128.8.1 dev ens3
 │                  │         │                          │       │                               │   default via 10.128.8.1 dev ens3 proto dhcp src 10.128.8.3 metric
 │                  │         │                          │       │                               │   10.8.0.0/24 dev tun0 proto kernel scope link src 10.8.0.1
 └──────────────────┘         └──────────────────────────┘       │                Openvpn Clients│   10.128.8.0/24 dev ens3 proto kernel scope link src 10.128.8.3
                                                            ┌────►                               │   172.30.2.0/24 via 10.8.0.2 dev tun0
                                                            │    │                10.8.0.0/24    │
                                                            │    └──┬────────────────────────────┘   Route pushed to VPN clients
                                                            │       │
                                                            │       │                                iroute 172.30.2.0 255.255.255.0
                        CLIENT                              │       │
                        ┌───────────────────────────────┐   │       │
                        │                 WAN           │   │       │
                        │                 xxx.xxx.xx.xx │   │       │
                        │                               ├───┘       │
                        │                 Openvpn       │           │
                        │                               │◄──────────┘
                        │                 10.8.0.3      │
                        └───────────────────────────────┘
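
One way to read the cloud server's route list in the diagram is as a longest-prefix-match table. The Python sketch below copies those routes (with the duplicate default entry collapsed) into a hypothetical `CLOUD_ROUTES` list; `lookup` is an illustrative helper, not part of any VPN tool. It shows why the `172.30.2.0/24 via 10.8.0.2 dev tun0` entry matters.

```python
import ipaddress

# (destination prefix, how it is reached), taken from the diagram's route list.
CLOUD_ROUTES = [
    ("0.0.0.0/0", "via 10.128.8.1 dev ens3"),
    ("10.8.0.0/24", "dev tun0"),
    ("10.128.8.0/24", "dev ens3"),
    ("172.30.2.0/24", "via 10.8.0.2 dev tun0"),
]


def lookup(dest, routes):
    """Return the hop for the most specific route matching dest, or None."""
    ip = ipaddress.ip_address(dest)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if ip in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# With the home LAN route present, traffic goes into the tunnel.
with_route = lookup("172.30.2.50", CLOUD_ROUTES)

# Without it, the same destination falls through to the default route.
without_route = lookup("172.30.2.50", CLOUD_ROUTES[:3])
```

With the last entry removed, packets for the home LAN match only the default route and leave via `ens3` toward the internet instead of entering `tun0`, so they never reach the home side.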

NicKF · February 5, 2022, 3:22pm

It may just be a routing issue? My topology isn’t really different. Do I need to inject a route back on the WireGuard side or something?

MadMatt · February 5, 2022, 4:13pm

I think so …
if you have:

Client A
WAN IP xx.xx.xx.xx

WireGuard
WAN IP yy.yy.yy.yy
Local LAN (cloud/provider): e.g. 10.200.0.0/24
WireGuard LAN: 172.30.30.0/24

Home
WAN IP: zz.zz.zz.zz
Local LAN: 192.168.100.0/24

The clients, unless you push a route, will know only about the WireGuard subnet (if WireGuard has the feature of allowing traffic between clients).

So client A will not have a route to Home-Local LAN unless you push it: