Hi Level1 forums,
I have a “forbidden router” setup and I’m having an issue with both OPNsense and pfSense (I thought switching from one to the other would resolve it, but I’ve got the same problem on both). Every 30 to 90 minutes or so I lose connectivity and the gateway stops resolving. I’ll break the post up into sections for readability.
Virtualized OPNsense on a QNAP-1602P. Two NICs: an Intel X553 passed through with SR-IOV and QAT (WAN), and a virtual bridge to the main switch using virtio drivers (LAN). The WAN port connects to an Arris/NBN HFC box that is in some kind of bridge mode (I don’t know exactly; the firmware is ISP-controlled). I’m likely sticking with OPNsense because it gets better raw throughput for some reason I haven’t figured out: pfSense managed 850 Mbps down and OPNsense 920 Mbps down (the max is 931 Mbps down, because Australian internet is trash grade). The config is close to the base configuration; the WAN is just DHCP with a standard 1500 MTU. IPv6 is on and working via DHCPv6, and I’ve followed my ISP’s configuration for OPNsense to the letter.
Random dropouts for all clients: Linux, Windows, IoT, everything. It looks like DNS stops responding, or something similar, but I’m not sure what’s gone wrong and can’t figure it out because I don’t have a good grasp of debugging these software routers. I switched over to a Linux VM forwarding traffic on another system and that worked fine, with no drops. It’s a less than ideal solution, though, and it routed a lot more slowly because it was just a hacked-together Ubuntu VM with no fancy NICs (that is, a Realtek USB appendage that throttles bandwidth).
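If it helps narrow things down, a small probe run from a LAN client can record exactly when each dropout starts and ends, and whether the router itself, the WAN path, or only DNS stops answering. This is a minimal sketch in Python, not an OPNsense tool: the router address `192.168.1.1`, the web GUI on port 443, `1.1.1.1:443` as a DNS-free WAN target and `example.com` as the test name are all assumptions to adjust for your network. Name resolution goes through the client's OS resolver, which may cache answers, so a DNS failure can show up a little late.

```python
"""Log when connectivity through the router changes state, from a LAN client."""
import socket
import sys
import time
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# Hypothetical values: adjust for your network.
ROUTER_IP = "192.168.1.1"   # assumed OPNsense LAN address
ROUTER_PORT = 443           # assumed web GUI port (OPNsense default)
WAN_IP = "1.1.1.1"          # a public IP, so this check skips DNS entirely
WAN_PORT = 443
TEST_HOST = "example.com"   # any name you expect to resolve
INTERVAL_S = 10             # seconds between probes

Sample = Tuple[float, str]


def tcp_ok(ip: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def dns_ok(host: str) -> bool:
    """True if the OS resolver (which may cache answers) returns an address."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def classify(lan: bool, wan: bool, dns: bool) -> str:
    """Name the first layer that fails, checking router, then WAN, then DNS."""
    if not lan:
        return "router unreachable"
    if not wan:
        return "WAN down (router up)"
    if not dns:
        return "DNS only (WAN up)"
    return "ok"


def transitions(samples: Iterable[Sample]) -> Iterator[Sample]:
    """Yield only the samples whose status differs from the previous one."""
    prev = None
    for t, status in samples:
        if status != prev:
            yield t, status
            prev = status


def probe_forever() -> Iterator[Sample]:
    """Run all three checks every INTERVAL_S seconds and yield (time, status)."""
    while True:
        status = classify(
            tcp_ok(ROUTER_IP, ROUTER_PORT),
            tcp_ok(WAN_IP, WAN_PORT),
            dns_ok(TEST_HOST),
        )
        yield time.time(), status
        time.sleep(INTERVAL_S)


def main() -> None:
    for t, status in transitions(probe_forever()):
        stamp = datetime.fromtimestamp(t).isoformat(timespec="seconds")
        print(f"{stamp} {status}", flush=True)


if __name__ == "__main__":
    if sys.argv[1:] == ["watch"]:
        main()  # runs until Ctrl-C
    else:
        print("usage: python3 dropout_probe.py watch")
```

Saved as, say, `dropout_probe.py` and started with `python3 dropout_probe.py watch`, it prints a line only when the state changes. The gap between successive failure lines gives the real dropout interval, and the label suggests whether to start with the WAN side or the resolver when lining the timestamps up against the OPNsense logs.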
Using a simple Linux-based router fixed it. Rebooting the HFC router fixes it. Rebooting pfSense or OPNsense fixes it. Restarting just the WAN interface in OPNsense fixes it. The problem is transient and sometimes resolves itself, but it is still annoying; so annoying that I’ve almost made a Home Assistant function to reboot the VM…
Update: I’ve also disabled the “Disable Gateway Monitoring” checkbox, which did not fix it on pfSense but may fix it on OPNsense. We shall see.
Update 2: The issue still occurs.
A better fix
What might be the actual cause of the problem here? Where should I look? Do you, brainstrust, need more configuration details or anonymised logs? Can anyone help me troubleshoot this?