As the title says… Here’s some more network topology info and troubleshooting steps already taken:
I have registered a public domain (leal.app.br) and pointed it at Cloudflare’s nameservers, so it already resolves and I can ping it from anywhere (the response comes from a Cloudflare IP, as expected).
Network Topology
- ISP Modem/Router: (accesses external Internet)
- IP: 192.168.17.1
- Acts as DNS for 192.168.17.x, forwards to ISP DNS
- Main Router (TP-Link Deco XE75 Pro) - 1st floor:
- WAN IP: 192.168.17.2
- LAN IP: 10.0.0.1
- Subnet: 255.255.252.0 (10.0.0.0/22) - valid IP range from 10.0.0.2 to 10.0.3.254 (10.0.0.1 is the gateway itself and 10.0.3.255 is the broadcast address for this network).
- DHCP for all 10.0.0.0/22 devices
- 2.5G Switch (managed but also dumb, bought on Aliexpress, “unknown” brand) - 2nd floor:
- IP: 10.0.0.12
- Connects main router, homelab server, main PC, and 2nd Deco
- Homelab Server (Proxmox) - 2nd floor:
- Proxmox: 10.0.0.10
- Ubuntu Server VM: 10.0.0.100 (runs Docker, Portainer, Nginx Proxy Manager, etc.)
- Main PC - 2nd floor:
- IP: 10.0.2.1
- Downstairs Windows PC:
- IP: 10.0.1.1
- 2nd Deco XE75 Pro (WiFi AP) - 2nd floor:
- IP: 10.0.3.250
- Dockerized Services on Ubuntu Server (network type, internal IP):
- dnsmasq (macvlan, IP: 10.0.0.224)
- Nginx Proxy Manager (bridge, 10.0.0.100:81)
- Portainer (10.0.0.100:9443)
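As a sanity check on the addressing above, here is a minimal sketch using Python's standard `ipaddress` module. It only confirms that the listed addresses fall inside 10.0.0.0/22 and prints the netmask, broadcast address, and usable range; the labels in the `hosts` dictionary are just names I picked for illustration.

```python
import ipaddress

# LAN behind the main Deco router, as described in the topology above.
lan = ipaddress.ip_network("10.0.0.0/22")

# Addresses taken from the topology list; the labels are illustrative only.
hosts = {
    "Main router (LAN)": "10.0.0.1",
    "Proxmox": "10.0.0.10",
    "Switch": "10.0.0.12",
    "Ubuntu Server VM": "10.0.0.100",
    "dnsmasq (macvlan)": "10.0.0.224",
    "Downstairs Windows PC": "10.0.1.1",
    "Main PC": "10.0.2.1",
    "2nd Deco": "10.0.3.250",
}

print("netmask:  ", lan.netmask)
print("broadcast:", lan.broadcast_address)

# hosts() yields every usable address (network and broadcast excluded).
usable = list(lan.hosts())
print("usable:   ", usable[0], "to", usable[-1])

for name, ip in hosts.items():
    inside = ipaddress.ip_address(ip) in lan
    print(f"{name:24} {ip:12} inside {lan}: {inside}")
```

Note that the usable range reported here starts at 10.0.0.1, since the gateway address is itself a usable host address in the subnet.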
Goal
Set up split-horizon DNS so that internal hostnames (e.g., pve.leal.app.br) resolve to internal IPs (e.g., 10.0.0.10) for devices on the LAN, using a Dockerized DNS server (dnsmasq, or possibly Unbound or BIND9 if needed). The same server should also act as a simple DNS forwarder (to my ISP modem) for everything else. In the future, services I expose externally (say Plex or Jellyfin, at something like jellyfin.leal.app.br) should resolve to the internal Plex/Jellyfin server from inside my house while remaining reachable from outside.
What Works
- All devices have internet access and can ping each other.
- Docker containers are running and accessible via their mapped ports.
- I can ping the dnsmasq container’s IP (10.0.0.224) from my Windows PC (10.0.2.1).
What Doesn’t Work
- When I set my Windows PC’s DNS to 10.0.0.224, all DNS queries time out and I lose internet access.
- dnsmasq logs show queries arriving from my PC, but no replies are received by the client.
Troubleshooting Steps Taken
- Verified Container and Network:
- dnsmasq is running in Docker (macvlan, static IP is 10.0.0.224, subnet set to 10.0.0.0/22, i.e. netmask 255.255.252.0).
- Container is started via Portainer.
- Confirmed with docker inspect that the container is on the correct macvlan network.
- Checked Listening Ports:
- Inside the container, netstat -tulnp shows dnsmasq listening on 0.0.0.0:53 (UDP/TCP).
- On the host, no process is listening on port 53 (as expected with macvlan).
- Tested Network Connectivity:
- Can ping 10.0.0.224 from Windows PC.
- tcpdump inside the container shows DNS queries arriving from the PC.
- Checked dnsmasq Logs:
- Logs show queries from the PC (e.g., query[A] google.com from 10.0.2.1).
- No reply lines or errors about upstream DNS.
- Tested Upstream DNS:
- Configured dnsmasq to use only 1.1.1.1 as upstream (removed my local ISP modem as a DNS).
- No change in behavior.
- Tested DNS from Inside the Container:
- nslookup google.com inside the container works (using 127.0.0.1 or the container’s own IP: 10.0.0.224 on macvlan, or 10.0.0.100 when the network mode is set to host).
- Tried Host Network Mode:
- Ran dnsmasq in host network mode; still no replies to external clients. (10.0.0.100 is the Ubuntu Server host IP)
- Checked iptables/nftables:
- UFW is inactive.
- No obvious rules blocking UDP 53.
- Checked AppArmor:
- No denials or restrictions found for dnsmasq.
- Tested with Local Records:
- Attempted to resolve local records (e.g., leal.app.br); same issue: queries are logged, but no replies arrive.
- Confirmed No Port Conflicts:
- Nginx Proxy Manager and Portainer are on different ports and networks.
- Switch/Router:
- No client isolation or port isolation enabled.
- All devices are on the same subnet and can communicate.
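For anyone who wants to reproduce the client-side test without nslookup, here is a rough sketch of a raw UDP probe in Python (standard library only). It hand-builds a minimal A-record query and waits for any reply. The function names `build_query` and `probe` are my own, and the 3-second timeout is an arbitrary choice. Because the socket is not connected, it accepts a reply from any source address and prints it, which may help spot replies coming back from an unexpected IP.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one query over UDP port 53 and describe what comes back."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            data, addr = sock.recvfrom(512)
        except socket.timeout:
            return f"no reply from {server} within {timeout}s"
        except OSError as exc:
            return f"socket error talking to {server}: {exc}"
    if len(data) < 12:
        return f"short reply ({len(data)} bytes) from {addr[0]}"
    reply_id, flags, _, ancount, _, _ = struct.unpack("!HHHHHH", data[:12])
    rcode = flags & 0x000F  # 0 = no error, 2 = SERVFAIL, 3 = NXDOMAIN
    return (f"reply from {addr[0]}: id={reply_id:#06x} "
            f"rcode={rcode} answers={ancount}")
```

From the Windows PC this could be called as `print(probe("10.0.0.224", "google.com"))`. If it times out while tcpdump in the container shows the query arriving, that would suggest the reply is being lost on the return path rather than never being generated, but that is only an inference from this one test.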
Suspicions
- Docker macvlan quirk: Host and containers on macvlan sometimes have reply path issues, but even with host networking, replies don’t reach clients.
- dnsmasq-specific issue: Considering trying Unbound or BIND9 as an alternative.
- Possible Docker/Portainer misconfiguration: But all other services work as expected.
What I Want to Achieve
- Internal DNS resolution for custom domains (e.g., pve.leal.app.br → 10.0.0.10). (Split Horizon DNS)
- Ideally, all LAN clients use the Dockerized DNS server for both local and external DNS.
- No disruption to existing network/internet access.
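To make the intended behavior concrete, here is a toy model of the split-horizon decision in Python. It is not how dnsmasq is implemented, just a sketch of the logic I am after: names with a local record get the internal IP, everything else is forwarded. `LOCAL_RECORDS`, `UPSTREAM`, and `resolve_plan` are hypothetical names, and 1.1.1.1 is simply the upstream I tested with.

```python
# Internal overrides: only the example from this post is listed.
LOCAL_RECORDS = {
    "pve.leal.app.br": "10.0.0.10",
}

# Upstream resolver for everything without a local record (assumption:
# the same 1.1.1.1 used during troubleshooting).
UPSTREAM = "1.1.1.1"


def resolve_plan(name: str) -> tuple[str, str]:
    """Return ("local", ip) for overridden names, else ("forward", upstream)."""
    # DNS names are case-insensitive and may carry a trailing dot.
    key = name.rstrip(".").lower()
    if key in LOCAL_RECORDS:
        return ("local", LOCAL_RECORDS[key])
    return ("forward", UPSTREAM)


for name in ("pve.leal.app.br", "PVE.leal.app.br.", "google.com"):
    print(name, "->", resolve_plan(name))
```

A LAN client asking for pve.leal.app.br would get 10.0.0.10 directly, while google.com would be handed to the upstream resolver.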
Questions for the Community
- Has anyone seen this behavior with Dockerized dnsmasq on macvlan or host networking?
- Is there a known issue with reply packets not being routed back to clients in this setup?
- Would switching to Unbound or BIND9 likely resolve this, or is there a deeper Docker networking issue at play?
- Any tips for further debugging or a recommended configuration for Split-Horizon DNS in a home lab like this?
Thanks in advance for any help!
If more config files, logs, or network diagrams are needed, let me know.
P.S.: This is a summarized transcription of a ChatGPT conversation made via Abacus AI; I asked it nicely to format all the important data as a forum post.