Mystery WAN Packet Loss, 6 months going.

I’ve been trying to troubleshoot this for about 6 months. I would love any insight, or even guesses, at what could be happening here, as I’m at my wits’ end with this.

Here is the quick description of what’s happening:
My internet randomly drops out for a few seconds, rarely as long as a minute, then comes back. The LAN is fine. This happens once a day, sometimes 2-3 times a day.

Here is the long part:
Watching the status page on pfSense/OPNsense, I can see the packet loss percentage on my WAN gateway slowly climbing. I think this only affects my upload. When I’m on Discord, everyone says I’m starting to sound crunchy, but everyone sounds fine to me. It’s the same in multiplayer games. For example, in Arma 3 I was told my voice was going crunchy, but in game everyone was walking around just fine, with no rubberbanding or walking forever in one direction. Discord will disconnect me fairly quickly, whereas Arma takes a while to boot me.

Details:
My ISP is GCI, and it’s a cable modem set as a gateway. I’ve been using them for years and they are highly reliable, going down maybe once or twice a year.
I’ve had techs check it out several times and it’s fine. It has shown zero packet loss and very few uncorrectables in its own logs.

The hardware I’ve been using for 4 years now is a Jetway NUC PC with built-in gigabit Intel NICs, running pfSense. The switch is an unmanaged 1Gb TP-Link 24-port, along with an access point.
In pfSense I don’t have anything like VLANs or VPNs set up, just DHCP directing DNS requests to my Pi-hole running on a Raspberry Pi, plus UPnP/NAT-PMP and some manually set port forwards for my game servers.

Other things on the network are my server running TrueNAS, a few PCs, a Switch, a PS4, and IP cameras tethered to ZoneMinder.

TrueNAS is running a handful of things: Samba shares, a Plex server, Syncthing, ZoneMinder, a Mumble server, and a MineOS server.

Now here is what I’ve done:
All the coax cables were replaced and tested. I changed out all of the Ethernet cables within my network. Everything is plugged into a UPS.
I replaced the modem, and the new one is a 2.5Gb version. With a 2.5Gb modem, I also decided to update my router. I bought a HUNSN RS34g, another Intel NUC but with 2.5Gb Intel NICs, and a TRENDnet 8-Port Unmanaged 2.5G Switch. I reinstalled pfSense fresh and restored settings from the previous router.

I’m not sure exactly when this problem started, but I think it was around pfSense version 2.6.0.
The problem was still present. At this point I was going crazy, so I wiped the install and replaced it with a clean OPNsense install.
The problem is still present.

Few more things:
I’ve messed with the gateway settings, such as disabling gateway monitoring and changing the monitor IP to 1.1.1.1 to keep the pings off the gateway.
I disabled IPv6 on both LAN and WAN, and tried using “Override MTU” and “Dynamic gateway policy”. I also forced speed and duplex to both 2.5Gb and 1Gb.
I unplugged just about everything at one point and it still happens. I also completely reinstalled my Pi-hole on a new SD card.

In the Interface Statistics of OPNsense there aren’t any errors or collisions listed on LAN or WAN.

The only place I can see an error is in the gateway logs, here is an example:

2022-09-27T16:14:43-08:00 Warning dpinger WAN_DHCP 66.77.123.1: Clear latency 9544us stddev 1926us loss 11%
2022-09-27T16:12:22-08:00 Warning dpinger WAN_DHCP 66.77.123.1: Alarm latency 10715us stddev 2377us loss 21%
2022-09-25T17:54:35-08:00 Warning dpinger send_interval 1000ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 66.77.123.1 bind_addr 66.77.123.75 identifier "WAN_DHCP "
2022-09-25T17:54:35-08:00 Warning dpinger exiting on signal 15

Clear and Alarm alternate quite a bit in the logs, and the send_interval line along with signal 15 shows up now and then. It’s never consistent. I’ve altered the IP addresses shown here, apart from the last octet; 66.77.123.1 is my ISP’s gateway and 66.77.123.75 is my public IP address.

When I used 1.1.1.1 as my monitor IP, the logs didn’t really change:

2022-09-25T17:51:59-08:00 Warning dpinger WAN_DHCP 1.1.1.1: sendto error: 65
2022-09-25T17:51:48-08:00 Warning dpinger WAN_DHCP 1.1.1.1: Alarm latency 37534us stddev 1343us loss 22%
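
To see how long each of these episodes actually lasts, the Alarm/Clear lines can be paired up. What follows is only a minimal sketch: it assumes the log lines look exactly like the ones pasted above, that an episode runs from an Alarm to the next Clear for the same monitor IP, and the function names (`parse_events`, `alarm_durations`) are made up for illustration.

```python
import re
from datetime import datetime

# Matches the Alarm/Clear lines as they appear in the gateway log above.
# Other dpinger lines (sendto errors, startup parameters) are skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \S+ dpinger (?P<gw>\S+) (?P<monitor>\S+): "
    r"(?P<state>Alarm|Clear) latency (?P<latency>\d+)us "
    r"stddev (?P<stddev>\d+)us loss (?P<loss>\d+)%"
)


def parse_events(lines):
    """Return Alarm/Clear events sorted oldest first."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        events.append({
            "time": datetime.fromisoformat(m["ts"]),
            "key": (m["gw"], m["monitor"]),
            "state": m["state"],
            "loss": int(m["loss"]),
        })
    return sorted(events, key=lambda e: e["time"])


def alarm_durations(events):
    """Pair each Alarm with the next Clear for the same gateway/monitor."""
    open_alarms = {}
    episodes = []
    for e in events:
        if e["state"] == "Alarm":
            # Keep the first Alarm of an episode as its start time.
            open_alarms.setdefault(e["key"], e["time"])
        elif e["key"] in open_alarms:
            start = open_alarms.pop(e["key"])
            episodes.append((e["key"], start, (e["time"] - start).total_seconds()))
    return episodes


# The log lines quoted in this post (newest first, as the log shows them).
SAMPLE = [
    "2022-09-27T16:14:43-08:00 Warning dpinger WAN_DHCP 66.77.123.1: Clear latency 9544us stddev 1926us loss 11%",
    "2022-09-27T16:12:22-08:00 Warning dpinger WAN_DHCP 66.77.123.1: Alarm latency 10715us stddev 2377us loss 21%",
    "2022-09-25T17:51:59-08:00 Warning dpinger WAN_DHCP 1.1.1.1: sendto error: 65",
    "2022-09-25T17:51:48-08:00 Warning dpinger WAN_DHCP 1.1.1.1: Alarm latency 37534us stddev 1343us loss 22%",
]

episodes = alarm_durations(parse_events(SAMPLE))
for key, start, seconds in episodes:
    print(f"{key[1]} alarm at {start.isoformat()} lasted {seconds:.0f}s")
```

On the lines quoted here this reports one episode for 66.77.123.1 lasting 141 seconds; the 1.1.1.1 alarm has no matching Clear in the excerpt, so it is left out.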

I know this is a lot of info for something that randomly happens whenever it feels like it, but I hope someone has some idea of what’s going on.
One last detail: the only time it happens “more” is when a friend is watching something on my Plex, but even then it doesn’t always happen, just slightly more often.

Thanks for reading my maddening problem, any help is very welcome.

Have you ever tried running PingPlotter against a few different addresses, including something on their network?

I had a similar issue with Comcast: download was fine, but when I uploaded anything meaningful I would get a ton of packet loss. In the end I found that traffic to anything on Comcast’s network dropped no packets, but traffic to anything outside their network did, which led me to believe the problem was inside their network and not at the edge (modem, my equipment, etc.).
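
A rough way to run that same inside-versus-outside comparison without PingPlotter is sketched below. It is only a sketch under a few assumptions: a Unix-like `ping` that accepts `-c` and prints a "% packet loss" summary line, sample summary strings that are illustrative rather than captured from a real run, and a 5% threshold that is an arbitrary placeholder. The function names are made up for the example.

```python
import re
import subprocess

# Matches the summary printed by common Unix-like ping implementations,
# e.g. "0% packet loss" (Linux) or "0.0% packet loss" (BSD/macOS).
LOSS_RE = re.compile(r"([\d.]+)% packet loss")


def parse_loss(ping_output: str) -> float:
    """Return the packet-loss percentage found in ping's summary output."""
    match = LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def measure_loss(host: str, count: int = 20) -> float:
    """Ping `host` `count` times and return the loss percentage.

    Assumes a ping binary that accepts -c (Linux, macOS, FreeBSD).
    """
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, check=False,
    )
    return parse_loss(result.stdout)


def compare(inside_loss: float, outside_loss: float, threshold: float = 5.0) -> str:
    """Very rough verdict; the threshold is an arbitrary illustrative value."""
    if inside_loss >= threshold:
        return "loss already present on the ISP-side hop"
    if outside_loss >= threshold:
        return "loss only beyond the ISP-side hop"
    return "no significant loss on either target"


# Illustrative summary lines (not real captures) to show the parsing.
SAMPLE_INSIDE = "20 packets transmitted, 20 received, 0% packet loss, time 19028ms"
SAMPLE_OUTSIDE = "20 packets transmitted, 16 packets received, 20.0% packet loss"

print(compare(parse_loss(SAMPLE_INSIDE), parse_loss(SAMPLE_OUTSIDE)))
```

On a real network you would call `measure_loss()` against the ISP gateway and against something outside their network (1.1.1.1, say) during one of the bad spells, and pass both numbers to `compare()`.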

It would get better and worse. I think they may have fixed it, but I then moved and got AT&T Fiber.

If it is something on their network, good luck reporting it. Also try posting this on the DSLReports forum.


I also had the same issue with Comcast in the US, also switched to AT&T Fiber to “solve” the problem. It’s got to be an issue upstream with some common component of the coaxial-internet-to-the-home infrastructure.

What is your upload bandwidth and have you tried checking how much upload throughput you are pushing when experiencing the latency increases?
It may be just a phone deciding it needs to cloud sync images, or something along these lines …
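
One way to answer that is to sample the WAN "bytes out" counter twice and turn the difference into a rate. The sketch below is only illustrative: it assumes you can read such a counter from your router's interface statistics, the counter values and the 75 Mbps uplink are placeholders to substitute with your own, and the 80% "near saturation" cut-off is an arbitrary choice.

```python
def mbps(bytes_before: int, bytes_after: int, seconds: float) -> float:
    """Average rate in megabits per second between two byte-counter samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (bytes_after - bytes_before) * 8 / seconds / 1_000_000


def utilization(rate_mbps: float, uplink_mbps: float) -> float:
    """Fraction of the uplink in use (1.0 means fully saturated)."""
    return rate_mbps / uplink_mbps


def near_saturation(rate_mbps: float, uplink_mbps: float, threshold: float = 0.8) -> bool:
    """True when usage is at or above the (arbitrary) threshold fraction."""
    return utilization(rate_mbps, uplink_mbps) >= threshold


# Placeholder samples: WAN bytes-out counter read 60 seconds apart.
rate = mbps(1_000_000_000, 1_450_000_000, 60)
print(f"{rate:.1f} Mbps up, {utilization(rate, 75.0):.0%} of a 75 Mbps uplink")
print("near saturation:", near_saturation(rate, 75.0))
```

If the rate during a loss episode sits nowhere near the plan's upload speed, local saturation becomes a much less likely explanation.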

I am worried that it is something to do with the inner workings of GCI.

Thanks for letting me know about PingPlotter. I’ll give that a go and see if it gives me any further information about what’s going on.

I am slightly worried it is this and that the service overall is going down the hole.

It’s 2000Mbps down and 75 Mbps up. There were a few days where I was able to sit down almost all day and watch my internet throughput like a hawk.
I had everything unplugged (wifi, consoles, etc.) and only had my server and main rig running. I never got anywhere near my speeds, even with a friend watching Plex and playing on a game server I was hosting.

As for sync stuff, I don’t use anything that isn’t local; it’s pretty much just Syncthing at this point.

edit: number not big enough

I assume you got those mixed up and it’s 2Mb/s up? If so, jeezus! That’s the lowest upload I’ve ever seen. No wonder it goes to hell!

Whoops sorry, 2000Mbps down hence the need for a 2.5Gb modem and router

Gosh, this sounds so eerily similar to my situation, though I now have 6+ months of stability under my belt.

I see that you already purchased a replacement modem, and I assume you made sure it was fully compliant with your ISP, so unfortunately I don’t think what fixed mine will work for you. Just in case, here are the two items I had to resolve.

|:issue_#1 - Compatible Coaxial Splitter:|
When the intermittent issues began and I had done the general troubleshooting, the first bigger task I took on was to visually inspect and clean up (and, if necessary, replace) the coaxial infrastructure around the house. With all new lines and the problem persisting, I had a Comcast tech come out, and he found that the splitter was actually designed for a satellite coax network, such as DirecTV. Boy did I feel dumb, but afterwards he was able to get an acceptable signal to the modem, so I was relieved to finally have some stability back in my net-life…pftt stability+life…those words frankly repel each other for me and my existence lol.

|:issue_#2 - Fully-Compliant Coaxial Modem:|
After many troubleshooting sessions, Comcast kept telling me that I needed a new modem and kept trying to sell me theirs. That is why I kept thinking my own modem, purchased within the previous year, was fine and they were just being salespeople. If they had given me any technical reason why it was no longer compatible with their network, that would have been more than enough to convince me it wasn’t just a sales thing.

So in the end I was definitely in the wrong: at some point around that time, Comcast had made modifications to their own network that rendered the modem I owned no longer fully compliant, causing the very intermittent outages again.

I haven’t had this issue in a while.

There’s a command-line tool called mtr; it’s like PingPlotter’s elder cousin.

It’ll run on a variety of platforms and can record output to a text file, alongside some timestamps for later analysis.

Run one instance towards e.g. Google and another towards e.g. Amazon, with an interval of e.g. 0.1 (10 probes a second each) to each of the hops.

After a few days, you’ll see if there’s any significant packet loss or latency excursions between whatever box you’re running it on and these two major cloud providers.

(Yes, that’s about 200 packets / statistics data points a second, don’t write the output to an SD card)
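
As a rough sketch of that setup, the snippet below builds the mtr command lines and redoes the packet-rate arithmetic. Treat it as a starting point, not a recipe: the hostnames `google.com` and `amazon.com`, the 600-cycle report length, and the hop count of 10 are all assumptions for illustration, and sub-second intervals generally require running mtr as root. `run_once` is a hypothetical helper that is defined but never called here.

```python
import shlex
import subprocess
from datetime import datetime


def mtr_command(target: str, probes_per_second: int = 10, cycles: int = 600) -> list:
    """Build an mtr report-mode command line for one target."""
    interval = 1 / probes_per_second  # seconds between probe rounds
    return [
        "mtr", "--report", "--report-wide", "--no-dns",
        "--interval", str(interval),
        "--report-cycles", str(cycles),
        target,
    ]


def packets_per_second(instances: int, hops: int, probes_per_second: int) -> int:
    """Each instance probes every hop once per round."""
    return instances * hops * probes_per_second


def run_once(target: str, log_path: str) -> None:
    """Append one timestamped mtr report to a text file (not called here)."""
    report = subprocess.run(
        mtr_command(target), capture_output=True, text=True, check=False
    )
    with open(log_path, "a") as log:
        log.write(f"=== {datetime.now().isoformat()} {target}\n")
        log.write(report.stdout)


# Assumed targets; any two hosts on those providers' networks would do.
for host in ("google.com", "amazon.com"):
    print(shlex.join(mtr_command(host)))

# Two instances, an assumed 10 hops each, 10 probes a second per hop.
print(packets_per_second(instances=2, hops=10, probes_per_second=10))
```

With two instances and roughly ten hops on each path, that works out to 200 packets a second. Calling `run_once` in a loop would give a file of timestamped reports to look through later.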


As for actually solving the issue, if it turns out it’s not on your end:

There are ways of bonding connections across multiple flaky providers: https://www.openmptcprouter.com/ . Something like a latency spike or a packet drop won’t be entirely seamless, but if you need reliability, this is an easy/cheap way to get it. (There are other alternatives as well.)


Oof, that’s some speedy internet you’ve got there; now to be able to utilize it to its full potential.

I wanted to add one more thought to what risk mentioned, and I’m betting you’ve already gone down this road, but if it’s something you’re able to do for an extended amount of time, I would isolate it down to just the modem.

For example, plug a laptop into the modem via Ethernet, make that the entire topology, and then start running some of the tests that risk and others suggested above.

Are you able to give us the exact product model of the modem you purchased? And a question on my part: when you did the modem swap, was it a similar process to somewhere like Comcast, in that you must authorize the new modem’s MAC with the ISP, either through a self-install method or by contacting support? Or is it a more automated thing where you can just plug-n-play any modem nowadays?