ISP Network Performance Monitoring

Background

I have intermittent network performance problems on my ISP’s side.

  • Dropped UDP (no response from ISP and non-ISP DNS)
  • TCP timeouts
  • Very high latency (~200ms average) from 8.8.8.8.
  • Full outages of up to an hour but usually less than 5min

I’ve started monitoring network performance partly to figure out what’s going on and partly to be able to tell my housemate it’s not anything I’m doing that is causing the problems.

I have written some simple scripts to accomplish this.
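For readers wondering what a script like that might look like, here is a minimal sketch (not the poster's actual scripts). It times a TCP connect and summarizes loss and average latency. The function names and the 8.8.8.8:53 target are assumptions chosen for illustration.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if the probe failed."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return None


def summarize(samples):
    """Summarize probe results, where each sample is a latency in ms or None."""
    ok = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(ok)) / len(samples) if samples else 0.0
    avg_ms = sum(ok) / len(ok) if ok else None
    return {"sent": len(samples), "loss_pct": loss_pct, "avg_ms": avg_ms}


def probe_once(host="8.8.8.8", port=53, count=5):
    """Run one round of probes against an example target and print a log line."""
    results = [tcp_connect_ms(host, port) for _ in range(count)]
    print(time.strftime("%Y-%m-%d %H:%M:%S"), host, summarize(results))
```

Calling `probe_once()` from cron or a loop every minute and appending the output to a file gives a timestamped record to compare against the outages.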

The Ask

Now it’s becoming a hobby and I’m wondering

  1. What do professional tools do (rather than what exact software is used) to monitor ISPs?
  2. What do you do to monitor your ISP?

Note: I fat-fingered the create button before I was done, so I deleted the post, since I was going to save it as a draft and come back to it. My apologies if posting a new topic isn’t the right course of action. Deleting the post doesn’t appear to have had the effect I expected, given it has 42 views.

I’ve used @geerlingguy’s Internet Pi Project extensively and I think it’s amazing because it combines multiple ways of monitoring your connection with a nice Dashboard.

See:
https://www.jeffgeerling.com/blog/2021/monitor-your-internet-raspberry-pi


Thanks. That looks like a good resource. I forgot to mention I’m also running the speedtest-exporter for Prometheus, but I don’t have alerts working and I don’t keep an eye on it, so I’m effectively not using it right now.

Netdata (https://www.netdata.cloud, source on GitHub at netdata/netdata) does a pretty good job, is very easy to set up, and is packaged in most distros (see the netdata package versions on Repology). There’s an fping plugin (fping.plugin, documented on Learn Netdata) if you want to monitor such things, for example.

Rant: One thing many Linux users seem to love to do is to “break” the distro’s packaging framework/system by installing a bunch of software by hand or through some other odd solution when it’s most likely already packaged. If it isn’t, I highly recommend submitting a request/patch for it. I can see why maintaining a system quickly gets painful when you don’t use the package framework/system.

I didn’t see a python-ping or pip package in OPNsense, and I don’t intend to “maintain” a script like that.

Smokeping is a classic tool for measuring latency (it lives in many Linux package managers too). You can add multiple hosts, like your router, ISP DNS, and Google/Cloudflare DNS. This helps identify where the latency may be coming from.
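The idea of probing several hosts to localize a problem can be sketched roughly as below. This is not how Smokeping works internally; it is just an illustration that assumes you already have latency samples per target, ordered nearest to farthest. The function names and thresholds are hypothetical.

```python
from statistics import median


def hop_stats(samples):
    """Loss percentage and median latency for one target (None = lost probe)."""
    ok = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(ok)) / len(samples) if samples else 100.0
    return {"loss_pct": loss_pct, "median_ms": median(ok) if ok else None}


def first_bad_hop(results, max_loss_pct=5.0, max_median_ms=100.0):
    """Return the nearest target that exceeds the (assumed) thresholds, or None.

    results is a list of (name, samples) pairs ordered from nearest to farthest.
    """
    for name, samples in results:
        stats = hop_stats(samples)
        if stats["median_ms"] is None:
            return name
        if stats["loss_pct"] > max_loss_pct or stats["median_ms"] > max_median_ms:
            return name
    return None
```

If the router looks clean but the ISP DNS already shows loss, the trouble most likely sits between the two rather than further out on the internet.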

I’ll be honest though, this sounds like it may either be bufferbloat or a dodgy layer 1 connection to your ISP.

A few questions:

  1. What kind of connection do you have to your ISP? (There are many different types, so give some details.)
  2. Do the high latency and packet loss occur only during heavy usage periods? What if you plug directly into the router and disable WiFi?

You’ll really want to measure network usage as well as latency. What kind of router are you using? If it has SNMP, you can most likely monitor download usage over time using something like LibreNMS (or Zabbix, or PRTG).

Just to give a description: bufferbloat occurs when your connection is too busy and the queue fills up (the link can’t push packets out fast enough). Packets then sit in the queue, so latency climbs, and TCP senders keep pushing traffic until the buffer overflows, which leads to packet loss. The best solution is to deploy fq_codel or cake; pfSense and OpenWrt can do this. Put simply, you configure your bandwidth and they’ll drop packets early when the queue starts to build. That packet loss keeps TCP in line so it doesn’t bloat your buffers.
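To put rough numbers on that description: the worst-case delay a full buffer adds is its size divided by the link rate. A minimal sketch, assuming a hypothetical 1 MB buffer on a 50 Mbit/s link (both figures are illustrative, not measured from any device in this thread):

```python
def queue_delay_ms(buffer_bytes, link_rate_bps):
    """Worst-case time in ms to drain a full FIFO buffer at the given link rate."""
    if link_rate_bps <= 0:
        raise ValueError("link_rate_bps must be positive")
    # bytes -> bits, seconds -> milliseconds
    return buffer_bytes * 8 * 1000 / link_rate_bps


# Hypothetical figures: 1 MB of buffering in front of a 50 Mbit/s link.
delay = queue_delay_ms(1_000_000, 50_000_000)
print(f"{delay:.0f} ms of added latency when the buffer is full")
```

With those assumed figures a full buffer adds 160 ms, which is in the same range as the latency symptoms people typically report with bufferbloat.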

The simplest bufferbloat test I’m aware of is hosted by DSL Reports. Scratch that, it appears to no longer work; maybe try this one.

However, the better test I’ve found is to ping out to a VPS/VM in the cloud and ping in from the VPS/VM, then saturate upload and download using an artificial test (netcat is usually enough).
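Once you have ping samples from an idle link and from a saturated one, comparing them is simple. A rough sketch, assuming the samples are already collected as lists of round-trip times in milliseconds (the function name is hypothetical):

```python
from statistics import median


def latency_under_load(idle_ms, loaded_ms):
    """Compare median RTT on an idle link against median RTT while saturated."""
    idle = median(idle_ms)
    loaded = median(loaded_ms)
    return {"idle_ms": idle, "loaded_ms": loaded, "added_ms": loaded - idle}
```

A large `added_ms` points toward bufferbloat, while latency that is bad even when the link is idle points somewhere else. Medians are used here so that a single stray ping does not skew the comparison.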

For the purposes of this thread, I’m more interested in how and what things are measured by other people in general than in how to fix my connection.

For the curious:

  1. I have an OPNsense router (old Athlon, minimum requirements, just doing basic firewall, routing, and DHCP) in the DMZ of the ISP router. The ISP router is connected to the ONT that gets fiber from the street. Everything from the street to the OPNsense router is IPv4 DHCP. I leave the ISP router in place to keep their support staff happy (calls end quickly if it’s not the machine connected to the ONT). It’s doing next to nothing, so it isn’t likely to be overtaxed. Also, I have 50/50, so no hardware should be getting pushed to the limit. I have seen all of the problems even with a Windows client connected directly to the ISP router. Both the ONT and ISP router have been replaced at least once.
  2. It happens randomly as far as I can tell. Sometimes it’s even during very off-peak usage. I suspect the longer outages were maintenance related since they happened overnight. I’m discounting anything happening in high wind since we have aerial fiber and it can get knocked around quite a bit.
    If I’m not running netdata the opnsense router RAM doesn’t get above 50% as far as I can see.
    I’m almost certain it’s the ISP being wonky. I can’t do much about it since it’s intermittent and at best their response is to replace hardware that doesn’t change anything. If I can gather enough data maybe I’ll find a pattern and I can get them to catch the problem in action.
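One way to look for a pattern in the gathered data is to collapse consecutive failed probes into outage windows with a start time and duration. A minimal sketch, assuming probes are logged as (unix timestamp, success flag) pairs at a fixed interval; the function name and the 60 second interval are assumptions:

```python
def find_outages(probes, interval_s=60):
    """Group consecutive failed probes into (start_ts, duration_s) windows.

    probes is a list of (unix_ts, ok) pairs sorted by time.
    """
    outages = []
    start = None
    last_fail = None
    for ts, ok in probes:
        if not ok:
            if start is None:
                start = ts  # first failure of a new outage
            last_fail = ts
        elif start is not None:
            # First success after a run of failures closes the outage.
            outages.append((start, ts - start))
            start = None
    if start is not None:
        # Log ended mid-outage: count up to one interval past the last failure.
        outages.append((start, last_fail - start + interval_s))
    return outages
```

A list of start times and durations like this is easier to hand to ISP support than raw ping logs, and it makes time-of-day patterns easier to spot.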

Hmm?

I found this: “How to do Packet Loss And Latency Monitoring in pfsense” (YouTube).
It looks like this is the clean version of my solution. Monitoring an IP in the ISP’s infrastructure sounds like a good idea too.