Comprehensive debugging of network issues


#1

Is there a diagnostic tool for identifying exactly where (which hardware or software) a network issue is originating? Or at least get some decent clues?

I have a situation where network connectivity randomly gets interrupted for a few minutes then comes back on its own.

  • Happens with multiple devices, though not necessarily at the same time (confirmed by checking that other devices, e.g. the wired VOIP ATA, still have a connection)
  • Happens over wifi, but the wifi doesn’t drop (confirmed by client uptime on wifi access point)
  • Usually can force connection to come back by disconnecting and reconnecting client to network.
  • Seems to happen with different OSes, but I don’t have much detail on this.

Network setup:

client <—(wifi)—> wireless access point (ddwrt) <—(powerline adapters)—> router (ddwrt) <—(ethernet)—> cable modem

I have some suspicion of the powerline adapters, but I think I still experienced the issue when connected directly to the router’s wifi.


#2

No such tool that I can think of. Every time I had an issue like that at home, I had to come up with some kind of logging setup myself, and that always helped (e.g. running mtr --raw and letting the file build up, or increasing the verbosity of the DHCP or wifi logs and then having a look days later).
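
One way to do that kind of home-grown logging, as a minimal Python sketch rather than a finished tool. It assumes the Linux iputils `ping` flags `-c` and `-W`; the target addresses (the AP and router IPs mentioned later in the thread, plus 8.8.8.8), the file name `netlog.csv`, and all function names are placeholders.

```python
import csv
import subprocess
import time
from datetime import datetime

# Placeholder targets: access point, router, and one outside host.
TARGETS = ["192.168.2.2", "192.168.2.1", "8.8.8.8"]


def ping_once(host, timeout_s=2):
    """Return True if a single ICMP echo gets a reply (Linux ping flags assumed)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def make_row(when, host, ok):
    """Build one CSV row: ISO timestamp, target, ok/FAIL."""
    return [when.isoformat(timespec="seconds"), host, "ok" if ok else "FAIL"]


def log_forever(path="netlog.csv", interval_s=10, probe=ping_once):
    """Append one row per target every interval_s seconds until interrupted."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            now = datetime.now()
            for host in TARGETS:
                writer.writerow(make_row(now, host, probe(host)))
            f.flush()  # keep the file readable while the loop is still running
            time.sleep(interval_s)
```

Calling `log_forever()` on an affected client and leaving it running gives a timestamped record showing which targets stopped answering, and when.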


#3

And how is mtr's output interpreted?

This network issue has been around for years and it bugs my optimizing mind, but most don’t bother looking for solutions since the workaround of disconnecting/reconnecting is so easy.

On the other hand, just trying to describe the issue in a useful way is difficult.


#4

The best diagnostic tool is your brain and proper process.

The general process for debugging network issues is to work either top-down or bottom-up through the protocol stack, or to make an intelligent guess and work outward from there.

Given you’re using a mix of wifi and powerline adapters, which are layer 2 devices and both susceptible to interference, I’d start at layer 1 or layer 2 with those devices.

i.e.,

  • are the cables all good (layer 1)
  • are there any devices that use the same media that may be causing interference (e.g. devices on the 2.4 or 5 GHz radio bands, or large devices on the power circuit which may be causing a drop in output or noise on the circuit)
  • if there are such devices, attempt to isolate them (i.e., remove/turn off/disable/etc.) and see if the problem persists.

Keep a log (cannot stress this enough!) of the exact date/time of your dropouts and also the date/time of any changes you make, and see if you can establish a pattern that coincides with something else going on in your house.
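
As a rough illustration of looking for a pattern in such a log, the Python sketch below counts dropouts per hour of day. The timestamps are made up purely for the example, and `dropouts_by_hour` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime


def dropouts_by_hour(timestamps):
    """Count dropouts per hour of day from ISO-format timestamp strings."""
    return Counter(datetime.fromisoformat(t).hour for t in timestamps)


# Made-up entries for illustration only; substitute your own notes.
example_log = [
    "2024-03-01T18:05:00",
    "2024-03-02T18:40:00",
    "2024-03-04T07:15:00",
    "2024-03-05T18:20:00",
]

for hour, count in sorted(dropouts_by_hour(example_log).items()):
    print(f"{hour:02d}:00  {'#' * count}")
# prints:
# 07:00  #
# 18:00  ###
```

A cluster around one hour would be a hint to check what else runs at that time (appliances, scheduled jobs, a neighbour's equipment).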

The biggest thing with network troubleshooting, in my experience, is figuring out how to properly rule something out as a factor, i.e. you need to test and confirm behaviour.

If you can properly rule something out as being a factor (i.e., devise a reliable test for that) then you can cross that off the list and move onto something else.

There are tools which may help you perform tests and snoop traffic, etc. But they won’t magically debug this for you. They will however help you test theories and observe what is going on in more detail.

Bear in mind with WIFI and power line stuff, it may be something entirely unrelated to your network. A classic case of intermittent issues I was having on a mine site for work was due to heavy machinery moving around sea containers and blocking line of sight periodically on a point to point wifi link.


#5

Ok, so it doesn’t look like there’s a magical network debugging tool. Unfortunately my knowledge of networking is limited, which makes the ‘brain’ approach more difficult. Maybe other brains here can help.

Some clues about the issue:

  • intermittent drop (could be a few times in a day or no observed problems for a week) of network communication for a given client device, i.e. browser can’t load web pages and pinging even local computers fails
  • but during a drop, the home phone (VOIP) still works (connected by ethernet to router), therefore it’s probably not a (complete) failure of the ISP or modem or router
  • during a drop, the client’s wifi connection is not dropped, therefore either the network manager and wireless access point’s client uptime stats are lying or it’s not necessarily a problem with the wifi connection

I wondered if it might be a DHCP issue, for example a clash between DHCP servers on the router and the wireless access point (whose DHCP server is disabled in the web admin interface). But I dunno… my client devices have static local IPs.

The router was upgraded, and the issue existed before and after this change. I suppose it could still be a configuration issue.

It could be an issue with the powerline adapters, but the issue was experienced even when connected directly to the router by wifi. Could the powerline adapters still cause a problem in that case?

Could it be due to the wireless access point, whether its hardware or software (ddwrt) configuration?


#6

I doubt it is a DHCP issue (I am assuming you are only using ONE DHCP server and ONE subnet, and not doing something stupid like running a second router piggybacked off another wifi router with 2 instances of DHCP and 2 subnets. I’ve seen people do that before. APs and routers are different things…). A machine will generally hold its DHCP lease until it expires, and periodically refresh it prior to expiry.
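
To make the refresh-before-expiry behaviour concrete: under the RFC 2131 defaults, a client first tries to renew at 50% of the lease time (T1) and falls back to rebinding at 87.5% (T2). A small Python illustration follows; the function name and the 24-hour lease in the usage note are assumptions, and a DHCP server can override both timers.

```python
from datetime import datetime, timedelta


def dhcp_timers(lease_start, lease_seconds):
    """Return (renew, rebind, expiry) times using the RFC 2131 default fractions."""
    renew = lease_start + timedelta(seconds=lease_seconds * 0.5)     # T1: unicast renew
    rebind = lease_start + timedelta(seconds=lease_seconds * 0.875)  # T2: broadcast rebind
    expiry = lease_start + timedelta(seconds=lease_seconds)
    return renew, rebind, expiry
```

For a 24-hour lease obtained at midnight, `dhcp_timers(datetime(2024, 1, 1), 86400)` gives a renewal attempt at 12:00 and a rebind attempt at 21:00, so a lease problem would tend to show up on that kind of schedule rather than at random.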

I’d suggest interference. Interference won’t necessarily result in the wifi dropping entirely, but traffic on it might not reach its destination. Repeat: your wireless network can stop passing traffic without “dropping” the SSID.
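
Because the association can stay up while traffic stalls, the access point's client-uptime counter will not show these outages, but a probe log will. A hedged Python sketch that collapses a series of (timestamp, ok) probe results into outage windows; `outage_windows` is a hypothetical helper, and the input format is an assumption.

```python
from datetime import datetime


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (start, end) spans of consecutive failures."""
    windows = []
    start = last_fail = None
    for when, ok in samples:
        if not ok:
            if start is None:
                start = when  # first failure of a new outage
            last_fail = when
        elif start is not None:
            windows.append((start, last_fail))  # outage ended with this success
            start = None
    if start is not None:  # samples ended while still failing
        windows.append((start, last_fail))
    return windows
```

Comparing these windows against the access point's association log shows whether the client stayed associated during the outage.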

The VOIP phone, being plugged directly into the router, isn’t on wifi. If DHCP were the issue, it would also affect any hard-wired devices that don’t have static IPs (like, I’d suggest, your phone).

So I think you can isolate it to the wifi or possibly (if your AP is connected via powerline) the powerline network.

I’d also suggest drawing yourself out a network diagram and identifying the path that dropping-out devices are taking to the internet. Then investigate the devices on that path, and attempt to verify whether or not each segment or step in the path is also dropping out.
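
The segment-by-segment idea can be sketched in Python as below. This is an illustration under assumptions: the path mirrors the setup in the first post, the addresses are the AP and router IPs given later in the thread plus 8.8.8.8 as a stand-in for "the internet", and `suspect_segment` is a hypothetical helper. The powerline adapters are not pinged directly, so they show up only as the link in front of the router.

```python
# Path from the first post, as (link used to reach the hop, hop name, address).
PATH = [
    ("wifi", "wireless access point", "192.168.2.2"),
    ("powerline adapters", "router", "192.168.2.1"),
    ("ethernet, cable modem and ISP", "internet host", "8.8.8.8"),
]


def suspect_segment(path, reachable):
    """Return the link in front of the first hop that fails, or None if all reply.

    `reachable` maps address -> bool, e.g. from pinging each hop during a drop.
    """
    for link, hop, address in path:
        if not reachable.get(address, False):
            return f"{link} (towards {hop})"
    return None
```

If the access point answers during a drop but the router does not, the function points at the powerline link; if nothing answers, it points at the wifi hop.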

If you can, you could isolate the powerline adapters by plugging a wifi AP directly into your router. Sure, the signal might not be the best elsewhere in the house, but you can at least monitor for dropouts that may be caused by the powerline segment. If there are no more dropouts, try the opposite test: run without wifi, cabled directly to the far end of a powerline adapter.


#7

If you have the time when the connection drops, see what devices you can ping on the network, and WAN. As simple as a ping is, it can verify that a lot of things are or are not working.

These may be things you already know; I apologize if so. These are my troubleshooting steps when I lose network connectivity.

Ping 127.0.0.1. This pings your local machine and confirms that the local IPv4 stack is working.

Next, ping your router’s IP address. This confirms connectivity to the router. If this works, you know the physical connection to the router and all the hardware in between are doing their job. In addition, it confirms the machine has a working IP configuration (from DHCP or a static address).

Next, ping something on the WAN. 8.8.8.8 is a good choice. This confirms your modem has connectivity to your ISP, and that your ISP’s connection to Google is also working.

Lastly, ping www.google.com. This confirms the same as the last step, but also that DNS is working.
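
The ladder above can be sketched in Python so it is quick to run during a drop. This is a rough sketch, not a definitive tool: it assumes the Linux iputils `ping` flags `-c` and `-W`, uses the router address given later in the thread (adjust for your network), and `first_failure` is a hypothetical helper name.

```python
import subprocess

# Ordered from closest to farthest; each entry is (target, what a reply confirms).
LADDER = [
    ("127.0.0.1", "local TCP/IP stack"),
    ("192.168.2.1", "path to the router"),
    ("8.8.8.8", "route out to the internet"),
    ("www.google.com", "DNS resolution"),
]


def ping(host, timeout_s=2):
    """Return True if a single ICMP echo gets a reply (Linux ping flags assumed)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def first_failure(ladder=LADDER, probe=ping):
    """Walk the ladder in order; return the first (host, meaning) that fails, else None."""
    for host, meaning in ladder:
        if not probe(host):
            return host, meaning
    return None
```

Running `print(first_failure())` during an outage reports the first rung that fails, which is the place to start looking.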

Hopefully this is helpful.


#8

Settings as described here, but in summary, IP of wireless access point (WAP) is 192.168.2.2 and DHCP server is disabled (router is 192.168.2.1).

I was afraid this might be a possibility… harder to debug.

Is this necessarily true? Eg. if the WAP somehow messed up DHCP (or didn’t turn it off) would it also affect other devices not connected to it? The home phone’s ATA is wired to the router directly, while most other devices are connected to the AP either by wire (printer, one computer) or wifi (several computers, smartphones, ps3, tablet).

I tried doing this in the first post. One thing I’m not sure about: what is the path a client of the WAP would take when pinging another client on the WAP? Would it go directly (i.e. device A -> WAP -> device B) or does it always need to go through the router (device A -> WAP -> powerline -> router -> powerline -> WAP -> device B)? If it’s the latter, it would be difficult to isolate segments.

Stopping wifi/network is a good way to get murdered lol. I do have a 100’ ethernet cable that I could use to connect the WAP and router without the powerline adapters, but the path of the wire would annoy people who don’t care as much about debugging.

Note that the router also has wifi, but it’s literally at the opposite, most distant corner of the building from my computer, yet my computer’s wifi often prefers to connect to the router rather than the WAP, which has a much stronger signal at my location (both use the same SSID). The same sometimes happens with a computer in the same room as the WAP…

I didn’t try pinging 127.0.0.1 (next time), but everything else I tried timed out (WAP, router, other computers, google/websites, etc.).