Debugging Strategies?

So I have been out of college for about a year. One of my biggest weaknesses, and one I would like to correct, is debugging a project that uses unfamiliar libraries and that I have not done much work in. When I run it or its tests, I see errors that look unfamiliar, and I am not sure what they mean.

I know a few tricks for breaking this problem down: log statements, checking out earlier commits to see what changed, etc.
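For the "checking out earlier commits" trick, here is a minimal sketch of the binary search that tools like git bisect automate. The names first_bad, history, and is_good are made up for illustration, and it assumes the oldest revision is good, the newest is bad, and the bug never disappears once it has been introduced.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Return the first revision for which is_good() is False.

    Assumes revisions[0] is good, revisions[-1] is bad, and that every
    revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


# Hypothetical history: pretend the bug was introduced in "c5".
history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
checked = []


def is_good(rev: str) -> bool:
    # In real life this would check out the revision and run the tests.
    checked.append(rev)
    return int(rev[1:]) < 5


culprit = first_bad(history, is_good)
print(culprit, "found after checking", checked)
```

The point is that eight commits need only three test runs, and the count grows with the logarithm of the history length, so this stays practical even on long histories.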

However, what seems to happen is that I start with the wrong assumptions about the error message and get stuck looking at the trees and not the forest. I have this idea that in the future I could walk away from the computer and try to write out the problem at a higher level of abstraction on a piece of paper. I have not tried this yet, though.

I have a few ideas on how to get better at this skill, but I was curious whether any of you had come up with your own strategies for this sort of thing?

My two cents has always been the ‘rubber duck’ method. Luckily, I work with some awesome/patient people and we frequently call one another to each other’s desks and literally just use them as a board to talk at.

I’ve found that talking through the problem out loud (especially to another person) significantly cuts the time it takes to find wherever the error is in whatever I’m writing at the time.

If there’s no one around you can always try being the eccentric programmer who talks aloud to themselves - I have done this several times to great effect hah

Also, walking away from the computer, as you mentioned, is a massive help. Spending 5 minutes away from the desk does help refresh my brain a little and get it back into gear.


This reminds me of when I had to have one of these:
[Image: old flowchart template]
As a guy who drives a truck, any help I could offer would probably be useless.
We used to write flow charts.


How I do it when I have time:

  1. Look at the plan (usually some badly sketched thing on paper with half the states missing)

  2. Look at what leads to the fail state following the plan

  3. Either correct the plan, or correct the program to follow the plan

When I have someone else around, or my cat:

  1. Explain the program.

When in a hurry:

  1. Narrow it down to a bad section

  2. Rewrite that section from a clean slate


These are all very good points, but the very first thing I check is…

Where the hell did I miss the break???

I can’t tell you how many times I have rushed a project because I needed other stuff done. A missing break was behind the majority of my failures when trying to compile, with the compiler complaining that I’m a complete idiot.


Hah, I have always found computers are extremely good at doing exactly what they are told. Nothing more, nothing less.

All of the above have mentioned good techniques. Another thing to try is a good debugger, so that you can step through the code. Basically, come up with some test data and work out what should happen, then step through the code and check your answers. If there is a discrepancy, work backwards.
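Here is a rough sketch of the "test data plus expected answers" idea. The three small functions and the deliberate bug are invented for the demo. It checks each intermediate result against a hand-computed value and reports the first stage that disagrees, which is where you would start stepping in a debugger (in Python, by putting breakpoint() inside that function or running the script under python -m pdb).

```python
def parse(line):
    return [int(x) for x in line.split(",")]


def total(values):
    return sum(values)


def average(values):
    # Deliberate bug for the demo: off-by-one in the divisor.
    return total(values) / (len(values) - 1)


# Test data and the answers worked out by hand beforehand.
line = "2,4,6"
expected = {"parse": [2, 4, 6], "total": 12, "average": 4.0}

values = parse(line)
actual = {
    "parse": values,
    "total": total(values),
    "average": average(values),
}

# Walk the stages in order and stop at the first discrepancy.
first_mismatch = next(
    (name for name in expected if actual[name] != expected[name]), None
)
print("first stage that disagrees with the hand calculation:", first_mismatch)
```

Since parse and total match the hand calculation, the search space shrinks to the body of average, which is the "work backwards" step in miniature.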

Also, if the language supports assertions, assert the values you expect. If a value does not match the assertion, the code will break automatically in the debugger. Without a debugger, the code will throw an exception at run time and tell you that it broke at line blah because assertion whatever was not met.
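As a small illustration in Python, where assertions raise AssertionError: the function apply_discount and its rules are hypothetical, and it assumes a non-negative price. Note that Python skips assert statements when run with the -O flag, so they suit debugging checks rather than input validation.

```python
def apply_discount(price: float, percent: float) -> float:
    # Precondition: catch bad inputs at the point where they enter.
    assert 0 <= percent <= 100, f"percent out of range: {percent}"
    result = price * (1 - percent / 100)
    # Postcondition: a discount should never raise the price or go negative.
    assert 0 <= result <= price, f"result {result} outside [0, {price}]"
    return result


message = None
try:
    apply_discount(50.0, 150)
except AssertionError as exc:
    # Left uncaught, this would stop the program with a traceback that
    # names the failing line and shows the message.
    message = str(exc)

print(message)
```

The failure shows up at the first place the bad value is seen, instead of several calls later as a confusing wrong total.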

Debugging sucks.
Always has, always will.
Everything always changes.

Use code formatters and linters, enable compiler warnings, use thread and memory sanitizers, use a decent VCS, and use good build/test frameworks (ones that support reproducible distributed building/testing).

Beyond that, divide and conquer using some simple questions to classify the nature of the issue:

  1. Has this ever worked before?
  2. Is this reliably reproducible, or at least flakily reproducible? For example, can you run the test suite many times, e.g. bazel test :foo_test --runs_per_test=1000? This is really good for integration testing, and it is the use case for the office full of 3990x/256G RAM workstations that Wendell was describing.
  3. Does this only happen in prod (e.g. not good enough test coverage, or an environmental issue)? Can you set up a prod stack and replay logs to reproduce it? Can you come up with an integration test that has a random-ish synthetic load generator? (We shook a bunch of bugs out of mariadb 10/mysql 5.6 innodb that way, including some ext4 bugs and some filesystem bugs… and also a ton of bugs that were flaky because the tests themselves were buggy.)
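A toy sketch of the "run it many times with random-ish input" idea from the list above. flaky_merge is a made-up function with a deliberate bug, and run_once is a hypothetical test. The one design choice worth copying is seeding the generator per run, so that any failing run can be replayed exactly.

```python
import random


def flaky_merge(a, b):
    """Hypothetical code under test: merge two sorted lists.

    Deliberate bug: the tail of b is dropped when a runs out first.
    """
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # bug: the matching out.extend(b[j:]) is missing
    return out


def run_once(seed):
    """One randomized test run; the seed makes it reproducible."""
    rng = random.Random(seed)
    a = sorted(rng.randint(0, 99) for _ in range(rng.randint(0, 5)))
    b = sorted(rng.randint(0, 99) for _ in range(rng.randint(0, 5)))
    return flaky_merge(a, b) == sorted(a + b)


failing_seeds = [seed for seed in range(1000) if not run_once(seed)]
print(f"{len(failing_seeds)} of 1000 runs failed")
if failing_seeds:
    print("replay with seed", failing_seeds[0])
```

A failure rate strictly between 0 and 100 percent is what "flaky" looks like from the outside. Once you have a failing seed in hand, the bug is reliably reproducible and you are back in question 2 territory.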

Once you can reproduce the issue, instrument the code (text or binary log statements, or checks that crash the binary with a core dump when running in this test environment, and so on).
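A minimal sketch of that kind of instrumentation using Python's standard logging module. The reserve function, the "orders" logger name, and the stock data are invented for the example, and the hard failure is a plain exception here where a native binary would abort with a core dump.

```python
import io
import logging

# Send debug output to an in-memory stream so the example is self-contained;
# in a real test environment this would be a file or stderr.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
log = logging.getLogger("orders")
log.setLevel(logging.DEBUG)
log.addHandler(handler)
log.propagate = False


def reserve(stock, item, qty):
    log.debug("reserve item=%s qty=%d before=%d", item, qty, stock.get(item, 0))
    stock[item] = stock.get(item, 0) - qty
    log.debug("reserve item=%s after=%d", item, stock[item])
    if stock[item] < 0:
        # Check that fails loudly as soon as the invariant is broken.
        raise RuntimeError(f"negative stock for {item}: {stock[item]}")


stock = {"widget": 3}
error = None
reserve(stock, "widget", 2)
try:
    reserve(stock, "widget", 2)
except RuntimeError as exc:
    error = str(exc)

captured = stream.getvalue().splitlines()
print("\n".join(captured))
print("error:", error)
```

The log lines just before the failure show the state that led to it, which is usually what you were missing when the bug was only a vague report.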

If you can’t reproduce the issue, either there’s no bug such as the one you’re looking for, or you’re not trying hard enough.