Debugging
Debugging is the systematic process of finding and removing the causes of incorrect program behavior.
Why This Matters
Every program has bugs. A programmer who cannot debug effectively is permanently stuck — able to write code but unable to make it work reliably. Debugging skill is what separates working software from perpetual prototypes.
For a rebuilding civilization with primitive tooling, debugging is harder than on modern systems. You may have no interactive debugger, no backtraces, no memory sanitizer, no logging framework. You have the program’s observable behavior, a way to print bytes to a console or display, and your understanding of how the machine works. This is actually sufficient — most bugs are found through careful reasoning and strategic observation, not sophisticated tools.
The programmers who built the first operating systems and applications did so with nothing more than this. Their debugging methodology — which is really just disciplined scientific thinking applied to programs — remains the most important skill in a programmer’s toolkit.
The Debugging Mindset
Debugging is hypothesis-driven investigation. The program is misbehaving in a specific, observable way. You form a hypothesis about the cause, design an observation that would distinguish your hypothesis from alternatives, make the observation, and refine your hypothesis.
The biggest mistake is changing code randomly in hopes that something will fix the problem. This approach occasionally succeeds by accident, but it more often masks the real bug, introduces new bugs, and leaves you with no understanding of what went wrong. When the same category of bug appears again — and it will — you are no better equipped to find it.
The second biggest mistake is investigating widely before narrowing the possibilities. Effective debugging is about elimination: each observation should rule out a large class of possible causes, steadily narrowing the field until only one explanation remains.
Reproducing the Bug
A bug you cannot reliably reproduce is very hard to fix. Your first goal is a minimal, reliable reproduction: the simplest possible input or sequence of actions that consistently triggers the incorrect behavior.
Start with the failing case and simplify it. Remove inputs, shorten data, eliminate optional features. Each element you remove while the bug persists narrows the cause. When removing something makes the bug disappear, that thing is relevant.
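The simplification loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `minimize` and `triggers_bug` are hypothetical names, and `triggers_bug` stands in for "run the program on this input and report whether the incorrect behavior appears."

```python
def minimize(failing_input, triggers_bug):
    """Greedily drop elements one at a time while the bug still reproduces."""
    current = list(failing_input)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if triggers_bug(candidate):
            current = candidate   # element i was irrelevant; leave it out
        else:
            i += 1                # removing element i hides the bug; keep it
    return current


# Hypothetical bug for illustration: the program fails whenever
# both 7 and 13 are present in the input.
def triggers_bug(values):
    return 7 in values and 13 in values


print(minimize([4, 7, 1, 9, 13, 2], triggers_bug))  # prints [7, 13]
```

Every element left in the result is one whose removal made the bug disappear, which is exactly the "relevant" set the paragraph above describes.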
For timing-dependent bugs — those that appear only sometimes — look for patterns: does the bug appear more often when the system is busy? After a specific operation? When memory is nearly full? Patterns narrow the hypothesis space even when you cannot get consistent reproduction.
Document the exact reproduction steps. “The program crashes” is useless. “The program crashes when I enter the string ‘ABC123’ in the name field and press Enter while the status indicator is red” is actionable.
Reading Error Symptoms
The observable symptoms of a bug tell you where to look:
Wrong output value: Some calculation is wrong. Narrow down which calculation: print intermediate values. Find the last point where the data was correct and the first point where it was wrong. The bug is between them.
Crash / halt: The CPU is executing an illegal instruction, or has jumped to an invalid address. Common causes: array out-of-bounds writing past a return address, stack overflow corrupting control flow, a null pointer dereference. Examine memory at the point of crash.
Infinite loop: The loop’s exit condition is never met. Print the loop variable on each iteration — either it is not changing (body is not executing the update) or the condition is wrong (never becomes true for the actual values).
Random/variable behavior: Likely causes are uninitialized memory or variables, or an interrupt that modifies shared state at unpredictable times. Run the program twice with identical input; different results confirm non-determinism.
Works on one machine, fails on another: Relies on a specific memory layout, specific initial memory contents, specific timing, or architecture-specific behavior. The difference between machines is your clue.
Adding Diagnostic Output
The most universally applicable debugging technique is adding print statements (or their equivalent — writing bytes to a serial port, toggling an LED, updating a display register) to make the program’s internal state visible.
Effective diagnostic output:
- Print values in hex, not just decimal. Hex exposes bit patterns (flags, masks, single-bit errors) that decimal hides.
- Print both the expected value and the actual value. "COUNTER EXPECTED: 10 ACTUAL: 12" is more useful than "COUNTER: 12".
- Print at every significant state transition: entering a function, exiting a loop, completing a calculation.
- Print variable values immediately before and after any operation you suspect.
The process: add output, run the program, look for the first output where the value is wrong. The bug is in the code between the last correct output and the first wrong output.
Remove or disable diagnostic output before deploying. Leaving it in clutters output and slows execution. Some programmers use a DEBUG flag to enable/disable all diagnostic output with a single change.
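A minimal sketch of these guidelines, assuming a console that accepts text. The names `DEBUG` and `debug_check` are illustrative, not an established convention; on a system without a console, the `print` call would be replaced by a write to a serial port or display register.

```python
DEBUG = True  # single switch: set to False to silence all diagnostic output


def debug_check(label, expected, actual):
    """Build an expected/actual line in decimal and hex; print it if DEBUG is on."""
    status = "OK" if expected == actual else "MISMATCH"
    line = (f"{label} EXPECTED: {expected} (0x{expected:04X}) "
            f"ACTUAL: {actual} (0x{actual:04X}) {status}")
    if DEBUG:
        print(line)
    return line


debug_check("COUNTER", 10, 12)
# prints: COUNTER EXPECTED: 10 (0x000A) ACTUAL: 12 (0x000C) MISMATCH
```

Routing every diagnostic through one function means a single change to `DEBUG` disables all of it before deployment.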
The Binary Search Principle
When you do not know which part of the program contains the bug, use binary search on the code: add a print statement in the middle of the suspect section. If the output looks correct, the bug is in the second half; otherwise, the bug is in the first half. Repeat, halving the search space each time.
For a program with 100 functions, this finds the buggy function in about 7 observations (log₂ 100 ≈ 6.6, rounded up). Linear inspection, checking each function in sequence, might require 99.
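The halving procedure can be sketched as follows. This is a simplified model, not a general tool: it assumes the program is a sequence of stages, that the final output is already known to be wrong, that once the state goes wrong it stays wrong, and that a hypothetical `looks_correct` check exists for the state after any stage. Each call to `state_after` stands in for rerunning the program with one print statement at that point.

```python
def first_bad_stage(stages, initial, looks_correct):
    """Return (index of first stage whose output looks wrong, observations made)."""
    def state_after(n):
        state = initial
        for stage in stages[:n + 1]:
            state = stage(state)
        return state

    lo, hi = 0, len(stages) - 1   # invariant: the output of stage hi is wrong
    observations = 0
    while lo < hi:
        mid = (lo + hi) // 2
        observations += 1
        if looks_correct(mid, state_after(mid)):
            lo = mid + 1          # state still good here: bug is in the second half
        else:
            hi = mid              # state already bad: bug is at mid or earlier
    return lo, observations


# Hypothetical program: 100 stages that each add 1, except stage 62 adds 2.
def make_stage(index, buggy_index=62):
    step = 2 if index == buggy_index else 1
    return lambda x: x + step


stages = [make_stage(i) for i in range(100)]
bad, observations = first_bad_stage(stages, 0, lambda n, state: state == n + 1)
print(bad, observations)
```

For 100 stages the loop never needs more than 7 observations, matching the estimate above.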
Memory Dump Analysis
On systems without high-level debugging tools, examining raw memory is often necessary. A memory dump prints the contents of a memory region in hexadecimal, typically 16 bytes per line with the starting address.
1000: 48 65 6C 6C 6F 20 57 6F 72 6C 64 0D 0A 00 00 00
1010: 01 00 03 FF 00 00 00 00 00 00 00 00 00 00 00 00
Interpret this by knowing your data layout. If address 0x1000 should be a null-terminated string: 48 65 6C 6C 6F... is “Hello World\r\n” in ASCII — looks correct. If address 0x1010 should be a 16-bit integer field: 01 00 in little-endian is the value 1 — is that correct for the current program state?
When debugging a crash, dump memory around the stack pointer and around the address the CPU was trying to execute when it failed. Corrupted data or a wrong address there explains most crashes.
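As an illustration, the dump format and the two interpretations above can be reproduced with a short sketch. The function name `hex_dump` is an assumption for this example, and the `memory` bytes are copied from the dump shown earlier.

```python
import struct


def hex_dump(data, base_address=0, width=16):
    """Return dump lines: a hex address followed by `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_bytes = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{base_address + offset:04X}: {hex_bytes}")
    return lines


# The 32 bytes from the example dump, starting at address 0x1000.
memory = (bytes.fromhex("48 65 6C 6C 6F 20 57 6F 72 6C 64 0D 0A") + bytes(3)
          + bytes.fromhex("01 00 03 FF") + bytes(12))

for line in hex_dump(memory, base_address=0x1000):
    print(line)

# Interpret 0x1000 as a null-terminated ASCII string.
text = memory[:memory.index(0)].decode("ascii")
print(repr(text))   # prints 'Hello World\r\n'

# Interpret 0x1010 (offset 0x10 into the dump) as a little-endian 16-bit integer.
(field,) = struct.unpack_from("<H", memory, 0x10)
print(field)        # prints 1
```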
Systematic Techniques for Embedded Systems
Without a monitor or display, embedded systems debugging uses whatever output channel is available:
Serial output: Send diagnostic bytes to a serial port connected to another machine. Slow but completely general — any value can be transmitted.
LED codes: Blink an LED N times to indicate a code. Tedious but requires no other hardware. Sufficient for distinguishing “reached checkpoint A” from “did not reach checkpoint A.”
Bus state outputs: On systems with a parallel output port or an extra output pin, toggle pins at known program points. An oscilloscope or logic analyzer can trace the sequence.
Checksum verification: After loading a program or data, compute and display a checksum. If it matches the expected value, the data was very likely loaded correctly; if not, the data was corrupted during loading or in storage.
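A sketch of checksum verification follows. The text does not prescribe an algorithm, so this example assumes the simplest one, an 8-bit additive checksum; `checksum8` is an illustrative name.

```python
def checksum8(data):
    """Sum all bytes and keep the low 8 bits."""
    return sum(data) & 0xFF


original = b"Hello World\r\n"
expected = checksum8(original)      # computed once from the known-good data

loaded_ok = bytes(original)         # a faithful load
loaded_bad = bytearray(original)
loaded_bad[4] ^= 0x01               # simulate a single flipped bit during loading

print(checksum8(loaded_ok) == expected)    # prints True
print(checksum8(loaded_bad) == expected)   # prints False
```

An additive checksum is easy to compute by hand or in a few instructions, but it cannot detect reordered bytes or errors that cancel each other out, so a match is strong evidence rather than proof.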
Preventive Practices
Good debugging technique starts before the bug appears:
Write small functions with clear, testable behavior. A function that does one thing and has a clear contract (what inputs it expects, what output it produces) is easy to test in isolation.
Add assertions — checks that a condition you believe to be true actually is true — at function entry and exit points. An assertion that fails immediately tells you where your assumption was wrong.
Test incrementally. Add a small section of code, test it, confirm it works, then add more. Debugging a small addition is much easier than debugging a large one.
Keep track of what you have changed. When a working program suddenly breaks, the bug is almost certainly in the most recent change. If you use version control, you can compare the current code to the last working version and narrow the problem to the diff.
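The assertion practice described above might look like the following sketch. The function `next_index`, which advances a position in a fixed-size circular buffer, is a hypothetical example chosen because its contract is easy to state.

```python
def next_index(index, size):
    """Advance a circular-buffer index by one slot, wrapping at `size`."""
    # Entry assertions: the assumptions the caller must satisfy.
    assert size > 0, "buffer size must be positive"
    assert 0 <= index < size, "index out of range on entry"

    result = (index + 1) % size

    # Exit assertion: the promise this function makes to its caller.
    assert 0 <= result < size, "index out of range on exit"
    return result


print(next_index(7, 8))   # prints 0

try:
    next_index(8, 8)      # caller bug: 8 is not a valid index for size 8
except AssertionError as error:
    print("assertion failed:", error)
```

The failed assertion points at the caller's mistake at the moment it happens, rather than later when a corrupted index is finally used. Note that Python skips `assert` statements when run with the `-O` option, so they serve as development checks and not as input validation.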
Practical Notes for Rebuilders
Establish a culture of careful testing, not heroic debugging. A discipline of writing testable code, testing each piece as it is written, and documenting what was tested produces systems that work reliably. Debugging theater — long sessions fixing problems that testing would have prevented — is wasteful and demoralizing.
Document bugs you find and fix. A brief record of the symptom, the root cause, and the fix is invaluable when a similar bug appears in different code. Patterns recur; experience finding them once makes finding them again fast.
The programmer who cannot debug effectively can only write code that happens to work. The programmer who can debug can write code that reliably works — a far more valuable capability.