Error Detection
Part of Networking
Techniques for detecting when data has been corrupted during transmission.
Why This Matters
Data transmitted across any physical medium can be corrupted. Electrical noise, signal attenuation, interference, and hardware faults all introduce errors — bit flips where a 0 becomes a 1 or vice versa. Without error detection, corrupted data arrives silently: the receiver has no way to know that what it received differs from what was sent. Applications built on unreliable data produce wrong results, and the wrongness may not be immediately obvious.
Error detection adds redundant information to each transmitted block of data. The redundant information is computed from the data before transmission. The receiver recomputes the same value from the received data and compares it to the transmitted value. If they match, the data is (almost certainly) correct. If they do not match, an error has occurred.
This simple principle underlies all reliable data communication. Understanding how error detection works, why different methods offer different levels of protection, and what the cost of that protection is allows you to choose appropriate methods for different situations. In a civilization-rebuilding context where hardware may be unreliable and environmental conditions harsh, understanding error detection is essential to building communications systems that you can trust.
Parity Bits
Parity is the simplest error detection method. A parity bit is added to each group of data bits such that the total number of 1 bits is either always even (even parity) or always odd (odd parity).
For even parity with 7 data bits: count the number of 1 bits. If the count is already even, the parity bit is 0. If the count is odd, the parity bit is 1. The receiver counts all 8 bits including the parity bit — if the count is even, no single-bit error occurred; if odd, exactly one bit was corrupted.
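The computation described above can be sketched in a few lines of Python (the function names are illustrative):

```python
def even_parity_bit(data_bits):
    """Return the parity bit that makes the total count of 1s even.

    data_bits is a sequence of 0/1 ints, e.g. 7 data bits.
    """
    return sum(data_bits) % 2  # 0 if the count is already even, else 1

def parity_check(all_bits):
    """Receiver side: True if the received bits (data + parity) pass even parity."""
    return sum(all_bits) % 2 == 0

bits = [1, 0, 1, 1, 0, 0, 1]       # four 1 bits: count already even
p = even_parity_bit(bits)          # parity bit is 0
assert parity_check(bits + [p])

corrupted = bits + [p]
corrupted[2] ^= 1                  # flip a single bit in transit
assert not parity_check(corrupted)
```

Note that flipping any two bits of `corrupted` would restore even parity and go undetected, which is exactly the limitation discussed next.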
Parity detects all single-bit errors and odd numbers of errors in general. It does not detect even numbers of errors: if two bits flip, the parity remains correct despite the corruption. For this reason, parity is considered adequate only for environments where multi-bit errors are extremely rare.
Parity was common in early serial communications (the eighth bit alongside 7-bit ASCII characters was often a parity bit) and in RAM error detection. It is inexpensive — one bit of overhead per seven data bits — but its failure to detect even-count errors limits its usefulness as data rates and error rates increase.
Two-dimensional parity extends the concept by computing parity for both the rows and the columns of a data block. This detects all single-bit errors and can even correct them (the failing row parity indicates which row has the error and the failing column parity indicates which column). It detects all two- and three-bit errors, but misses some four-bit errors: four flipped bits at the corners of a rectangle leave every row and column parity unchanged. Even so, it provides significantly better coverage than simple parity at the cost of more overhead bits.
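As a sketch in Python, assuming the data block is a small bit matrix represented as nested lists (all names are illustrative):

```python
def parity2d(block):
    """Even-parity bits for each row and each column of a bit matrix."""
    rows = [sum(r) % 2 for r in block]
    cols = [sum(c) % 2 for c in zip(*block)]  # zip(*block) transposes
    return rows, cols

def locate_single_error(block, rows, cols):
    """Return (row, col) of a single flipped bit, or None if parity holds."""
    new_rows, new_cols = parity2d(block)
    bad_r = [i for i, (a, b) in enumerate(zip(rows, new_rows)) if a != b]
    bad_c = [j for j, (a, b) in enumerate(zip(cols, new_cols)) if a != b]
    if len(bad_r) == 1 and len(bad_c) == 1:
        return bad_r[0], bad_c[0]
    return None

block = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
rows, cols = parity2d(block)       # computed before transmission
block[1][2] ^= 1                   # one bit corrupted in transit
pos = locate_single_error(block, rows, cols)
assert pos == (1, 2)
block[pos[0]][pos[1]] ^= 1         # flip it back: error corrected
assert parity2d(block) == (rows, cols)
```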
Checksums
A checksum is a value computed by treating the data as a sequence of numbers and performing arithmetic on them. The sum (or some transformation of the sum) is appended to the data. The receiver recomputes the checksum and compares it to the received value.
The simplest checksum is the arithmetic sum of all bytes, truncated to fit in one or two bytes. This catches many errors but has weaknesses: transposed bytes (where two adjacent bytes are swapped) may have the same sum as the original, and zeros can be inserted or deleted without changing the sum.
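A minimal Python sketch makes both weaknesses concrete:

```python
def sum_checksum(data: bytes) -> int:
    """Arithmetic sum of all bytes, truncated to 8 bits."""
    return sum(data) & 0xFF

msg = b"hello"
# Addition is commutative, so reordered bytes produce the same sum:
assert sum_checksum(msg) == sum_checksum(b"ehllo")
# Zero bytes add nothing, so inserted zeros go undetected:
assert sum_checksum(msg) == sum_checksum(msg + b"\x00")
```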
The Internet checksum used in IP, TCP, and UDP headers is a one’s complement sum of 16-bit words. The receiver adds all words including the checksum — the result should be all 1 bits (in one’s complement arithmetic) if no errors occurred. This is slightly better than simple arithmetic sum but still has the weakness of not detecting transposed words.
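The one's complement computation can be sketched in Python, following the approach described in RFC 1071 (function names are illustrative):

```python
def internet_checksum(data: bytes) -> int:
    """One's complement sum of 16-bit big-endian words, complemented."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF                        # one's complement of the sum

def verify(data: bytes, checksum: int) -> bool:
    """Receiver: adding every word plus the checksum must yield all 1 bits."""
    if len(data) % 2:
        data += b"\x00"
    total = checksum
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF

data = b"\x12\x34\x56\x78"
assert verify(data, internet_checksum(data))
assert not verify(b"\x12\x34\x56\x79", internet_checksum(data))
```

Because addition is commutative, swapping the two words of `data` would pass verification unchanged, which is the transposition weakness noted above.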
Checksums are fast to compute and adequate for many purposes. IP headers use a simple checksum because IP is designed to be fast and the upper-layer protocols (TCP) provide stronger error detection. For files stored on disk or transferred between systems where correctness is critical, stronger methods are preferred.
Cyclic Redundancy Check (CRC)
The Cyclic Redundancy Check (CRC) is the standard error detection method for Ethernet, storage systems, and most serious data communication protocols. It treats the data as a very large binary number and divides it by a fixed polynomial, keeping the remainder. This remainder (the CRC value) is appended to the data.
CRC is computed using polynomial division in modular arithmetic (specifically GF(2), the Galois field with two elements). The divisor polynomial is chosen to have specific mathematical properties that ensure detection of particular error patterns. Different polynomial choices provide different detection guarantees.
The CRC-32 polynomial used in Ethernet detects all single-bit errors, all burst errors of 32 bits or fewer, and over 99.9999% of longer burst errors. A burst error is a contiguous sequence of bit corruptions — the kind most likely to result from electrical interference, which typically corrupts a cluster of adjacent bits rather than individual scattered bits.
Computing CRC in software requires either direct polynomial division (slow) or a lookup table approach (much faster). The lookup table for CRC-32 has 256 entries, each a 32-bit value. For each byte of data, you XOR the current CRC value with the data byte, use the result’s low 8 bits as a table index, and XOR the table entry with the shifted CRC. This produces the correct CRC with minimal computation.
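The table-driven approach can be sketched in Python. The value 0xCBF43926 asserted at the end is the standard published check value for CRC-32 over the ASCII string "123456789":

```python
def make_crc32_table():
    """Build the 256-entry table for the bit-reversed CRC-32 polynomial."""
    poly = 0xEDB88320              # Ethernet polynomial, reversed-bit form
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

CRC32_TABLE = make_crc32_table()

def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF               # standard initial value
    for b in data:
        # XOR CRC with the byte, index the table by the low 8 bits,
        # and XOR the entry with the shifted CRC:
        crc = (crc >> 8) ^ CRC32_TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF        # standard final XOR

assert crc32(b"123456789") == 0xCBF43926
```

This matches the CRC-32 computed by Ethernet hardware and by common library routines such as Python's `zlib.crc32`.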
In hardware, CRC is computed with shift registers and XOR gates, which can process data at full wire speed. Every Ethernet network interface computes and checks CRC in hardware, adding no latency to normal operation.
Practical Error Detection Strategies
Different layers of a communication system use different error detection methods, and they work together.
At the physical link layer, Ethernet uses CRC-32 to detect bit errors on each frame. If the CRC fails, the frame is discarded silently — Ethernet does not retransmit; that responsibility belongs to higher layers. The discarding of corrupted frames ensures that higher layers only see complete, correct frames.
At the transport layer, TCP includes a checksum covering the entire segment (header and data). This catches errors that slip through the link-layer CRC (which are rare but possible, especially in wireless systems or very noisy environments). The TCP checksum is weaker than CRC-32 — it will miss reordered 16-bit words, for example — but provides an additional layer of verification.
For files, cryptographic hash functions (MD5, SHA-1, SHA-256) are commonly used as checksums. These are much stronger than CRC for detecting accidental modification, and SHA-256 is also designed to resist deliberate tampering; MD5 and SHA-1 are no longer considered collision-resistant against a determined attacker, though they remain effective against random corruption. Cryptographic hashes are much slower to compute than CRC and are overkill for transient communication links.
Building Error Detection From Scratch
If you need to implement error detection without access to existing libraries, the following approaches scale from simplest to most robust.
For very simple applications with small data and low error rates, an 8-bit sum checksum is adequate: sum all data bytes modulo 256 and append the sum. At the receiver, sum all bytes including the checksum — the result should be zero (if you sum and then take the two’s complement before appending) or a known constant.
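In Python, the two's complement variant might look like this (names are illustrative):

```python
def checksum8(data: bytes) -> int:
    """Two's complement of the 8-bit sum, so the whole frame sums to zero."""
    return (-sum(data)) & 0xFF

def verify8(frame: bytes) -> bool:
    """Receiver: data bytes plus appended checksum should sum to 0 mod 256."""
    return sum(frame) & 0xFF == 0

msg = b"hello"
frame = msg + bytes([checksum8(msg)])
assert verify8(frame)
assert not verify8(b"hellp" + bytes([checksum8(msg)]))  # one corrupted byte
```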
For more reliable detection, implement CRC-16 using a lookup table. The CCITT CRC-16 polynomial (0x1021) is widely used and its 256-entry table fits comfortably in memory. Because the polynomial is divisible by x + 1, it detects all odd numbers of bit errors; it also detects all single- and double-bit errors in messages up to 32,767 bits and all burst errors up to 16 bits.
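A sketch of the table-driven CRC-16 in Python, using the common initial value 0xFFFF (the parameter set often catalogued as "CRC-16/CCITT-FALSE"); 0x29B1 is its published check value for the string "123456789":

```python
def make_crc16_table():
    """256-entry table for the CCITT polynomial 0x1021, MSB-first form."""
    table = []
    for byte in range(256):
        crc = byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
        table.append(crc)
    return table

CRC16_TABLE = make_crc16_table()

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    for b in data:
        crc = ((crc << 8) & 0xFFFF) ^ CRC16_TABLE[((crc >> 8) ^ b) & 0xFF]
    return crc

assert crc16_ccitt(b"123456789") == 0x29B1
```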
For maximum reliability with modest complexity, implement CRC-32 with the standard Ethernet polynomial (0x04C11DB7 in normal representation, 0xEDB88320 in the reversed-bit representation used in most software implementations). The 256-entry, 32-bit lookup table occupies 1 KB of memory, and the algorithm needs only one table lookup, one shift, and two XOR operations per byte of data.
Always test your error detection implementation with known error cases. Introduce single-bit flips, multiple-bit corruptions, and truncated messages, and verify that your implementation detects each. An error detection implementation that has never been tested against actual errors may have subtle bugs that cause it to always report "no error" — a failure mode that is worse than having no error detection at all.
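One way to sketch such a test harness in Python, here exercising the standard library's `zlib.crc32` (any checksum function with the same shape could be substituted):

```python
import random
import zlib

def exercise(checksum_fn, data: bytes, trials: int = 500) -> int:
    """Corrupt data with 1-3 random bit flips per trial; return how many
    corruptions the checksum function failed to detect."""
    good = checksum_fn(data)
    missed = 0
    for _ in range(trials):
        corrupted = bytearray(data)
        for _ in range(random.randint(1, 3)):
            i = random.randrange(len(corrupted))
            corrupted[i] ^= 1 << random.randrange(8)
        if bytes(corrupted) != data and checksum_fn(bytes(corrupted)) == good:
            missed += 1
    return missed

# CRC-32 guarantees detection of all 1-3 bit errors at this message length,
# so a correct implementation must miss nothing here:
assert exercise(zlib.crc32, b"The quick brown fox jumps over") == 0
```

A harness like this is also useful in reverse: deliberately break your checksum function (for example, return a constant) and confirm the harness reports misses, which proves the test itself is alive.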