Hamming Codes

Part of Data Storage

Single-bit correction and double-bit detection through multiple overlapping parity checks — the elegant mathematical solution that made reliable memory practical.

Why This Matters

Richard Hamming developed his error-correcting code in 1950 out of frustration: the Bell Labs relay computer he used on weekends would encounter errors, and the only remedy was to halt and restart. He asked: can we add enough redundancy to not just detect errors but correct them automatically, without halting?

The answer was yes, and the result — Hamming codes — changed computing permanently. Every modern computer uses a form of Hamming code (specifically SECDED — Single Error Correcting, Double Error Detecting) in its main memory. Every ECC DIMM implements the Hamming principle. The insight that drove it is beautiful: instead of one parity bit covering all data, use multiple parity bits each covering a different overlapping subset of data bits. The combination of which parity bits fail tells you exactly which data bit is wrong.

For rebuilders, understanding Hamming codes means you can implement genuinely reliable memory systems and storage protocols. The code is simple enough to implement in discrete logic with a handful of XOR gates, and the mathematics is accessible with basic algebra.

The Core Insight: Locating Errors with Multiple Parity Bits

Simple parity uses one bit to tell you that something is wrong, but nothing about where. To locate an error, you need multiple parity bits covering different subsets of the data bits — a process of elimination.

Consider a simple example with 4 data bits (d1, d2, d3, d4) and 3 parity bits (p1, p2, p3):

  • p1 covers d1, d2, d4
  • p2 covers d1, d3, d4
  • p3 covers d2, d3, d4

If only p1 fails: the error is in a bit covered by p1 but not p2 or p3 — that is d2. If only p2 fails: error is in d3. If only p3 fails: error is in d1… wait, this doesn’t work with the example as stated.

The key is that the coverage pattern must uniquely identify each bit position. Hamming’s insight: assign power-of-2 positions to parity bits (positions 1, 2, 4, 8, 16, …) and distribute data bits in the remaining positions. Each parity bit covers all positions whose binary representation includes the corresponding bit.

Hamming(7,4): A Complete Example

Hamming(7,4) encodes 4 data bits into 7 codeword bits: positions 1 and 2 and 4 are parity bits, positions 3, 5, 6, 7 are data bits.

Parity bit coverage (using position numbers in binary):

  • p1 (position 1 = 001): covers all positions with bit 0 set: 1, 3, 5, 7 → covers p1, d1, d2, d4
  • p2 (position 2 = 010): covers all positions with bit 1 set: 2, 3, 6, 7 → covers p2, d1, d3, d4
  • p4 (position 4 = 100): covers all positions with bit 2 set: 4, 5, 6, 7 → covers p4, d2, d3, d4

Encoding: Given data bits d1=1, d2=0, d3=1, d4=1 (placed at positions 3, 5, 6, 7):

  • p1: XOR of positions 3, 5, 7 = XOR(1, 0, 1) = 0. So position 1 = 0.
  • p2: XOR of positions 3, 6, 7 = XOR(1, 1, 1) = 1. So position 2 = 1.
  • p4: XOR of positions 5, 6, 7 = XOR(0, 1, 1) = 0. So position 4 = 0.

Complete codeword: position 1=0, 2=1, 3=1, 4=0, 5=0, 6=1, 7=1 → 0110011

Error detection and correction: If a transmitted codeword arrives as 0110111 (bit at position 5 was flipped from 0 to 1):

  • Check p1 (positions 1, 3, 5, 7): XOR(0, 1, 1, 1) = 1 ≠ 0. p1 fails.
  • Check p2 (positions 2, 3, 6, 7): XOR(1, 1, 1, 1) = 0. p2 passes.
  • Check p4 (positions 4, 5, 6, 7): XOR(0, 1, 1, 1) = 1 ≠ 0. p4 fails.

The syndrome (binary number formed by failed parity bits): bit2=1 (p4), bit1=0 (p2), bit0=1 (p1) → syndrome = 101 binary = 5. The error is at position 5. Flip bit 5: 0110011. Correct!

The syndrome directly encodes the position of the error — this is the elegance of the Hamming construction.

Hamming(8,4) — SECDED: Detecting Double Errors

Hamming(7,4) corrects single errors but cannot distinguish a single corrected error from two simultaneous errors — it would silently “correct” to the wrong value. For memory systems, this is unacceptable.

The fix: Add a single overall parity bit (covering all positions). Now:

  • No error: all syndrome bits zero, overall parity passes
  • Single error: syndrome bits point to error position, overall parity fails (the flip changed parity)
  • Double error: some syndrome bits are nonzero, but overall parity passes (two flips cancel in parity)

From the overall parity result, you can distinguish: if syndrome ≠ 0 and overall parity fails → single error, correct it. If syndrome ≠ 0 and overall parity passes → double error, cannot correct, report error. This is SECDED (Single Error Correcting, Double Error Detecting).

Extended Hamming(72,64): The most common SECDED code for memory systems. 64 data bits (8 bytes) + 8 check bits = 72 bits. Used in ECC DIMMs. Can correct any single bit flip among all 72 bits and detect any two-bit error. Implementation requires 8 XOR trees, one for each check bit.

Hardware Implementation

Hamming encoding and checking is implemented as combinational logic — XOR gates — with no sequential elements (no clock, no registers). Data can be checked in a single propagation delay.

Encoder for Hamming(72,64): 8 XOR trees, one per check bit, each XORing a specific subset of the 64 data bits. The XOR trees can be implemented in a small PAL, CPLD, or FPGA, or in discrete 74xx logic gates.

A SECDED encoder for 64-bit words requires approximately 8 × 6 = 48 XOR gates (assuming each XOR gate has 2 inputs; for wider inputs, fewer gates are needed). A 74HC86 provides 4 XOR gates per package, so 12 packages — manageable as a PCB.

Syndrome computation: The syndrome checker XORs each check bit (as read from memory) with a freshly computed check bit from the data. If all syndrome bits are zero, no error. If the syndrome is nonzero, it encodes the bit position of the error. A ROM (lookup table) maps syndrome values to correction masks.

In modern RAM modules: ECC DIMMs include 9 chips instead of 8 — the ninth chip provides the extra 8 bits needed for SECDED over 64-bit words. The memory controller (in the CPU or chipset) implements the encoder on writes and the decoder/corrector on reads. ECC correction errors are logged to system management firmware (IPMI/BMC) for analysis.

Implementing Hamming Codes in Software

For tape recording, disk sectors, or file integrity, Hamming codes can be computed in software:

def hamming_encode(data_bits):
    """Encode data_bits (list of 0/1) as a Hamming codeword."""
    n_data = len(data_bits)
    # Calculate number of parity bits needed
    r = 0
    while (1 << r) < (n_data + r + 1):
        r += 1
 
    # Create codeword array (1-indexed for clarity)
    codeword = [0] * (n_data + r + 1)  # position 0 unused
 
    # Fill in data bits at non-power-of-2 positions
    data_idx = 0
    for pos in range(1, n_data + r + 1):
        if pos & (pos - 1) != 0:  # not a power of 2
            codeword[pos] = data_bits[data_idx]
            data_idx += 1
 
    # Calculate parity bits at power-of-2 positions
    for p in range(r):
        parity_pos = 1 << p
        parity = 0
        for pos in range(1, n_data + r + 1):
            if pos & parity_pos:
                parity ^= codeword[pos]
        codeword[parity_pos] = parity
 
    return codeword[1:]  # return positions 1..n
 
def hamming_check_and_correct(codeword):
    """Check and correct single-bit errors. Returns (corrected, error_position)."""
    n = len(codeword)
    syndrome = 0
    for p in range(int(n ** 0.5) + 2):  # check each parity bit
        parity_pos = 1 << p
        if parity_pos > n:
            break
        parity = 0
        for pos in range(1, n + 1):
            if pos & parity_pos and pos <= n:
                parity ^= codeword[pos - 1]
        if parity != 0:
            syndrome |= parity_pos
 
    if syndrome == 0:
        return codeword, 0  # no error
 
    # syndrome is the (1-indexed) position of the error
    result = list(codeword)
    if syndrome <= n:
        result[syndrome - 1] ^= 1  # correct the bit
    return result, syndrome

When to Use Hamming vs CRC

Hamming codes: Optimal for random (independent) bit errors. Memory chips that occasionally flip single bits due to alpha particles, power noise, or aging. Short blocks (8–64 bits). When you need automatic correction (not just detection). Low overhead — about log2(n) check bits for n data bits.

CRC codes: Optimal for burst errors (adjacent bits corrupted together). Storage media, serial communications, optical disc channels. Longer blocks (hundreds to thousands of bytes). When detection is sufficient (you can retry). Fixed overhead — 16–32 check bits regardless of data block size.

For a complete storage system, use both: Hamming SECDED at the memory level (correcting single-bit errors in RAM and during transfers) and CRC-32 at the file/block level (detecting any burst corruption on disk or tape). These two layers together provide robust protection against the most common error types.