Assembly Language

Assembly language is the thin human-readable layer directly above machine code, giving programmers direct control over hardware with minimal abstraction.

Why This Matters

Assembly language is the lowest level at which humans can write programs without manually encoding binary. Each instruction generally maps to a single machine instruction, with no compiler, runtime, or hidden layers in between. Understanding assembly means understanding what a computer actually does, instruction by instruction.

In a civilization-rebuilding context, assembly language may be the only programming tool available when reconstructing computing from scratch. High-level languages require compilers, which require working computers to run. A programmer who understands assembly can write software for any machine with nothing more than the processor’s instruction reference and a paper notebook. The first programs for the earliest computers were written in machine code or assembly.

Assembly also builds the mental model that makes all higher-level programming comprehensible. When a C programmer understands that a + b compiles to a LOAD, ADD, STORE sequence, they write better code. Assembly is the bridge between hardware and thought.

Structure of an Assembly Program

Assembly source code consists of lines, each typically holding one CPU instruction or one assembler directive (such as DB, which reserves data). A typical line has four fields:

LABEL:    MNEMONIC    OPERANDS    ; comment

The label (optional) marks the current address for use as a jump target or data reference. The mnemonic is the human-readable name for the instruction (MOV, ADD, JMP). Operands specify registers, memory addresses, or immediate values. Comments document intent.

A minimal program on a hypothetical 8-bit CPU:

START:  LDA  #10      ; load immediate value 10 into accumulator
        LDB  #7       ; load immediate value 7 into register B
        ADD  B        ; add B to accumulator (A = A + B = 17)
        STA  RESULT   ; store accumulator to memory address RESULT
        HLT           ; halt execution
 
RESULT: DB   0        ; reserve 1 byte, initialized to 0

The assembler converts mnemonics to binary opcodes, resolves label addresses, and outputs machine code ready to load into memory.
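
To make the program's behavior concrete, the following Python sketch interprets the same five instructions. It is only an illustration: the tuple-based program format and the run function are assumptions introduced here, and a real CPU would fetch encoded bytes rather than Python tuples.

```python
# Toy interpreter for the five-instruction example program.
# The (mnemonic, operand) tuple format is an assumption for illustration.

def run(program, memory):
    """Execute (mnemonic, operand) tuples until HLT; return the registers."""
    regs = {"A": 0, "B": 0}
    pc = 0  # index of the next instruction to execute
    while True:
        mnemonic, operand = program[pc]
        pc += 1
        if mnemonic == "LDA":      # load immediate value into the accumulator
            regs["A"] = operand & 0xFF
        elif mnemonic == "LDB":    # load immediate value into register B
            regs["B"] = operand & 0xFF
        elif mnemonic == "ADD":    # add the named register to A, wrapping at 8 bits
            regs["A"] = (regs["A"] + regs[operand]) & 0xFF
        elif mnemonic == "STA":    # store the accumulator to a named memory cell
            memory[operand] = regs["A"]
        elif mnemonic == "HLT":    # stop execution
            return regs
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")

program = [
    ("LDA", 10),
    ("LDB", 7),
    ("ADD", "B"),
    ("STA", "RESULT"),
    ("HLT", None),
]
memory = {"RESULT": 0}
registers = run(program, memory)
print(registers["A"], memory["RESULT"])  # prints: 17 17
```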

Registers and Addressing Modes

Registers are the CPU’s internal working storage — faster than memory, but few in number. A minimal CPU might have:

  • Accumulator (A): primary arithmetic register
  • Index register (X or Y): used for pointer arithmetic and array indexing
  • Stack pointer (SP): points to the top of the call stack
  • Program counter (PC): address of the next instruction to execute
  • Status register (SR): holds condition flags (zero, carry, negative, overflow)
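
The condition flags are easiest to understand by computing them. The Python sketch below shows one common way the four flags could be derived after an 8-bit addition. The add8 function and the exact flag definitions are assumptions following a widespread convention; a particular CPU's reference manual is the authority for its own flags.

```python
def add8(a, b):
    """Add two 8-bit values; return (result, flags) under assumed flag rules."""
    total = a + b
    result = total & 0xFF  # keep only the low 8 bits
    flags = {
        "zero": result == 0,               # result is exactly zero
        "carry": total > 0xFF,             # unsigned result did not fit in 8 bits
        "negative": bool(result & 0x80),   # bit 7 set: negative in two's complement
        # signed overflow: both inputs have a sign bit that differs from the result's
        "overflow": bool((a ^ result) & (b ^ result) & 0x80),
    }
    return result, flags

# 127 + 1 wraps to -128 when read as signed: negative and overflow are set.
print(add8(0x7F, 0x01))
# 255 + 1 wraps to 0: zero and carry are set.
print(add8(0xFF, 0x01))
```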

Addressing modes determine how operands are interpreted:

  • Immediate: LDA #42 — the value 42 is encoded directly in the instruction
  • Direct/Absolute: LDA $1000 — load from memory address 0x1000
  • Register: ADD B — operand is register B
  • Indirect: LDA (PTR) — PTR contains the address to load from (pointer dereference)
  • Indexed: LDA $1000,X — load from address 0x1000 + X (array element access)
  • Relative: BNE LOOP — branch to LOOP, encoded as signed offset from current PC

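Understanding addressing modes is essential for writing efficient assembly. Immediate and register operands are typically the fastest, because the value is already in the instruction or inside the CPU and no extra data-memory access is needed. Indirect and indexed modes cost additional memory accesses but enable dynamic data structures and arrays.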
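
The modes listed above can be modeled as different ways of turning an operand field into a value. The following Python sketch is a simplified model, not any real CPU's behavior: fetch_operand and branch_target are hypothetical helpers, pointers are assumed to be 16-bit and stored low byte first, and branch offsets are assumed to be relative to the address of the next instruction.

```python
def fetch_operand(mode, value, memory, regs):
    """Return the operand value selected by one addressing mode."""
    if mode == "immediate":   # LDA #42: the value is in the instruction itself
        return value
    if mode == "absolute":    # LDA $1000: read the byte at that address
        return memory[value]
    if mode == "register":    # ADD B: read the named register
        return regs[value]
    if mode == "indirect":    # LDA (PTR): PTR holds the real address (assumed little-endian)
        target = memory[value] | (memory[value + 1] << 8)
        return memory[target]
    if mode == "indexed":     # LDA $1000,X: base address plus index register
        return memory[(value + regs["X"]) & 0xFFFF]
    raise ValueError(f"unknown addressing mode: {mode}")

def branch_target(next_pc, offset_byte):
    """Relative mode: interpret the byte as a signed offset from next_pc."""
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    return (next_pc + offset) & 0xFFFF

memory = bytearray(0x10000)
memory[0x1000] = 0x11
memory[0x1003] = 0x33
memory[0x0020:0x0022] = bytes([0x00, 0x10])  # pointer to 0x1000, low byte first
regs = {"B": 7, "X": 3}

print(fetch_operand("indirect", 0x0020, memory, regs))  # prints: 17
print(fetch_operand("indexed", 0x1000, memory, regs))   # prints: 51
print(hex(branch_target(0x0206, 0xFA)))                 # prints: 0x200
```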

Common Instruction Patterns

Counting loop:

        LDA  #0       ; counter = 0
LOOP:   ADD  #1       ; counter += 1
        CMP  #10      ; compare with 10
        BNE  LOOP     ; if not equal, repeat

Memory copy (N bytes from SRC to DST, where N must fit in the index register):

        LDX  #0       ; index = 0
COPY:   LDA  SRC,X    ; load byte from source[X]
        STA  DST,X    ; store to dest[X]
        INX           ; X++
        CPX  #N       ; compare with count
        BNE  COPY     ; repeat until done
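
The copy loop has an edge case that is easy to miss: the index is tested only after a byte has been copied. The Python sketch below mirrors the loop one instruction at a time, assuming X is an 8-bit register that wraps from 255 to 0 (the copy_bytes function is a hypothetical stand-in, not part of the original listing).

```python
def copy_bytes(memory, src, dst, n):
    """Mirror the COPY loop with an assumed 8-bit index register X."""
    x = 0                                   # LDX #0
    while True:
        memory[dst + x] = memory[src + x]   # LDA SRC,X / STA DST,X
        x = (x + 1) & 0xFF                  # INX (wraps after 255)
        if x == (n & 0xFF):                 # CPX #N / BNE COPY
            break
    return memory

mem = bytearray(1024)
mem[0:4] = b"ABCD"
copy_bytes(mem, 0, 512, 4)
print(mem[512:516])  # prints: bytearray(b'ABCD')
```

Under that assumption the loop handles 1 to 255 bytes as expected, while N = 0 copies 256 bytes, because X must wrap all the way around before the comparison succeeds.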

Subroutine call and return:

        JSR  MYSUB    ; push return address, jump to MYSUB
        ...           ; execution continues here after RTS
 
MYSUB:  LDA  #42      ; subroutine body
        RTS           ; pop return address, jump back

The stack is crucial for subroutines. JSR pushes the return address; RTS pops it. Local variables can be pushed onto the stack and popped on return, enabling recursive procedures.
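
The push and pop behavior of JSR and RTS can be sketched in Python as follows. This is a model under stated assumptions rather than a description of a specific CPU: the CallStack class, the descending stack starting at 0x01FF, and the 16-bit return addresses are all chosen for illustration.

```python
class CallStack:
    """Descending stack of 16-bit return addresses in a flat memory."""

    def __init__(self, memory, top):
        self.memory = memory
        self.sp = top  # stack pointer: the next free byte

    def push16(self, value):
        self.memory[self.sp] = (value >> 8) & 0xFF   # high byte
        self.memory[self.sp - 1] = value & 0xFF      # low byte
        self.sp -= 2

    def pop16(self):
        self.sp += 2
        return self.memory[self.sp - 1] | (self.memory[self.sp] << 8)

def jsr(stack, return_address, target):
    """Push the return address and hand back the new program counter."""
    stack.push16(return_address)
    return target

def rts(stack):
    """Pop the most recent return address; it becomes the program counter."""
    return stack.pop16()

stack = CallStack(bytearray(0x10000), top=0x01FF)
pc = jsr(stack, return_address=0x0203, target=0x0300)  # main calls MYSUB
pc = jsr(stack, return_address=0x0305, target=0x0400)  # MYSUB calls a helper
pc = rts(stack)
print(hex(pc))  # prints: 0x305
pc = rts(stack)
print(hex(pc))  # prints: 0x203
```

Because each call pushes its own return address, nested and recursive calls unwind in the reverse order they were made.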

Writing an Assembler

An assembler is a simple program (or even a manual process) that translates assembly text to machine code. A two-pass assembler works as follows:

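Pass 1: Read all lines, track the running address using each instruction's encoded size, assign addresses to labels, and build a symbol table. For example (the addresses are illustrative and depend on the instruction encoding):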

START = 0x0000
RESULT = 0x000A
LOOP = 0x0004

Pass 2: Re-read each line, look up label addresses, encode each instruction:

  • Look up the mnemonic’s opcode in a table
  • Encode the addressing mode
  • Resolve label references using the symbol table
  • Output the bytes

A hand assembler (done on paper) follows the same process. The programmer maintains the symbol table manually and looks up opcodes in the instruction reference manual. Historical programmers routinely hand-assembled programs of hundreds of instructions before automated tools existed.

Practical Tips for Assembly Programming

Always comment liberally. Assembly has few self-documenting names beyond labels and constants, so a comment every 3–5 instructions is not excessive. Describe intent, not mechanics: "; multiply by 10" is more useful than "; add A to A".

Draw the memory map before writing code. Know where code lives, where the stack is, and where variables go. A collision between the stack and program data causes spectacular crashes with no error message.
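
A memory map can also be checked mechanically. The Python sketch below takes a table of named regions and reports any that overlap. The find_overlaps function and the region names and addresses are hypothetical examples, not a layout required by any particular machine.

```python
def find_overlaps(regions):
    """regions maps name -> (start, end), both inclusive.
    Return a list of (name_a, name_b) pairs whose ranges overlap."""
    ordered = sorted(regions.items(), key=lambda item: item[1][0])
    overlaps = []
    for i, (name_a, (start_a, end_a)) in enumerate(ordered):
        for name_b, (start_b, end_b) in ordered[i + 1:]:
            # Regions are sorted by start, so b overlaps a if it begins before a ends.
            if start_b <= end_a:
                overlaps.append((name_a, name_b))
    return overlaps

MEMORY_MAP = {
    "code":      (0x0000, 0x03FF),
    "variables": (0x0400, 0x04FF),
    "stack":     (0x04C0, 0x05FF),  # deliberately intrudes on the variables
}
print(find_overlaps(MEMORY_MAP))  # prints: [('variables', 'stack')]
```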

Use symbolic constants instead of magic numbers. Define MAX_COUNT EQU 64 at the top of the file rather than scattering literal 64s throughout the code.

Test incrementally. Assemble and test each subroutine before building the next. In assembly, bugs that arise from several untested components interacting are extremely hard to isolate.

On real hardware, use a single-step mode (if available) or toggle switches to execute one instruction at a time and inspect register and memory contents after each step. This is slow but certain.