Instruction Set
Part of Basic Computing
An instruction set is the complete set of operations a processor can perform — the contract between hardware and software that defines what programs can express and what the machine can execute.
Why This Matters
The instruction set architecture (ISA) is one of the most consequential design decisions in building a processor. It determines what programs can be written, how efficiently they run, how complex the hardware must be, and how difficult programming is. With too few instructions, programs become impossibly tedious to write; with too many, the hardware grows unwieldy.
Historical instruction sets reveal the thinking of their designers. The 6502 (used in the Apple II and Commodore 64) had 56 instructions and addressed 64 KB with 8-bit data: minimal, but sufficient for an era of tight resources. The x86 family has accumulated thousands of instruction variants over more than 40 years. Both extremes are valid choices given their constraints.
Designing an instruction set for a hand-built computer is both a practical necessity and a profound learning exercise. Every choice has tradeoffs, and working through those tradeoffs builds deep intuition about computation.
Minimum Required Instructions
A complete, universal computer requires surprisingly few instruction types. The theoretical minimum is one: the SUBLEQ machine uses only "subtract and branch if the result is less than or equal to zero". Practical minimalism, however, balances expressiveness with programming convenience.
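To make the one-instruction idea concrete, here is a minimal sketch of a SUBLEQ interpreter. The memory layout, the function name `run_subleq`, and the convention that a negative jump target halts the machine are assumptions chosen for illustration, not part of any particular machine's definition.

```python
def run_subleq(mem, pc=0, max_steps=10_000):
    """Run a SUBLEQ program held in the list `mem` and return the final memory.

    Each instruction is three consecutive cells a, b, c:
        mem[b] = mem[b] - mem[a]; if the result <= 0, jump to c,
        otherwise continue with the next instruction.
    Assumption: a negative program counter halts the machine.
    """
    mem = list(mem)  # work on a copy
    for _ in range(max_steps):
        if pc < 0 or pc + 2 >= len(mem):
            break
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem


# Program: B = B + A, using a scratch cell Z that starts at 0.
# Cells 0-8 hold three instructions; cells 9-11 hold A, B, Z.
A, B, Z = 9, 10, 11
ADD_PROGRAM = [
    A, Z, 3,    # Z = Z - A  (Z becomes -A)
    Z, B, 6,    # B = B - Z  (B becomes B + A)
    Z, Z, -1,   # Z = 0, result <= 0 so jump to -1 (halt)
    7, 5, 0,    # data: A = 7, B = 5, Z = 0
]

final = run_subleq(ADD_PROGRAM)
print(final[B])
```

Even a simple addition takes three instructions and a scratch cell, which illustrates why a single-instruction machine is complete in principle but tedious in practice.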
A practical minimal instruction set (RISC-style, 8-bit data, 16-bit addresses):
Data movement:
- LOAD Reg, Address — copy memory[Address] to Reg
- STORE Reg, Address — copy Reg to memory[Address]
- MOV RegA, RegB — copy RegB to RegA
Arithmetic:
- ADD RegA, RegB — RegA = RegA + RegB
- SUB RegA, RegB — RegA = RegA - RegB
- INC Reg — Reg = Reg + 1
- DEC Reg — Reg = Reg - 1
Logic:
- AND RegA, RegB — bitwise AND
- OR RegA, RegB — bitwise OR
- XOR RegA, RegB — bitwise XOR
- NOT Reg — bitwise complement
- SHL Reg — shift left (multiply by 2)
- SHR Reg — shift right (divide an unsigned value by 2, rounding down)
Control flow:
- JMP Address — unconditional jump to address
- JZ Address — jump if zero flag set
- JC Address — jump if carry flag set
- JN Address — jump if negative flag set
- CALL Address — push PC, jump to subroutine
- RET — pop PC, return from subroutine
I/O:
- IN Reg, Port — read from I/O port
- OUT Port, Reg — write to I/O port
Miscellaneous:
- NOP — no operation (wait one cycle)
- HLT — halt processor
This set of 23 instructions is sufficient to write any computable program that fits in memory, including an assembler for itself, a BASIC interpreter, and eventually a C compiler.
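As a rough illustration of how far a handful of these instructions goes, the sketch below interprets a small subset (LOAD, STORE, MOV, ADD, SUB, DEC, JMP, JZ, HLT) and uses it to multiply by repeated addition. The tuple-based program format, the `run` function, and the use of instruction indexes as jump targets (with program and data kept separate) are assumptions made for brevity, not a description of real hardware.

```python
def run(program, memory=None, max_steps=10_000):
    """Interpret a subset of the instruction set; return (registers, memory)."""
    regs = {f"R{i}": 0 for i in range(8)}
    mem = dict(memory or {})
    flags = {"Z": 0, "C": 0, "N": 0}
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1
        if op == "HLT":
            break
        elif op == "LOAD":
            regs[args[0]] = mem.get(args[1], 0)
        elif op == "STORE":
            mem[args[1]] = regs[args[0]]
        elif op == "MOV":
            regs[args[0]] = regs[args[1]]
        elif op in ("ADD", "SUB", "DEC"):
            a = regs[args[0]]
            b = 1 if op == "DEC" else regs[args[1]]
            raw = a + b if op == "ADD" else a - b
            result = raw & 0xFF                  # keep 8 bits
            regs[args[0]] = result
            flags["Z"] = int(result == 0)
            flags["N"] = result >> 7
            flags["C"] = int(raw > 0xFF or raw < 0)
        elif op == "JMP":
            pc = args[0]
        elif op == "JZ":
            if flags["Z"]:
                pc = args[0]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs, mem


# Multiply mem[0x10] by mem[0x11] (count must be at least 1); store at 0x12.
MULTIPLY = [
    ("LOAD", "R0", 0x10),   # 0: multiplicand
    ("LOAD", "R1", 0x11),   # 1: loop counter
    ("SUB", "R2", "R2"),    # 2: clear the accumulator
    ("ADD", "R2", "R0"),    # 3: accumulator += multiplicand
    ("DEC", "R1"),          # 4: counter -= 1, sets Z when it reaches 0
    ("JZ", 7),              # 5: leave the loop when the counter is 0
    ("JMP", 3),             # 6: otherwise repeat
    ("STORE", "R2", 0x12),  # 7: write the product
    ("HLT",),               # 8
]

regs, mem = run(MULTIPLY, {0x10: 6, 0x11: 7})
print(mem[0x12])
```

The set has no multiply instruction, yet a four-instruction loop covers the gap, which is the kind of tradeoff between hardware simplicity and program length that the list above embodies.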
Instruction Encoding
Each instruction must be encoded as binary. The instruction format determines how opcodes and operands are packed into bytes.
Simple fixed-length 16-bit instruction format:
Bits 15-11: opcode (5 bits = 32 possible instructions)
Bits 10-8: destination register (3 bits = 8 registers)
Bits 7-5: source register A (3 bits)
Bits 4-3: addressing mode (2 bits) OR source register B restricted to R0-R3 (a 2-bit field cannot select among all 8 registers)
Bits 2-0: immediate value low (3 bits)
For instructions that carry a 16-bit address (LOAD, STORE, JMP, CALL), a second 16-bit word holding the address follows the instruction word. These instructions are therefore two words long, and the fetch logic must handle both one-word and two-word instructions.
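The bit layout above can be sketched as a pair of pack and unpack helpers. The field names (`opcode`, `dst`, `src_a`, `mode`, `imm`) and the opcode number used in the example are hypothetical, since the text assigns no opcode values; only the bit positions come from the format description.

```python
# (name, shift, width) for each field of the 16-bit instruction word.
FIELDS = (
    ("opcode", 11, 5),  # bits 15-11
    ("dst", 8, 3),      # bits 10-8
    ("src_a", 5, 3),    # bits 7-5
    ("mode", 3, 2),     # bits 4-3
    ("imm", 0, 3),      # bits 2-0
)


def encode(**values):
    """Pack the named fields into one 16-bit word; missing fields are 0."""
    word = 0
    for name, shift, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode(word):
    """Split a 16-bit word back into its fields."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, shift, width in FIELDS}


# Hypothetical opcode 3, destination R1, source R2, mode 0, immediate 5.
word = encode(opcode=3, dst=1, src_a=2, mode=0, imm=5)
print(hex(word), decode(word))
```

In hardware, the shifts and masks correspond to nothing more than wiring: each field is a fixed group of wires from the instruction register, which is what makes a fixed format cheap to decode.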
Alternative: pure variable-length encoding like the 8080/Z80:
- 1-byte opcode (256 possible opcodes)
- Optional 1-byte or 2-byte operand follows
Variable-length encoding is more compact but requires sequential decoding: the length of an instruction is known only after its opcode has been decoded, so the start of the next instruction cannot be located in advance. This makes pipelined execution harder.
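The sequential nature of variable-length decoding shows up clearly in a small disassembler sketch. The opcode table below is hypothetical (it does not reproduce real 8080 or Z80 opcode values), and storing 2-byte operands low byte first is an assumption that happens to match those processors.

```python
# Hypothetical table: opcode byte -> (mnemonic, number of operand bytes).
OPCODES = {
    0x00: ("NOP", 0),
    0x01: ("LOADI", 1),  # 1-byte immediate operand
    0x02: ("JMP", 2),    # 2-byte address operand, low byte first
    0xFF: ("HLT", 0),
}


def disassemble(code):
    """Return a list of (offset, mnemonic, operand) for a byte string."""
    listing = []
    pc = 0
    while pc < len(code):
        mnemonic, operand_len = OPCODES[code[pc]]
        operand_bytes = code[pc + 1:pc + 1 + operand_len]
        if len(operand_bytes) < operand_len:
            raise ValueError(f"truncated instruction at offset {pc}")
        operand = int.from_bytes(operand_bytes, "little") if operand_len else None
        listing.append((pc, mnemonic, operand))
        # The next instruction's offset depends on this instruction's length.
        pc += 1 + operand_len
    return listing


SAMPLE = bytes([0x01, 0x2A, 0x02, 0x34, 0x12, 0x00, 0xFF])
for entry in disassemble(SAMPLE):
    print(entry)
```

Note that the loop cannot skip ahead: the offset of each instruction is only known once every instruction before it has been decoded.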
For a hand-built CPU, fixed-length 16-bit instructions are recommended: predictable fetch, simple decode, manageable total instruction count.
Addressing Modes
Addressing modes specify how operand values are obtained. Each mode requires different hardware in the decode/execute stage.
Immediate: the operand value is embedded in the instruction itself. ADD R0, #5 — adds constant 5 to R0. Fast (no memory read for operand) but limited to small constants.
Register direct: operand is in a register. ADD R0, R1 — all operands in registers. Fastest execution, no memory access.
Absolute (direct): operand is at a fixed memory address. LOAD R0, $1000 — load from address 0x1000. Simple but inflexible for data structures.
Register indirect: operand is at the memory address contained in a register. LOAD R0, (R1) — R1 holds the address, load from that address. Enables pointers and dynamic addressing.
Indexed: effective address = base register + index register (or immediate offset). LOAD R0, $100(R1) — load from address 0x100 + R1. Enables array access (base + index).
Each additional addressing mode adds complexity to the decoder and execute stage. Start with immediate, register direct, and absolute. Add indirect when pointers are needed.
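The five modes differ only in how the operand is located, which a short sketch can make explicit. The function `fetch_operand`, the mode names as strings, and the dictionary-based registers and memory are illustrative assumptions rather than a hardware description.

```python
def fetch_operand(mode, field, regs, mem):
    """Return the operand value for one addressing mode.

    `field` is whatever the instruction carries for that mode:
    a constant, a register name, an address, or an (offset, register) pair.
    """
    if mode == "immediate":      # ADD R0, #5
        return field
    if mode == "register":       # ADD R0, R1
        return regs[field]
    if mode == "absolute":       # LOAD R0, $1000
        return mem[field]
    if mode == "indirect":       # LOAD R0, (R1)
        return mem[regs[field]]
    if mode == "indexed":        # LOAD R0, $100(R1)
        offset, reg = field
        return mem[(offset + regs[reg]) & 0xFFFF]  # wrap to 16-bit address
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 0x0004}
mem = {0x1000: 11, 0x0004: 22, 0x0104: 33}

print(fetch_operand("immediate", 5, regs, mem))
print(fetch_operand("register", "R1", regs, mem))
print(fetch_operand("absolute", 0x1000, regs, mem))
print(fetch_operand("indirect", "R1", regs, mem))
print(fetch_operand("indexed", (0x100, "R1"), regs, mem))
```

Each branch of the function corresponds to extra datapath hardware: indirect needs a path from a register to the address bus, and indexed additionally needs an adder in front of it.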
Condition Codes and Branching
The status register (flags register) holds condition codes set by arithmetic and logic instructions:
- Z (Zero): result was zero
- C (Carry): addition produced carry out, or subtraction needed borrow
- N (Negative): result’s MSB is 1 (negative in two’s complement)
- V (Overflow): signed arithmetic overflowed
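As a sketch of how these four flags could be derived for an 8-bit addition, the hypothetical helper `add8` below computes each one from the raw sum. The overflow test uses a common identity: signed overflow occurs when both operands have the same sign and the result's sign differs from it.

```python
def add8(a, b, carry_in=0):
    """Add two 8-bit values; return (result, flags)."""
    raw = a + b + carry_in
    result = raw & 0xFF
    flags = {
        "Z": int(result == 0),          # result was zero
        "C": int(raw > 0xFF),           # carry out of bit 7
        "N": result >> 7,               # MSB of the result
        # Sign of result differs from the sign of both operands.
        "V": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8(0x7F, 0x01))  # 127 + 1 overflows the signed range
print(add8(0xFF, 0x01))  # 255 + 1 wraps to 0 with a carry
```

The two examples show why C and V are separate flags: the first overflows as a signed addition without producing a carry, while the second produces a carry without any signed overflow (it is -1 + 1 = 0).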
Branch instructions test one or more condition codes:
- JZ: jump if Z=1 (result was zero)
- JNZ: jump if Z=0 (result was nonzero)
- JC: jump if C=1 (carry occurred)
- JN: jump if N=1 (result negative)
- JGE: jump if N XOR V = 0 (signed greater-or-equal)
The compare instruction, CMP RegA, RegB, performs the same subtraction as SUB but discards the result and only updates the flags. Because it sets condition codes without modifying registers, it enables pure condition testing before branching.
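The relationship between CMP and the branch conditions can be checked with a small model. The helper `cmp8` and the `BRANCH_TAKEN` table are illustrative names; the carry flag follows the borrow convention from the flag list above (C = 1 when the subtraction needed a borrow), which is an assumption since some real processors invert it.

```python
def cmp8(a, b):
    """Compute a - b on 8-bit values, discard the result, return the flags."""
    raw = a - b
    result = raw & 0xFF
    return {
        "Z": int(result == 0),
        "C": int(raw < 0),      # borrow was needed
        "N": result >> 7,
        # Subtraction overflows when the operands have different signs
        # and the result's sign differs from the first operand's.
        "V": int(((a ^ b) & (a ^ result) & 0x80) != 0),
    }


# Each branch instruction is a predicate over the flags.
BRANCH_TAKEN = {
    "JZ": lambda f: f["Z"] == 1,
    "JNZ": lambda f: f["Z"] == 0,
    "JC": lambda f: f["C"] == 1,
    "JN": lambda f: f["N"] == 1,
    "JGE": lambda f: (f["N"] ^ f["V"]) == 0,
}


def to_signed(x):
    """Interpret an 8-bit value as two's complement."""
    return x - 256 if x & 0x80 else x


flags = cmp8(0x7F, 0xFF)             # 127 compared with -1
print(flags, BRANCH_TAKEN["JGE"](flags))
```

In this example N alone would suggest "less than" because the 8-bit result wrapped; XORing with V corrects for the overflow, which is why JGE tests N XOR V rather than N by itself.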
Carefully designed condition codes and branch instructions allow the full range of if/else/while/for constructs to be compiled or hand-coded into assembly. Get the condition code logic right early — it is difficult to change later without breaking all existing programs.