Opcodes and Operands
Part of Programming Fundamentals
Opcodes and operands are the two components of every machine instruction — the operation to perform and the data or address to perform it on.
Why This Matters
Every instruction the CPU executes consists of an opcode — a number that identifies which operation to perform — and zero or more operands that specify what data to operate on. Understanding this structure is the foundation of all low-level programming, disassembly, debugging, and instruction set design.
When you look at raw machine code bytes and need to understand what a program does, you start by identifying opcodes and their operands. When you hand-assemble a program, you look up opcodes in a reference table and encode operands according to the addressing mode. When you design an assembler, the opcode table is its heart. When you design a new CPU, choosing the opcode encoding determines what programs will look like at the machine level for the architecture’s entire lifetime.
This topic is the bridge between the symbolic world of mnemonics and labels and the physical world of bytes in memory.
What an Opcode Is
An opcode (operation code) is a numeric value — typically one byte on 8-bit CPUs, though two or more bytes for extended instruction sets — that uniquely identifies a specific CPU operation. When the CPU fetches an instruction from memory, it first reads the opcode byte. The opcode tells the CPU which operation to perform, which internal circuit to activate, and how many additional bytes follow as operands.
The complete set of valid opcodes for a processor is its instruction set. On an 8-bit processor, a single opcode byte can encode up to 256 different instructions (0x00 through 0xFF). In practice, some of those codes are used as prefix bytes for extended instructions rather than standalone operations.
The Z80 opcode 0x78 means “LD A, B” (load register A with the value of register B). The opcode 0x3E means “LD A, n” (load register A with an immediate byte that follows). The opcode 0xC3 means “JP nn” (jump to the 16-bit address that follows in the next two bytes). These are fixed by the CPU design and cannot be changed by the programmer.
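Here is a rough Python sketch of the idea. An opcode table maps each byte to a mnemonic and to the number of operand bytes that follow it. This table holds only the three opcodes named above. `OPCODES` and `describe` are hypothetical names chosen for illustration.

```python
# Hypothetical lookup table: opcode byte -> (mnemonic, operand byte count).
# Only the three Z80 opcodes discussed above are included.
OPCODES = {
    0x78: ("LD A, B", 0),  # register to register, no operand bytes
    0x3E: ("LD A, n", 1),  # one immediate byte follows
    0xC3: ("JP nn", 2),    # 16-bit address follows, low byte first
}


def describe(opcode: int) -> str:
    """Describe what the CPU learns from reading a single opcode byte."""
    mnemonic, operand_bytes = OPCODES[opcode]
    return f"0x{opcode:02X}: {mnemonic} ({operand_bytes} operand byte(s) follow)"


for op in (0x78, 0x3E, 0xC3):
    print(describe(op))
```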
What Operands Are
Operands are the bytes that follow the opcode and specify the data or addresses the instruction uses. How many operand bytes follow an opcode, and how to interpret them, depends on the specific opcode.
No operands: Some instructions operate on implied registers or values and need no further specification. Z80 NOP (0x00) simply does nothing, so it needs no operand. RET (0xC9) returns from a subroutine using the stack. It needs no operand because the return address is taken from the stack.
One byte operand: An immediate value or a short offset. LD A, n (opcode 0x3E) is followed by one byte: the value to load. JR offset (opcode 0x18) is followed by one signed byte: the relative offset to jump.
Two byte operand: A 16-bit address or 16-bit immediate value. JP nn (opcode 0xC3) is followed by two bytes: the low byte and high byte of the jump target address (little-endian on Z80 — low byte first). LD HL, nn (opcode 0x21) is followed by two bytes: the 16-bit immediate value for HL.
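This minimal Python sketch makes the byte order concrete. It encodes JP nn and then reads the 16-bit operand back, assuming the little-endian layout described above. The helper names `encode_jp` and `decode_nn` are hypothetical.

```python
def encode_jp(target: int) -> bytes:
    """Encode Z80 'JP nn': opcode 0xC3, then the address low byte first."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("JP target must fit in 16 bits")
    return bytes([0xC3]) + target.to_bytes(2, "little")


def decode_nn(instr: bytes) -> int:
    """Recover the 16-bit operand of a 3-byte 'opcode + nn' instruction."""
    return int.from_bytes(instr[1:3], "little")


encoded = encode_jp(0x1234)
print(encoded.hex(" "))         # c3 34 12
print(hex(decode_nn(encoded)))  # 0x1234
```

The same `decode_nn` works for LD HL, nn (0x21), because its two operand bytes use the same low-byte-first layout.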
No following bytes (operands implicit in opcode): Many instructions have operands encoded within the opcode byte itself. The Z80 opcodes 0x40 through 0x7F (except 0x76, which is HALT) are all forms of LD r, r', a register-to-register load. The specific registers are encoded in bits 3-5 (destination) and bits 0-2 (source) of the opcode. LD B, C (0x41), LD B, D (0x42), and LD C, B (0x48) share the same opcode structure and differ only in the register codes in those bits.
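A shift and a mask are enough to extract the register fields from the opcode. The sketch below covers only the 0x40-0x7F band. `decode_ld` and `REGS` are illustrative names. The register order follows the Z80's 3-bit encoding, from B=000 through A=111.

```python
# Z80 3-bit register codes, indexed by their binary value (000 = B ... 111 = A).
REGS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]


def decode_ld(opcode: int) -> str:
    """Decode an opcode in the 0x40-0x7F band into its LD r, r' mnemonic."""
    if not 0x40 <= opcode <= 0x7F:
        raise ValueError("not in the LD r, r' band")
    if opcode == 0x76:
        return "HALT"  # the pattern for LD (HL), (HL) is used for HALT instead
    dst = (opcode >> 3) & 0b111  # bits 3-5: destination register
    src = opcode & 0b111         # bits 0-2: source register
    return f"LD {REGS[dst]}, {REGS[src]}"


for op in (0x41, 0x42, 0x48, 0x76):
    print(f"0x{op:02X} -> {decode_ld(op)}")
```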
Opcode Encoding Structure
For the Z80, register-to-register LD instructions occupy the middle of the opcode space:
Bit:    7 6 5 4 3 2 1 0
Opcode: 0 1 D D D S S S
Bits 6-7 (01): select the LD r, r' group (a field inside the opcode byte, not a separate prefix byte)
Bits 3-5 (DDD): destination register: B=000, C=001, D=010, E=011, H=100, L=101, (HL)=110, A=111
Bits 0-2 (SSS): source register (same encoding)
LD B, C = 01 000 001 = 0x41
LD A, H = 01 111 100 = 0x7C
LD (HL), A = 01 110 111 = 0x77
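An assembler can also run this process in reverse and build each opcode from its bit fields, so it does not need to store every combination. This is a minimal sketch that reproduces the three examples above. The function name `encode_ld` is hypothetical.

```python
# Z80 3-bit register codes used in the LD r, r' group.
REG_CODES = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "(HL)": 6, "A": 7}


def encode_ld(dst: str, src: str) -> int:
    """Build an LD r, r' opcode: 01 in bits 6-7, dst in bits 3-5, src in bits 0-2."""
    if dst == "(HL)" and src == "(HL)":
        raise ValueError("0x76 is HALT, not LD (HL), (HL)")
    return 0b01_000_000 | (REG_CODES[dst] << 3) | REG_CODES[src]


print(hex(encode_ld("B", "C")))     # 0x41
print(hex(encode_ld("A", "H")))     # 0x7c
print(hex(encode_ld("(HL)", "A")))  # 0x77
```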
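Arithmetic and logical operations on the accumulator occupy another band. The operation is in bits 3-5 and the source register (rrr, same encoding as above) is in bits 0-2:
ADD A, r = 10 000 rrr (0x80-0x87)
ADC A, r = 10 001 rrr (0x88-0x8F)
SUB r    = 10 010 rrr (0x90-0x97)
SBC A, r = 10 011 rrr (0x98-0x9F)
AND r    = 10 100 rrr (0xA0-0xA7)
XOR r    = 10 101 rrr (0xA8-0xAF)
OR r     = 10 110 rrr (0xB0-0xB7)
CP r     = 10 111 rrr (0xB8-0xBF)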
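This regularity is a deliberate design feature: arithmetic and logical operations occupy 0x80-0xBF, with the operation code in bits 3-5 and the register code in the low 3 bits. It makes the instruction set more predictable. It also makes assemblers and disassemblers easier to write, because a few bit-field operations can replace a long list of individual table entries.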
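The same shift-and-mask approach decodes the whole 0x80-0xBF band. In this sketch, `decode_alu` is a hypothetical helper. The operation names are listed in bit-pattern order from 000 through 111, matching the table above.

```python
# Operation names indexed by bits 3-5, register names indexed by bits 0-2.
ALU_OPS = ["ADD A,", "ADC A,", "SUB", "SBC A,", "AND", "XOR", "OR", "CP"]
REGS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]


def decode_alu(opcode: int) -> str:
    """Decode an opcode in the 0x80-0xBF arithmetic/logic band."""
    if not 0x80 <= opcode <= 0xBF:
        raise ValueError("not in the 0x80-0xBF band")
    op = (opcode >> 3) & 0b111  # which operation
    reg = opcode & 0b111        # which source register
    return f"{ALU_OPS[op]} {REGS[reg]}"


for code in (0x80, 0x97, 0xA8, 0xBE):
    print(f"0x{code:02X} -> {decode_alu(code)}")
```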
Instruction Length
Z80 instructions vary in length from 1 to 4 bytes:
- 1 byte: NOP, RET, PUSH BC, INC A, etc.
- 2 bytes: LD A, n (opcode + immediate byte); JR offset (opcode + signed offset byte)
- 3 bytes: JP nn and CALL nn (opcode + 16-bit address); LD A, (nn) (opcode + 16-bit address)
- 4 bytes: Z80 extended instructions using the IX/IY registers, such as LD (IX+d), n: DD/FD prefix + opcode + displacement + immediate
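This sketch shows how a tool might compute lengths for a handful of the instructions listed above. The table covers only those unprefixed opcodes plus one four-byte case: LD (IX+d), n (DD 36 d n) and its IY counterpart (FD 36 d n). A real table would cover the whole instruction set. All names here are hypothetical.

```python
# Hypothetical length table covering only a few unprefixed Z80 opcodes.
LENGTHS = {
    0x00: 1,  # NOP
    0xC9: 1,  # RET
    0xC5: 1,  # PUSH BC
    0x3C: 1,  # INC A
    0x3E: 2,  # LD A, n
    0x18: 2,  # JR offset
    0xC3: 3,  # JP nn
    0xCD: 3,  # CALL nn
    0x3A: 3,  # LD A, (nn)
}


def instruction_length(code: bytes, i: int) -> int:
    """Length in bytes of the instruction that starts at code[i]."""
    if code[i] in (0xDD, 0xFD) and i + 1 < len(code) and code[i + 1] == 0x36:
        # LD (IX+d), n / LD (IY+d), n: prefix, opcode, displacement, immediate
        return 4
    return LENGTHS[code[i]]  # raises KeyError for opcodes outside this table


def instruction_lengths(code: bytes) -> list:
    """Walk the bytes one instruction at a time and collect each length."""
    lengths = []
    i = 0
    while i < len(code):
        n = instruction_length(code, i)
        lengths.append(n)
        i += n
    return lengths


# LD A, 5 ; LD (IX+2), 7 ; CALL 0x8000 ; RET
program = bytes([0x3E, 0x05, 0xDD, 0x36, 0x02, 0x07, 0xCD, 0x00, 0x80, 0xC9])
print(instruction_lengths(program))  # [2, 4, 3, 1]
print(len(program))                  # 10
```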
Knowing instruction lengths is essential for:
- Hand assembly: knowing how many bytes to write
- Calculating jump offsets: the JR offset is relative to the address of the instruction after the JR, so you must know that JR itself is 2 bytes long
- Disassembly: knowing how many bytes to consume for each instruction to correctly identify the next instruction’s start
- Code size optimization: preferring shorter instruction sequences when memory is scarce
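The JR offset calculation can be sketched as follows. It assumes the offset is counted from the address just after the 2-byte JR instruction and is stored as a signed two's complement byte, with a range of -128 to +127. `jr_offset_byte` and `jr_target` are hypothetical helpers.

```python
def jr_offset_byte(jr_address: int, target: int) -> int:
    """Operand byte for a Z80 'JR offset' instruction located at jr_address."""
    # The offset counts from the instruction after JR, and JR is 2 bytes long.
    offset = target - (jr_address + 2)
    if not -128 <= offset <= 127:
        raise ValueError(f"target is out of JR range (offset {offset})")
    return offset & 0xFF  # store the signed offset as a two's complement byte


def jr_target(jr_address: int, offset_byte: int) -> int:
    """Inverse: the address that a JR at jr_address with this operand byte jumps to."""
    signed = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    return (jr_address + 2 + signed) & 0xFFFF


print(hex(jr_offset_byte(0x8000, 0x8000)))  # 0xfe: jump to itself (offset -2)
print(hex(jr_offset_byte(0x8000, 0x8010)))  # 0xe
print(hex(jr_target(0x8000, 0xFE)))         # 0x8000
```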
Disassembly
Disassembly is the reverse of assembly: reading bytes from memory and determining which instruction each byte (or group of bytes) represents. You need this when:
- Examining code without source files (loaded from a ROM you did not write)
- Debugging a crash where the program is executing at an unexpected address
- Verifying that an assembler or compiler produced the code you intended
Disassembly algorithm:
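- Read the byte at the current address. This is the opcode. (On the Z80, if it is a prefix byte such as 0xCB, 0xDD, 0xED, or 0xFD, you must read further bytes before you know which instruction it is.)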
- Look up the opcode in the reference table. This tells you the instruction mnemonic and how many operand bytes follow.
- Read that many additional bytes.
- Format and display the instruction: address, mnemonic, operands.
- Advance the current address by total instruction length and repeat.
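The five steps above map almost directly onto code. This minimal Python sketch works on a tiny, hypothetical subset of unprefixed Z80 opcodes and does not handle the CB/DD/ED/FD prefixes. Bytes it does not recognize are printed as raw data using the `DB` (define byte) notation common in Z80 assemblers, rather than guessed at.

```python
# Hypothetical reference table for a tiny subset of unprefixed Z80 opcodes:
# opcode -> (output format, number of operand bytes that follow).
TABLE = {
    0x00: ("NOP", 0),
    0x3E: ("LD A, 0x{:02X}", 1),
    0x21: ("LD HL, 0x{:04X}", 2),
    0xC3: ("JP 0x{:04X}", 2),
    0xC9: ("RET", 0),
}


def disassemble(code: bytes, origin: int = 0) -> list:
    """Disassemble `code`, assuming it is loaded at address `origin`."""
    lines = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]                # 1. read the opcode byte
        entry = TABLE.get(opcode)        # 2. look it up in the table
        if entry is None or pc + 1 + entry[1] > len(code):
            # Unknown opcode or truncated instruction: show the raw byte.
            lines.append(f"{origin + pc:04X}  DB 0x{opcode:02X}")
            pc += 1
            continue
        fmt, n = entry
        operands = code[pc + 1 : pc + 1 + n]        # 3. read the operand bytes
        value = int.from_bytes(operands, "little")  # low byte first on the Z80
        lines.append(f"{origin + pc:04X}  {fmt.format(value)}")  # 4. format
        pc += 1 + n                      # 5. advance by the full instruction length
    return lines


for line in disassemble(bytes.fromhex("3E 05 21 00 80 C3 00 80 C9"), origin=0x8000):
    print(line)
```

Treating unknown bytes as data keeps the listing moving. The output is still only as trustworthy as the starting address you supply.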
The challenge: a byte that is an operand looks identical to a byte that is an opcode. Disassembly must start at the correct instruction boundary. If you start reading in the middle of a multi-byte instruction, operand bytes are decoded as opcodes. The listing is then misaligned and wrong, at least until the decoder happens to land back on a real instruction boundary.
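A small experiment shows the problem. This sketch decodes the same four bytes from three different starting offsets, using a hypothetical four-entry table. Starting at offset 1 treats the operand byte 0x3E as an LD A, n opcode. In this short sample, all three walks fall back into step at the RET.

```python
# Hypothetical subset: opcode -> (mnemonic, number of operand bytes).
SUBSET = {
    0x00: ("NOP", 0),
    0x3E: ("LD A, n", 1),
    0x21: ("LD HL, nn", 2),
    0xC9: ("RET", 0),
}


def walk(code: bytes, start: int) -> list:
    """Decode from `start`, returning (offset, mnemonic) for each instruction found."""
    found = []
    pc = start
    while pc < len(code):
        mnemonic, n = SUBSET[code[pc]]  # every byte in the sample below is in SUBSET
        found.append((pc, mnemonic))
        pc += 1 + n
    return found


code = bytes([0x21, 0x3E, 0x00, 0xC9])  # intended: LD HL, 0x003E ; RET
print(walk(code, 0))  # [(0, 'LD HL, nn'), (3, 'RET')]  correct boundary
print(walk(code, 1))  # [(1, 'LD A, n'), (3, 'RET')]    operand read as opcode
print(walk(code, 2))  # [(2, 'NOP'), (3, 'RET')]        operand read as opcode
```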
Illegal/Undefined Opcodes
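Not every possible opcode corresponds to a documented instruction. On the Z80, every unprefixed opcode byte is either a documented instruction or one of the prefix bytes. Many combinations that follow a prefix, however, are officially undocumented, for example several 0xED xx codes and the DD/FD forms that operate on the 8-bit halves of IX and IY. On many processors, undefined opcodes behave in unexpected ways. They might perform combinations of partial operations from neighboring defined instructions, and their behavior can differ between chip revisions.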
Never use undefined opcodes in programs that must be reliable. Programs that exploit undefined opcodes for clever tricks break when the processor is updated or replaced.
If you are writing an assembler, make it report an error for any instruction that would map to an undefined opcode. This prevents accidental use.
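Here is a sketch of that policy. It assumes a hypothetical table that lists only documented encodings. Anything the table does not contain raises an error instead of being emitted, including an undocumented instruction such as SLL. `AssemblyError`, `DOCUMENTED`, and `assemble_line` are illustrative names. The table covers only a few instructions that take no operand bytes.

```python
class AssemblyError(Exception):
    """Raised for source lines the assembler refuses to encode."""


# Hypothetical table: documented Z80 instructions without operand bytes -> encoding.
DOCUMENTED = {
    "NOP": bytes([0x00]),
    "HALT": bytes([0x76]),
    "RET": bytes([0xC9]),
    "LD B, C": bytes([0x41]),
    "NEG": bytes([0xED, 0x44]),
}


def normalize(line: str) -> str:
    """Upper-case and put exactly one space after each comma: 'ld b,c' -> 'LD B, C'."""
    return " ".join(line.upper().replace(",", ", ").split())


def assemble_line(line: str) -> bytes:
    key = normalize(line)
    if key not in DOCUMENTED:
        # Anything outside the documented table, including undocumented
        # instructions such as SLL, is rejected rather than guessed at.
        raise AssemblyError(f"unknown or undocumented instruction: {line!r}")
    return DOCUMENTED[key]


print(assemble_line("ld b,c").hex())  # 41
print(assemble_line("NEG").hex())     # ed44
```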
Practical Notes for Rebuilders
Build the opcode table for your CPU from the official data sheet, not from secondary sources. Data sheets contain authoritative encoding tables. Secondary sources (books, websites) sometimes contain transcription errors.
Organize your opcode table by functional group (data movement, arithmetic, logical, control flow) rather than by numeric value when studying it. Understanding the encoding structure, meaning why these bytes have these meanings, helps you learn the instruction set faster than memorizing individual entries.
A printed opcode map — a grid with rows and columns representing the high and low nibbles of the opcode byte — gives an at-a-glance view of the instruction space. Patterns in the grid (clusters of related instructions, regular encodings for register fields) become visible and aid memorization.
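An opcode map can also be generated rather than drawn by hand. This Python sketch prints the 0x40-0xBF region of the Z80 map, where the LD r, r' and accumulator arithmetic/logic groups sit. The repeating eight-column pattern, and HALT sitting at 0x76, become visible immediately. The helper names are hypothetical, and cells outside that region are not covered.

```python
REGS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]
ALU = ["ADD A,", "ADC A,", "SUB", "SBC A,", "AND", "XOR", "OR", "CP"]


def mnemonic(op: int) -> str:
    """Mnemonic for opcodes in 0x40-0xBF; other bytes are outside this sketch."""
    if op == 0x76:
        return "HALT"
    if 0x40 <= op <= 0x7F:
        return f"LD {REGS[(op >> 3) & 7]}, {REGS[op & 7]}"
    if 0x80 <= op <= 0xBF:
        return f"{ALU[(op >> 3) & 7]} {REGS[op & 7]}"
    return "?"


def opcode_map(first_row: int = 0x4, last_row: int = 0xB, width: int = 12) -> str:
    """Grid whose rows are the high nibble and columns the low nibble of the opcode."""
    lines = ["    " + "".join(f"x{lo:X}".ljust(width) for lo in range(16))]
    for hi in range(first_row, last_row + 1):
        cells = "".join(mnemonic((hi << 4) | lo).ljust(width) for lo in range(16))
        lines.append(f"{hi:X}x  " + cells)
    return "\n".join(lines)


print(opcode_map())
```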