Data Movement

Data movement instructions transfer values between registers, memory, and I/O ports — the most fundamental operations in any program.

Why This Matters

Before a program can compute anything, it must load data from somewhere. After computing, it must store results somewhere. Before it can display anything, it must move data to the output device. Data movement is the plumbing of programming: not glamorous, but nothing works without it.

On 8-bit microprocessors — the hardware most accessible to rebuilding civilizations — data movement instructions typically account for half or more of the instructions in a program. Understanding what moves where, and at what cost in time and memory, is essential for writing correct and efficient code.

For anyone debugging a program without a source-level debugger, data movement is what you trace through memory dumps and register displays. Every bug leaves traces in data: wrong values at wrong locations, values that should have moved but did not, values that moved to the wrong place. Data movement is where bugs live.

Registers and Memory

The CPU has a small number of fast registers — typically 8 to 32 on early microprocessors. Registers hold values that are actively being processed. Memory (RAM) holds the bulk of program data: arrays, strings, buffers, stack frames. Moving data between registers and memory is the most common class of data movement instruction.

On the Z80:

LD A, (HL)      ; load register A from the memory address in HL
LD (HL), A      ; store register A to the memory address in HL
LD A, 42        ; load immediate value 42 into A
LD HL, 0x1000   ; load immediate address 0x1000 into HL

On the 6502:

LDA #$2A        ; load immediate value 42 ($2A hex) into accumulator A
LDA $1000       ; load A from memory address $1000
STA $1000       ; store A to memory address $1000
LDX #$00        ; load immediate value into X register

The notation for immediate (literal value in the instruction) versus indirect (value at the address) varies by processor and assembler syntax. Know your CPU’s conventions precisely.
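On the 6502 in particular, omitting the # is a classic bug: the assembler silently emits a memory load instead of an immediate load, and both forms assemble without error.

  LDA #$2A        ; load the value $2A (42) into A
  LDA $2A         ; load A from zero-page address $2A — one character apart!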

Addressing Modes

Different addressing modes specify how to calculate the memory address for a load or store. Mastering them is essential for writing efficient code.

Immediate: The operand is a literal value embedded in the instruction. LD A, 42 loads the number 42, not whatever is at address 42. Used for constants.

Direct (absolute): The operand is a fixed memory address. LD A, (0x2000) loads from address 0x2000. Used for global variables at known locations.

Register indirect: The operand is the memory address stored in a register. LD A, (HL) loads from the address currently in the HL register pair. Essential for working with arrays and pointer-based data structures — change the register to access a different address.

Indexed: The address is a base address plus a register value (the index). LD A, (IX+5) loads from the address (IX + 5). Used for accessing structure fields at a fixed offset from a base pointer, or for array access when the base address is in IX.
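As a sketch of indexed access to a structure, suppose a record holds a status byte followed by a 16-bit count (the label and field offsets are illustrative, not from any standard layout):

  LD IX, RECORD    ; IX = base address of the record
  LD A, (IX+0)     ; load the status byte (offset 0)
  LD L, (IX+1)     ; low byte of the count (offset 1)
  LD H, (IX+2)     ; high byte of the count (offset 2)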

Stack addressing: Push and pop operations use the stack pointer register, automatically incrementing or decrementing it. PUSH HL saves HL on the stack; POP HL restores it.

I/O port addressing: Some processors (Z80) have separate address spaces for I/O devices. IN A, (port) reads from I/O port; OUT (port), A writes to it. Other architectures (6502) use memory-mapped I/O: device registers appear at specific memory addresses and are accessed with ordinary load/store instructions.
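On a memory-mapped system like the 6502, writing to a device register is an ordinary store; the address below is a hypothetical device mapping, not a fixed convention:

  SERIAL_TX = $D012  ; hypothetical memory-mapped transmit register
  LDA #$41           ; byte to send ('A')
  STA SERIAL_TX      ; an ordinary store — the hardware sees the write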

Moving Blocks of Data

Moving a single byte is common; moving a block of bytes is equally common — clearing a buffer, copying a string, loading a program. A simple byte-copy loop:

; copy COUNT bytes from SOURCE to DEST
  LD HL, SOURCE    ; source address
  LD DE, DEST      ; destination address
  LD B, COUNT      ; byte count
COPY_LOOP:
  LD A, (HL)       ; load byte from source
  LD (DE), A       ; store byte to destination
  INC HL           ; advance source pointer
  INC DE           ; advance destination pointer
  DJNZ COPY_LOOP   ; decrement B, loop if not zero
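Because DJNZ counts down in the 8-bit B register, this loop handles at most 256 bytes. One common pattern for larger blocks, sketched here, uses BC as a 16-bit counter; since DEC BC does not set the flags on the Z80, the loop tests BC for zero explicitly:

  LD HL, SOURCE
  LD DE, DEST
  LD BC, COUNT     ; 16-bit byte count
COPY16:
  LD A, (HL)       ; copy one byte
  LD (DE), A
  INC HL
  INC DE
  DEC BC           ; DEC BC does not affect flags...
  LD A, B
  OR C             ; ...so OR the halves together to test for zero
  JP NZ, COPY16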

The Z80 has a dedicated block-copy instruction, LDIR, that performs this entire loop in a single instruction:

  LD HL, SOURCE
  LD DE, DEST
  LD BC, COUNT
  LDIR             ; copy BC bytes from (HL) to (DE), incrementing both

LDIR is roughly twice as fast as the explicit loop (21 T-states per byte versus about 39) and far more compact. Similarly, LDDR copies a block backward (useful when source and destination overlap), and CPIR/CPDR search a block for a given byte. Learn the block instructions for your CPU — they represent significant performance wins for common operations.

Filling a buffer with a constant value (zeroing, clearing):

; fill 256 bytes starting at BUFFER with value 0
  LD HL, BUFFER
  LD B, 0          ; B = 0 → DJNZ counts as 256 iterations
  XOR A            ; set A to 0
CLEAR_LOOP:
  LD (HL), A
  INC HL
  DJNZ CLEAR_LOOP
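The Z80 has no dedicated block-fill instruction, but an overlapping LDIR achieves the same effect: seed the first byte with the fill value, then copy each byte into the one after it.

  ; fill 256 bytes starting at BUFFER with value 0
  LD HL, BUFFER
  LD (HL), 0       ; seed the first byte with the fill value
  LD DE, BUFFER+1  ; destination one byte ahead of source
  LD BC, 255       ; 255 copies fill the remaining bytes
  LDIR             ; each copy propagates the value forward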

Stack Operations

The stack is a region of memory managed by the stack pointer register. On the Z80, PUSH decrements SP and then stores the register pair at the new address; POP loads from the address in SP and then increments it.

PUSH AF          ; save accumulator and flags
PUSH BC          ; save BC register pair
; ... do something that modifies AF and BC ...
POP BC           ; restore BC (note: reverse order of pushes)
POP AF           ; restore AF

The stack grows downward in most architectures. The stack pointer points to the most recently pushed value (on Z80) or one location below it (on 6502). Know your CPU’s stack convention.
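On the 6502, the same save/restore pattern uses its push/pull instructions. The original NMOS 6502 can push only A and the status register (later CMOS variants add PHX/PHY), so X and Y go through A:

  PHA              ; push accumulator
  TXA              ; transfer X to A...
  PHA              ; ...and push it
  ; ... code that modifies A and X ...
  PLA              ; pull into A...
  TAX              ; ...and restore X
  PLA              ; restore the original A (reverse order, as always)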

The stack is used for:

  • Preserving register values across subroutine calls
  • Passing parameters to subroutines
  • Storing local variables
  • Storing return addresses (done automatically by CALL/RET instructions)

Stack overflow — pushing more data than the stack has space for — corrupts whatever memory lies below the stack. Allocate sufficient stack space and monitor stack depth in complex programs.

I/O Device Communication

Communicating with hardware peripherals is data movement to/from special addresses or ports. A serial communication controller might have three relevant addresses:

  • Status register (read-only): bit 0 = transmit buffer empty, bit 1 = receive buffer full
  • Transmit register (write-only): write a byte here to send it
  • Receive register (read-only): read a byte here to receive it

A routine to transmit a byte:

SEND_BYTE:
  ; wait until transmit buffer is empty
WAIT_TX:
  IN A, (SERIAL_STATUS)   ; read status register
  AND 0x01                ; mask bit 0 (tx empty flag)
  JP Z, WAIT_TX           ; flag clear → buffer still busy, keep polling
  ; buffer is empty, safe to send
  LD A, (BYTE_TO_SEND)
  OUT (SERIAL_TX), A      ; write byte to transmit register
  RET

This busy-wait polling approach works for slow data rates. For high-speed communication, interrupt-driven I/O is more efficient: the hardware generates an interrupt when it is ready, freeing the CPU to do other work between data transfers.
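A matching polled receive routine, using the same assumed status-bit assignments and port names as the transmit example:

RECV_BYTE:
  ; wait until a byte has arrived
WAIT_RX:
  IN A, (SERIAL_STATUS)   ; read status register
  AND 0x02                ; mask bit 1 (rx full flag)
  JP Z, WAIT_RX           ; keep polling until a byte arrives
  IN A, (SERIAL_RX)       ; read the received byte into A
  RET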

Data Movement Costs

Not all data movement is equally fast. In rough order from fastest to slowest:

  1. Register-to-register moves (the fastest instructions: 4 T-states on the Z80, 2 cycles on the 6502)
  2. Register-immediate loads (slightly slower, and the instruction itself occupies more memory)
  3. Register-to/from-memory (roughly twice the cost of a register move; more for absolute or indexed addresses)
  4. I/O port accesses (slowest in practice, often slowed further by wait states the device imposes)

Keeping frequently used values in registers rather than repeatedly loading and storing them is the most important performance optimization at the assembly level. This is called register allocation, and compilers spend considerable effort doing it automatically.
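As an illustrative sketch (labels hypothetical), summing a small array keeps the running total in a register for the whole loop instead of storing it back to memory on every iteration:

  ; sum COUNT bytes starting at TABLE; total kept in register C throughout
  LD HL, TABLE
  LD B, COUNT
  LD C, 0          ; running total lives in C — never spilled to memory
SUM_LOOP:
  LD A, (HL)
  ADD A, C         ; add current byte to the total
  LD C, A          ; total stays in a register
  INC HL
  DJNZ SUM_LOOP
  ; C now holds the sum (modulo 256)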

Practical Notes for Rebuilders

When tracing a bug through a memory dump, focus on data movement: what value was in the register before the failing instruction? What address was it accessing? Was the data at that address correct? Trace backward until you find where a correct value became incorrect.

Comment every non-obvious data movement operation in assembly code. LD A, (HL) alone says nothing; a comment like "; load next character from input buffer" tells the reader why.

When porting code to a new CPU, data movement instructions are the first thing to translate — they differ more across architectures than arithmetic operations, which follow more universal conventions.