Data Movement
Data movement instructions transfer values between registers, memory, and I/O ports — the most fundamental operations in any program.
Why This Matters
Before a program can compute anything, it must load data from somewhere. After computing, it must store results somewhere. Before it can display anything, it must move data to the output device. Data movement is the plumbing of programming: not glamorous, but nothing works without it.
On 8-bit microprocessors, the hardware most accessible to rebuilding civilizations, data movement instructions often make up half or more of the instructions in a program. Understanding what moves where, and at what cost in time and memory, is essential for writing correct and efficient code.
For anyone debugging a program without a source-level debugger, data movement is what you trace through memory dumps and register displays. Every bug leaves traces in data: wrong values at wrong locations, values that should have moved but did not, values that moved to the wrong place. Data movement is where bugs live.
Registers and Memory
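The CPU has a small number of fast registers. Early 8-bit microprocessors have only a handful: the 6502 has three (A, X, and Y), and the Z80 has seven general-purpose 8-bit registers (A, B, C, D, E, H, L), six of which pair up as the 16-bit BC, DE, and HL. Registers hold values that are actively being processed. Memory (RAM) holds the bulk of program data: arrays, strings, buffers, stack frames. Moving data between registers and memory is the most common class of data movement instruction.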
On the Z80:
LD A, (HL) ; load register A from the memory address in HL
LD (HL), A ; store register A to the memory address in HL
LD A, 42 ; load immediate value 42 into A
LD HL, 0x1000 ; load immediate address 0x1000 into HL
On the 6502:
LDA #$2A ; load immediate value 42 ($2A hex) into accumulator A
LDA $1000 ; load A from memory address $1000
STA $1000 ; store A to memory address $1000
LDX #$00 ; load immediate value into X register
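The notation for immediate (literal value in the instruction) versus indirect (value at the address) varies by processor and assembler syntax. In the Z80 syntax used here, parentheses mean "the memory at this address": LD A, 42 loads the number 42, while LD A, (42) reads memory location 42. In 6502 syntax, # marks an immediate: LDA #$2A loads the value $2A, while LDA $2A reads memory location $2A. Know your CPU’s conventions precisely.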
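As a small illustration of these loads and stores working together, the following Z80 sketch increments a one-byte variable held in memory. COUNTER is a hypothetical label standing for any RAM address, not a name from a particular system:
LD A, (COUNTER) ; load the current value from memory into A
INC A ; modify it in the register
LD (COUNTER), A ; store the result back to the same address
The value passes through A because A is the only 8-bit register the Z80 can load from or store to a fixed address. When the address is already in HL, INC (HL) does the same job in one instruction.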
Addressing Modes
Different addressing modes specify how to calculate the memory address for a load or store. Mastering them is essential for writing efficient code.
Immediate: The operand is a literal value embedded in the instruction. LD A, 42 loads the number 42, not whatever is at address 42. Used for constants.
Direct (absolute): The operand is a fixed memory address. LD A, (0x2000) loads from address 0x2000. Used for global variables at known locations.
Register indirect: The operand is the memory address stored in a register. LD A, (HL) loads from the address currently in the HL register pair. Essential for working with arrays and pointer-based data structures — change the register to access a different address.
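Indexed: The address is the sum of a base and an offset. On the Z80 the base is in IX or IY and the offset is a signed 8-bit constant in the instruction: LD A, (IX+5) loads from the address IX + 5. On the 6502 the roles are reversed: LDA $1000,X adds the X register to the constant base $1000. Used for accessing structure fields at a fixed offset from a base pointer, or for array access.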
Stack addressing: Push and pop operations use the stack pointer register, automatically incrementing or decrementing it. PUSH HL saves HL on the stack; POP HL restores it.
I/O port addressing: Some processors (Z80) have a separate address space for I/O devices. IN A, (port) reads from an I/O port; OUT (port), A writes to one. Other architectures (6502) use memory-mapped I/O: device registers appear at specific memory addresses and are accessed with ordinary load/store instructions.
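To make the indexed mode concrete, here is a sketch that updates one field of a record through IX. The record layout (a position byte at offset 0 and a velocity byte at offset 2) and the label SPRITE are assumptions made up for this example:
LD IX, SPRITE ; IX = base address of the record
LD A, (IX+0) ; load the position field
ADD A, (IX+2) ; add the velocity field
LD (IX+0), A ; store the updated position
Pointing IX at a different record reuses the same code unchanged, which is the main attraction of this mode. The price is speed: each (IX+d) access is slower than the equivalent (HL) access.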
Moving Blocks of Data
Moving a single byte is common; moving a block of bytes is equally common — clearing a buffer, copying a string, loading a program. A simple byte-copy loop:
; copy COUNT bytes from SOURCE to DEST
LD HL, SOURCE ; source address
LD DE, DEST ; destination address
LD B, COUNT ; byte count (1 to 255; 0 would copy 256 bytes)
COPY_LOOP:
LD A, (HL) ; load byte from source
LD (DE), A ; store byte to destination
INC HL ; advance source pointer
INC DE ; advance destination pointer
DJNZ COPY_LOOP ; decrement B, loop if not zero
The Z80 has a dedicated block copy instruction, LDIR, that does this entire loop in hardware:
LD HL, SOURCE
LD DE, DEST
LD BC, COUNT
LDIR ; copy BC bytes from (HL) to (DE), incrementing both
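LDIR takes 21 T-states per byte, against 39 for the explicit loop above, so it is nearly twice as fast, and its 16-bit count in BC is not limited to 256 bytes. LDDR copies backward, which is the safe direction when the regions overlap and the destination lies above the source. CPIR and CPDR search a block for a byte equal to A; they compare but do not copy. The Z80 has no dedicated fill instruction. Learn the block instructions for your CPU: they are a significant performance win for common operations.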
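When source and destination overlap and the destination is at the higher address, a forward copy overwrites source bytes before it reads them. A backward copy avoids this. A sketch, assuming SOURCE, DEST, and COUNT are assemble-time constants as in the examples above and that the assembler accepts constant expressions in operands:
LD HL, SOURCE+COUNT-1 ; last byte of the source block
LD DE, DEST+COUNT-1 ; last byte of the destination block
LD BC, COUNT ; byte count (must not be 0, which means 65536)
LDDR ; copy BC bytes from (HL) to (DE), decrementing both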
Filling a buffer with a constant value (zeroing, clearing):
; fill 256 bytes starting at BUFFER with value 0
LD HL, BUFFER
LD B, 0 ; B = 0 → DJNZ counts as 256 iterations
XOR A ; set A to 0
CLEAR_LOOP:
LD (HL), A
INC HL
DJNZ CLEAR_LOOP
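The loop above is limited to 256 bytes because DJNZ counts in the 8-bit B register. For larger buffers, a common idiom (shown here as a sketch; SIZE is a hypothetical constant of at least 2) writes the first byte by hand and lets LDIR copy it forward one position at a time:
LD HL, BUFFER ; first byte of the buffer
LD (HL), 0 ; write the fill value into it
LD DE, BUFFER+1 ; destination is one byte ahead of the source
LD BC, SIZE-1 ; remaining bytes to fill
LDIR ; each byte copied becomes the source for the next
Because the destination is one byte ahead of the source, every copy reads the byte that the previous step just wrote, so the fill value propagates through the whole buffer.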
Stack Operations
The stack is a region of memory managed by the stack pointer register (SP). On the Z80, PUSH decrements SP by two and stores a register pair at the new address; POP loads the pair from the address in SP and then increments SP by two.
PUSH AF ; save accumulator and flags
PUSH BC ; save BC register pair
; ... do something that modifies AF and BC ...
POP BC ; restore BC (note: reverse order of pushes)
POP AF ; restore AF
The stack grows downward in most architectures. The stack pointer points to the most recently pushed value (on Z80) or one location below it (on 6502). Know your CPU’s stack convention.
The stack is used for:
- Preserving register values across subroutine calls
- Passing parameters to subroutines
- Storing local variables
- Storing return addresses (done automatically by CALL/RET instructions)
Stack overflow — pushing more data than the stack has space for — corrupts whatever memory lies below the stack. Allocate sufficient stack space and monitor stack depth in complex programs.
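The first use in the list, preserving registers across a subroutine, can be sketched as follows. DELAY is a hypothetical routine that needs B as a loop counter but must not disturb the caller's BC:
DELAY:
PUSH BC ; save the caller's BC
LD B, 0 ; B = 0 gives 256 iterations of DJNZ
DELAY_LOOP:
DJNZ DELAY_LOOP ; spin until B reaches zero
POP BC ; restore the caller's BC
RET ; SP is back where CALL left it, so RET finds the return address
Every PUSH inside a subroutine needs a matching POP before RET. Otherwise RET pops a saved register value instead of the return address and jumps to it.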
I/O Device Communication
Communicating with hardware peripherals is data movement to/from special addresses or ports. A serial communication controller might have three relevant addresses:
- Status register (read-only): bit 0 = transmit buffer empty, bit 1 = receive buffer full
- Transmit register (write-only): write a byte here to send it
- Receive register (read-only): read a byte here to receive it
A routine to transmit a byte:
SEND_BYTE:
; wait until transmit buffer is empty
WAIT_TX:
IN A, (SERIAL_STATUS) ; read status register
AND 0x01 ; mask bit 0 (tx empty flag)
JP Z, WAIT_TX ; wait if not empty
; buffer is empty, safe to send
LD A, (BYTE_TO_SEND)
OUT (SERIAL_TX), A ; write byte to transmit register
RET
This busy-wait polling approach works for slow data rates. For high-speed communication, interrupt-driven I/O is more efficient: the hardware generates an interrupt when it is ready, freeing the CPU to do other work between data transfers.
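The receiving side follows the same polling pattern, testing bit 1 of the status register instead of bit 0. In this sketch SERIAL_RX is a hypothetical port name chosen to match SERIAL_TX above, and the bit assignment is the one from the example register list, not that of any particular chip:
RECV_BYTE:
IN A, (SERIAL_STATUS) ; read status register
AND 0x02 ; mask bit 1 (rx full flag)
JR Z, RECV_BYTE ; nothing received yet, keep polling
IN A, (SERIAL_RX) ; read the received byte into A
RET ; the byte is returned in A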
Data Movement Costs
Not all data movement is equally fast. In rough order from fastest to slowest:
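- Register-to-register moves (4 T-states for LD r, r' on the Z80; 2 cycles for TAX on the 6502)
- Register-immediate loads (7 T-states for LD r, n on the Z80; 2 cycles for LDA # on the 6502; the instruction is one byte longer)
- Register-to/from-memory (7 to 19 T-states for 8-bit loads and stores on the Z80, 3 to 6 cycles on the 6502, depending on addressing mode)
- I/O port accesses (11 T-states for IN A, (n) on the Z80, and slow peripherals often add wait states on top)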
Keeping frequently used values in registers rather than repeatedly loading and storing them is one of the most important performance optimizations at the assembly level. This is called register allocation, and compilers spend considerable effort doing it automatically.
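A sketch of the idea on the Z80: a loop that adds the byte stored at OFFSET to every byte of a buffer. OFFSET, BUFFER, and COUNT are hypothetical names, and the T-state counts in the comments are the standard Z80 timings without wait states:
LD A, (OFFSET) ; 13 T-states, paid once outside the loop
LD C, A ; keep the value in C for the whole loop
LD HL, BUFFER ; HL = address of the first byte
LD B, COUNT ; number of bytes (1 to 255, or 0 for 256)
ADD_LOOP:
LD A, (HL) ; 7 T-states: load the next byte
ADD A, C ; 4 T-states: add the value held in a register
LD (HL), A ; 7 T-states: store the result
INC HL ; 6 T-states: advance the pointer
DJNZ ADD_LOOP ; 13 T-states when the jump is taken
Reloading from OFFSET on every pass, for example with LD A, (OFFSET) followed by ADD A, (HL), costs 13 + 7 = 20 T-states for the same work that LD A, (HL) and ADD A, C do in 11.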
Practical Notes for Rebuilders
When tracing a bug through a memory dump, focus on data movement: what value was in the register before the failing instruction? What address was it accessing? Was the data at that address correct? Trace backward until you find where a correct value became incorrect.
Comment every non-obvious data movement operation in assembly code. LD A, (HL) alone says nothing; the comment "; load next character from input buffer" tells the reader why.
When porting code to a new CPU, data movement instructions are the first thing to translate — they differ more across architectures than arithmetic operations, which follow more universal conventions.