Labels and Symbols

Labels and symbols give meaningful names to memory addresses and constants in assembly language, making programs readable and maintainable.

Why This Matters

Without labels and symbols, an assembly language program is a sea of hexadecimal numbers. Every jump instruction contains the raw numeric address of its target. Every constant is an unexplained literal value. When you change the program and insert new instructions, every address after the insertion point shifts, and every affected jump target must be recalculated by hand. This is not programming; it is arithmetic.

Labels solve the address recalculation problem. Symbols solve the magic number problem. Together they transform assembly from a mechanical encoding exercise into genuine programming. A program that says JP IDLE_LOOP communicates intent; a program that says JP 0x1047 communicates nothing.

For rebuilders who will spend months to years programming at the assembly level before high-level language compilers are available, labels and symbols are the primary tools that make that work maintainable by human beings.

What Labels Are

A label is a name for a position in the program. In assembly source code, a label is written at the start of a line, typically followed by a colon:

MAIN_LOOP:
    LD A, (SENSOR_PORT)
    CP 0
    JP Z, MAIN_LOOP

MAIN_LOOP: defines a label at the address of the LD A instruction that follows it. The instruction JP Z, MAIN_LOOP uses that label as its jump target: the assembler calculates the address of MAIN_LOOP and encodes it in the jump instruction.

Labels can mark:

  • Loop entry points (for backward jumps)
  • Branch targets (for forward jumps)
  • Subroutine entry points (for CALL instructions)
  • Data locations (for load and store instructions)
  • Table entries (for indexed access)

The key property of a label is that it automatically tracks its correct address even when code is added, removed, or rearranged above it. The assembler recalculates all label addresses fresh on every assembly run.

What Symbols Are

Symbols (also called equates or constants) give names to fixed numeric values: sizes, counts, bit masks, error codes, and hardware addresses. Where a label's value is a position in the program that the assembler computes, a symbol's value is one the programmer assigns explicitly.

Define a symbol with an EQU (equate) directive:

BUFFER_SIZE     EQU 256      ; number of bytes in the input buffer
MAX_SENSORS     EQU 8        ; maximum number of connected sensors
UART_STATUS_REG EQU 0x8000   ; address of UART status register
TRANSMIT_READY  EQU 0x01     ; bitmask for TX ready bit
ERROR_TIMEOUT   EQU 255      ; error code for timeout condition

In later code, these names appear instead of the numbers:

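    LD B, MAX_SENSORS
    LD HL, UART_STATUS_REG
    LD A, (HL)
    AND TRANSMIT_READY
    JP NZ, TRANSMIT

Reading this code, you understand what it does without consulting comments: load the count of sensors, read the UART status, check if transmit is ready. Because UART_STATUS_REG is a 16-bit memory address, the status byte is read with a memory load through HL; the Z80 IN A, (n) instruction accepts only an 8-bit port number. Without the symbols, the same code reads LD B, 8, then LD HL, 0x8000, then AND 0x01: the same operations, but the meaning is hidden.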

Naming Conventions

Good label and symbol names are essential for readability. Conventions that work well in assembly:

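All-uppercase for symbols and labels: The most common convention, and it makes names easy to spot in comments and listings. Because many assembler conventions also write instructions in uppercase, some programmers use uppercase for symbols and mixed case for labels so that the two kinds of name can be told apart.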

Underscore-separated words: MAIN_LOOP, READ_SENSOR, BUFFER_SIZE, with each word separated by an underscore. Readable and compact.

Hierarchical names for local structure: If subroutine SEND_BYTE has internal loop labels, name them SEND_BYTE_WAIT_TX, SEND_BYTE_LOOP, and so on, each prefixed with the owning subroutine's name. This prevents name collisions in large programs.

Consistent prefixes by type: Address constants with a hardware-area prefix (UART_, TIMER_, GPIO_), error codes with ERR_, buffer sizes with SIZE_.

Meaningful, not clever: LOOP1 says nothing; READ_SENSOR_LOOP says everything needed. Future maintainers, including you in six months, will thank you.

Local Labels

In a program with hundreds of subroutines, common label names like LOOP or DONE appear repeatedly. Two subroutines that both define LOOP cause a duplicate label error.

Solutions:

Prefix with subroutine name: UART_READ_LOOP, GPIO_SCAN_LOOP. Verbose but always unambiguous.

Local label syntax: Many assemblers provide a local label syntax, typically a label starting with a period, underscore, or number, that is only visible within a defined scope. NASM (an x86 assembler, not a Z80 one) treats a label beginning with a period as local to the most recent non-local label, and uses %%label for labels local to a single macro expansion. Some assemblers use numeric labels: in the GNU assembler, for example, a label such as 1: can be defined repeatedly, with 1b meaning "the most recent label 1 before this point" and 1f meaning "the next label 1 after this point."

Using local labels for loop internals and subroutine-private branches keeps the global symbol table clean and prevents naming conflicts.
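
The numeric-label rule can be made concrete with a short sketch. The Python below is an illustration only, not any particular assembler's implementation; the function name resolve_numeric_label and the choice to identify definitions by source line index are assumptions made for this example. It finds which definition a reference such as 1b or 1f points to.

```python
def resolve_numeric_label(lines, ref_index, ref):
    """Return the index of the source line that defines the label `ref` names.

    lines     -- source lines, one statement or label per entry
    ref_index -- index of the line containing the reference
    ref       -- a reference such as "1b" (backward) or "1f" (forward)
    """
    number, direction = ref[:-1], ref[-1].lower()
    definition = number + ":"
    if direction == "b":
        # Search upward from the reference for the most recent definition.
        candidates = range(ref_index - 1, -1, -1)
    elif direction == "f":
        # Search downward from the reference for the next definition.
        candidates = range(ref_index + 1, len(lines))
    else:
        raise ValueError(f"not a numeric label reference: {ref}")
    for i in candidates:
        if lines[i].strip() == definition:
            return i
    raise ValueError(f"no matching definition for {ref}")


source = [
    "WAIT_TX:",        # 0
    "1:",              # 1  first definition of label 1
    "    LD A, (HL)",  # 2
    "    JP Z, 1b",    # 3  refers back to line 1
    "    JP NZ, 1f",   # 4  refers forward to line 6
    "    NOP",         # 5
    "1:",              # 6  second definition of label 1
    "    RET",         # 7
]

backward = resolve_numeric_label(source, 3, "1b")
forward = resolve_numeric_label(source, 4, "1f")
```

The same name, 1, resolves to two different places depending on where the reference sits, which is why numeric labels never collide.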

The Symbol Table

The assembler maintains a symbol table: an internal data structure mapping symbol names to their values. When the assembler encounters a label definition, it adds an entry. When it encounters a label reference in an instruction, it looks up the entry and substitutes the address.

For a two-pass assembler: the first pass builds the symbol table (processing all label definitions and computing their addresses). The second pass generates code, consulting the symbol table to resolve all references.
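
The two passes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real assembler: the SIZES table of bytes per mnemonic is a hypothetical stand-in for a real instruction encoder, and the second pass emits resolved text rather than machine code so the substitution is easy to see.

```python
# Bytes occupied by each mnemonic in this toy example (an assumption for the
# sketch; a real assembler derives the size from the full instruction).
SIZES = {"LD": 2, "JP": 3, "NOP": 1, "RET": 1}


def statements(lines):
    """Yield each non-empty source line with comments and padding removed."""
    for line in lines:
        text = line.split(";")[0].strip()
        if text:
            yield text


def first_pass(lines, origin=0):
    """Build the symbol table: label -> address, EQU name -> value."""
    symbols = {}
    address = origin
    for text in statements(lines):
        words = text.split()
        if len(words) == 3 and words[1] == "EQU":
            symbols[words[0]] = int(words[2], 0)  # a value, not an address
        elif text.endswith(":"):
            name = text[:-1]
            if name in symbols:
                raise ValueError(f"duplicate label: {name}")
            symbols[name] = address               # label takes the current address
        else:
            address += SIZES[words[0]]            # advance past the instruction
    return symbols


def second_pass(lines, symbols):
    """Return the instructions with every symbol replaced by its value."""
    resolved = []
    for text in statements(lines):
        words = text.split()
        if text.endswith(":") or (len(words) == 3 and words[1] == "EQU"):
            continue  # definitions emit no code
        mnemonic, _, operand_text = text.partition(" ")
        operands = [op.strip() for op in operand_text.split(",") if op.strip()]
        operands = [f"0x{symbols[op]:04X}" if op in symbols else op for op in operands]
        resolved.append(" ".join([mnemonic, ", ".join(operands)]).strip())
    return resolved


source = [
    "MAX_SENSORS EQU 8",
    "START:",
    "    LD B, MAX_SENSORS",
    "    JP DONE          ; DONE is not defined yet at this point",
    "    NOP",
    "DONE:",
    "    RET",
]
symbols = first_pass(source, origin=0x1000)
program = second_pass(source, symbols)
# program == ["LD B, 0x0008", "JP 0x1006", "NOP", "RET"]
```

Because the whole table exists before any reference is resolved, JP DONE is filled in correctly even though DONE appears later in the source.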

Forward references, where a label is used before it is defined, are handled naturally by the two-pass approach. The first pass records where the label is defined; the second pass fills in all references to it. A single-pass assembler must use a different technique, backpatching: emit a placeholder, record the reference, then fix it up when the label is defined.
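
Backpatching can be sketched as follows. This Python is a minimal illustration that supports only JP and NOP; it assumes the Z80 encodings JP nn = 0xC3 followed by a little-endian 16-bit address and NOP = 0x00, and the function name assemble_single_pass is hypothetical.

```python
def assemble_single_pass(lines, origin=0):
    """Assemble a toy program of JP and NOP instructions in one pass."""
    code = bytearray()
    symbols = {}   # label name -> address
    pending = {}   # undefined label name -> offsets of placeholders in code
    for line in lines:
        text = line.strip()
        if text.endswith(":"):
            name = text[:-1]
            address = origin + len(code)
            symbols[name] = address
            # Fix up every placeholder that was waiting for this label.
            for offset in pending.pop(name, []):
                code[offset:offset + 2] = address.to_bytes(2, "little")
        elif text.startswith("JP "):
            target = text[3:].strip()
            code.append(0xC3)
            if target in symbols:
                code += symbols[target].to_bytes(2, "little")
            else:
                # Record where the address belongs, then emit a placeholder.
                pending.setdefault(target, []).append(len(code))
                code += b"\x00\x00"
        elif text == "NOP":
            code.append(0x00)
        elif text:
            raise ValueError(f"unsupported statement: {text}")
    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return bytes(code), symbols


program = ["JP END", "NOP", "END:", "JP END"]
machine_code, table = assemble_single_pass(program, origin=0x1000)
```

The first JP END is emitted with a zero placeholder and patched when END: is reached; the second finds END already in the table and needs no fixup. Any name still pending at the end of the source is an undefined label.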

Macros as Named Code Sequences

Some assemblers extend the symbol concept to macros: named sequences of instructions that the assembler expands wherever the macro name appears. This is a form of assembly-time code generation.

; Define a macro to save registers
SAVE_REGS MACRO
    PUSH AF
    PUSH BC
    PUSH DE
    PUSH HL
ENDM

; Use the macro
MYROUTINE:
    SAVE_REGS
    ; ... body of routine
    POP HL
    POP DE
    POP BC
    POP AF
    RET

Every occurrence of SAVE_REGS expands to the four PUSH instructions. Macros reduce repetition and keep related code consistent. They differ from subroutines in that each expansion generates its own copy of the code. There is no call overhead, but every use costs program memory, so macros suit very short, frequently used sequences where the call overhead would dominate.

Macro parameters allow more flexible expansion:

LOAD_REG MACRO reg, value
    LD reg, value
ENDM

LOAD_REG A, 42   ; expands to: LD A, 42
LOAD_REG B, 0    ; expands to: LD B, 0
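
What the assembler does with these definitions is plain text substitution. The Python sketch below illustrates the idea under simplifying assumptions: it handles only the MACRO and ENDM forms shown above, does not support nested macros or macro-local labels, and the function name expand_macros is hypothetical.

```python
import re


def expand_macros(lines):
    """Expand MACRO ... ENDM definitions wherever the macro name is used."""
    macros = {}       # macro name -> (parameter names, body lines)
    recording = None  # name of the macro whose body is being read, if any
    output = []
    for line in lines:
        code = line.split(";")[0].rstrip()  # drop comments
        words = code.split()
        if recording is not None:
            if words == ["ENDM"]:
                recording = None
            else:
                macros[recording][1].append(code)
        elif len(words) >= 2 and words[1] == "MACRO":
            params = [p.strip() for p in " ".join(words[2:]).split(",") if p.strip()]
            macros[words[0]] = (params, [])
            recording = words[0]
        elif words and words[0] in macros:
            params, body = macros[words[0]]
            arg_text = code.split(None, 1)[1] if len(words) > 1 else ""
            args = [a.strip() for a in arg_text.split(",") if a.strip()]
            if len(args) != len(params):
                raise ValueError(f"{words[0]} expects {len(params)} argument(s)")
            mapping = dict(zip(params, args))
            for body_line in body:
                # Replace whole words only, so a parameter named "reg" does
                # not alter a longer name that merely contains "reg".
                output.append(
                    re.sub(r"\w+", lambda m: mapping.get(m.group(0), m.group(0)), body_line)
                )
        elif code.strip():
            output.append(code)
    return output


source = [
    "SAVE_REGS MACRO",
    "    PUSH AF",
    "    PUSH BC",
    "ENDM",
    "LOAD_REG MACRO reg, value",
    "    LD reg, value",
    "ENDM",
    "    SAVE_REGS",
    "    LOAD_REG A, 42   ; expands to: LD A, 42",
    "    LOAD_REG B, 0",
    "    RET",
]
expanded = expand_macros(source)
```

The expanded list contains only ordinary instructions; the macro definitions themselves produce no output, and each use contributes its own copy of the body.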

Macros are a powerful feature that can either clarify or confuse code, depending on how they are used. Reserve macros for patterns that genuinely recur and that improve readability when named.

Practical Notes for Rebuilders

Define all hardware addresses and bit masks as symbols at the top of your source file (or in a dedicated header file if your assembler supports include directives). Never use magic numbers directly in code. A hardware address that appears as a raw number in twelve places must be changed in twelve places when the hardware changes; a symbol must be changed once.

Keep label names short enough to type quickly but long enough to be unambiguous. In practice, 10-20 characters is the sweet spot. UART_TX_WAIT is clear and not unwieldy.

Establish naming conventions early and enforce them consistently. A codebase where one programmer uses LOOP_START and another uses start_loop and a third uses 1: is harder to read than any single consistent style. Pick one convention, document it, and apply it everywhere.

Your program's symbol table is worth documenting as it grows. A printed listing of all symbols and their values is useful for debugging: it tells you at a glance the memory map of your program.
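
A listing of this kind is simple to produce from the table. The Python below is a small sketch; the function name symbol_listing, the two-column layout, and the example addresses for MAIN_LOOP and SEND_BYTE are assumptions made for illustration.

```python
def symbol_listing(symbols):
    """Return one text line per symbol, sorted by value and then by name."""
    rows = sorted(symbols.items(), key=lambda item: (item[1], item[0]))
    return [f"{value:04X}  {name}" for name, value in rows]


table = {
    "UART_STATUS_REG": 0x8000,  # hardware address defined with EQU
    "MAIN_LOOP": 0x1000,        # label address (hypothetical)
    "BUFFER_SIZE": 256,         # constant defined with EQU
    "SEND_BYTE": 0x1047,        # label address (hypothetical)
}
listing = symbol_listing(table)
# listing[0] == "0100  BUFFER_SIZE"
```

Sorting by value places labels in memory order, so the listing doubles as a rough memory map; constants such as BUFFER_SIZE appear among the addresses and are best read with that in mind.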