Strings

Strings are sequences of characters stored in memory — the fundamental data type for text processing, user interfaces, serial communication, and record keeping.

Why This Matters

Any program that interacts with humans uses strings. Command-line input is a string. Output messages are strings. Records in a database have string fields. Data transmitted over serial lines is typically encoded as text strings. Even programs that primarily process numbers need strings for labels, error messages, and display formatting.

For a rebuilding civilization’s software, strings are everywhere: patient name fields, crop variety names, error messages in monitoring systems, radio message text, instruction manuals stored in program memory. Understanding how strings are represented, stored, and manipulated is essential for writing any practical software.

The challenge of strings is that they vary in length — a city name might be 3 characters, a medical description 300. This variability requires design decisions about representation and memory management that do not arise for fixed-size types like integers.

String Representations

Null-terminated strings (C style): A sequence of bytes followed by a zero byte (the null terminator). The string ends wherever the null byte is. “HELLO” is stored as 5 bytes 48 45 4C 4C 4F followed by one zero byte 00 — 6 bytes total.

Advantages: simple, no length overhead per string, works naturally with character-by-character processing. Disadvantages: finding the length requires scanning to the null byte (O(N)); cannot contain null bytes in the string content; easy to make buffer overrun errors by forgetting the terminator.

Length-prefixed strings (Pascal style): A byte or word of length followed by that many character bytes. “HELLO” is stored as 05 48 45 4C 4C 4F — 6 bytes total (1 byte length + 5 bytes content). The length byte allows strings up to 255 characters (for a single-byte length field) or 65535 characters (for a 16-bit length field).

Advantages: O(1) length access; can contain any byte value including null; consistent structure simplifies memory management. Disadvantages: 1-2 bytes of overhead per string; maximum length is bounded by the length field size.
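
A minimal sketch of the length-prefixed layout in Z80 assembly, assuming a single-byte length field and an assembler that accepts the DB directive. The labels PMSG, PSTRLEN, and PSTRCPY are illustrative names, not part of any standard library.

PMSG:
  DB 5, "HELLO"    ; length byte, then 5 content bytes (no terminator)

PSTRLEN:
  ; HL = length-prefixed string, returns length in A
  LD A, (HL)       ; the length is the first byte, so no scan is needed
  RET

PSTRCPY:
  ; HL = source, DE = destination (must hold length + 1 bytes)
  LD C, (HL)       ; C = content length
  LD B, 0
  INC BC           ; BC = length byte plus content bytes (1 to 256)
  LDIR             ; copy BC bytes from (HL) to (DE)
  RET

Because the length is known up front, the copy needs no per-byte test for a terminator.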

Fixed-length strings: Strings always occupy a fixed number of bytes, padded with spaces or null bytes if shorter. Used in database record fields where consistent record sizes matter. A 16-byte name field is always 16 bytes regardless of whether the name is 3 or 16 characters.

Advantages: simple memory layout, O(1) access to any string in an array. Disadvantages: wastes memory for short strings, truncates long strings.
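
As a sketch of why fixed-length fields give O(1) access: the address of entry number A in an array of 16-byte fields is base + A × 16, which needs only shifts and one addition. FIELD_ADDR is a hypothetical name, and the 16-byte width is taken from the name-field example above.

FIELD_ADDR:
  ; A = entry index (0-255), DE = base address of an array of 16-byte fields
  ; returns HL = address of entry number A
  LD L, A
  LD H, 0          ; HL = index
  ADD HL, HL       ; index * 2
  ADD HL, HL       ; index * 4
  ADD HL, HL       ; index * 8
  ADD HL, HL       ; index * 16
  ADD HL, DE       ; base + index * 16
  RET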

For most rebuilding scenarios, null-terminated strings (for C-style code) or length-prefixed strings (for assembly code) are the right choice. Fixed-length strings are appropriate for database record fields. The routines below are written in Z80 assembly and use null-terminated strings.

Core String Operations

Finding length (null-terminated):

STRLEN:
  ; HL = start of string, returns length in BC
  PUSH HL
  LD BC, 0         ; counter
STRLEN_LOOP:
  LD A, (HL)       ; load byte
  CP 0             ; is it null?
  JP Z, STRLEN_DONE
  INC HL
  INC BC
  JP STRLEN_LOOP
STRLEN_DONE:
  POP HL
  RET

Copying strings (null-terminated):

STRCPY:
  ; HL = source, DE = destination
STRCPY_LOOP:
  LD A, (HL)       ; load byte from source
  LD (DE), A       ; store to destination
  CP 0             ; was it null?
  RET Z            ; if null, done
  INC HL
  INC DE
  JP STRCPY_LOOP

Comparing strings:

STRCMP:
  ; HL = string 1, DE = string 2
  ; returns: zero flag set if equal, carry set if str1 < str2
STRCMP_LOOP:
  LD A, (DE)       ; char from string 2
  LD B, A
  LD A, (HL)       ; char from string 1
  CP B             ; computes char1 - char2: carry set if char1 < char2
  JP NZ, STRCMP_DONE  ; chars differ: zero flag clear, carry indicates order
  CP 0             ; are both zero? (both strings ended)
  JP Z, STRCMP_EQUAL  ; yes, strings are equal
  INC HL
  INC DE
  JP STRCMP_LOOP
STRCMP_EQUAL:
  XOR A            ; zero flag set, carry clear = equal
STRCMP_DONE:
  RET

Concatenation: Copy the first string to the destination, then copy the second string starting where the first ended (overwriting the null terminator of the first).
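
One way to sketch concatenation is to reuse STRCPY from above. This relies on the fact that STRCPY returns as soon as it has stored the terminator, leaving DE pointing at that terminator byte in the destination. The name STRCAT and the use of BC for the second string are assumptions made for illustration. The destination must have room for both strings plus one terminator; this sketch does not check.

STRCAT:
  ; HL = first string, BC = second string, DE = destination buffer
  CALL STRCPY      ; copy first string; DE now points at the terminator it wrote
  LD H, B
  LD L, C          ; HL = second string
  JP STRCPY        ; copy second string over that terminator, then return to caller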

Searching for a substring: Scan the main string, and at each position, compare the pattern. If all pattern characters match, found. If any character differs, advance one position in the main string and try again. This is O(N×M) where N is the main string length and M is the pattern length — acceptable for short patterns.
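
The scan described above might be sketched as follows. STRSTR and its calling convention are hypothetical; the routine uses the stack to remember the current candidate position and the start of the pattern.

STRSTR:
  ; HL = main string, DE = pattern (both null-terminated)
  ; returns: zero flag set and HL = start of match if found,
  ;          zero flag clear if the pattern does not occur
STRSTR_NEXT:
  PUSH HL          ; save candidate position
  PUSH DE          ; save pattern start
STRSTR_MATCH:
  LD A, (DE)       ; next pattern character
  OR A             ; end of pattern?
  JP Z, STRSTR_FOUND  ; yes: every pattern character matched
  CP (HL)          ; compare with main string
  JP NZ, STRSTR_MISS
  INC HL
  INC DE
  JP STRSTR_MATCH
STRSTR_MISS:
  POP DE           ; restore pattern start
  POP HL           ; restore candidate position
  LD A, (HL)
  OR A
  JP Z, STRSTR_FAIL   ; main string exhausted
  INC HL           ; advance one position and try again
  JP STRSTR_NEXT
STRSTR_FOUND:
  POP DE
  POP HL           ; HL = start of match; POP leaves the zero flag set
  RET
STRSTR_FAIL:
  OR 1             ; A was 0, so this clears the zero flag
  RET

An empty pattern matches at the starting position, which is the usual convention.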

Converting Numbers to Strings

Displaying a number as text requires converting the binary integer to a sequence of ASCII digit characters.

Integer to decimal string:

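  1. Divide the value by 10 and collect the remainder as a digit; repeat with the quotient until it reaches 0. Running the division at least once means a value of 0 still produces the digit "0".
  2. Digits emerge in reverse order (least significant first), so build in a temporary buffer and reverse, or use the stack

INT_TO_STR:
  ; A = value (0-255), HL = output buffer (at least 4 bytes)
  ; writes 1 to 3 decimal digits followed by a null terminator
  LD C, 0          ; C = number of digits pushed
NEXT_DIGIT:
  LD B, 0          ; B = quotient
DIV_LOOP:
  ; divide A by 10 by repeated subtraction
  CP 10
  JP C, DIV_DONE
  SUB 10
  INC B            ; count 10s
  JP DIV_LOOP
DIV_DONE:
  ADD A, '0'       ; remainder (0-9) converted to ASCII
  PUSH AF          ; least significant digit first, so stack the digits
  INC C
  LD A, B          ; continue with the quotient
  OR A
  JP NZ, NEXT_DIGIT
POP_LOOP:
  POP AF           ; most significant digit comes off the stack first
  LD (HL), A
  INC HL
  DEC C
  JP NZ, POP_LOOP
  LD (HL), 0       ; null terminate
  RET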

Full integer-to-string for arbitrary integers requires division, which on 8-bit processors without a divide instruction is itself a subroutine. The general method: repeatedly divide by 10, collect remainders as digits, then reverse the collected digit sequence (since remainders come out least-significant first).
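
As a sketch of that general method for 16-bit values, the following pair of routines uses a shift-and-subtract division by 10. The names DIV16_10 and INT16_TO_STR and their register conventions are assumptions for illustration, not routines from any particular library.

DIV16_10:
  ; HL = dividend; returns HL = quotient, A = remainder (0-9); clobbers B
  LD B, 16         ; one pass per bit of the dividend
  XOR A            ; remainder starts at 0
D16_BIT:
  ADD HL, HL       ; shift dividend left, top bit goes to carry
  RLA              ; shift that bit into the remainder
  CP 10
  JP C, D16_SKIP
  SUB 10           ; remainder >= 10: subtract the divisor
  INC L            ; and set the new quotient bit (bit 0 is 0 after the shift)
D16_SKIP:
  DJNZ D16_BIT
  RET

INT16_TO_STR:
  ; HL = value (0-65535), DE = output buffer (at least 6 bytes)
  LD C, 0          ; C = number of digits pushed
I16_NEXT:
  CALL DIV16_10    ; HL = quotient, A = remainder
  ADD A, '0'       ; remainder converted to ASCII
  PUSH AF          ; stack digits, least significant first
  INC C
  LD A, H
  OR L             ; quotient zero yet?
  JP NZ, I16_NEXT
I16_POP:
  POP AF           ; most significant digit first
  LD (DE), A
  INC DE
  DEC C
  JP NZ, I16_POP
  XOR A
  LD (DE), A       ; null terminate
  RET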

Decimal string to integer:

STR_TO_INT:
  ; HL = string of decimal digits, returns value in A
  ; (values above 255 wrap around; no overflow check)
  LD B, 0          ; B = running result
STR_LOOP:
  LD A, (HL)       ; load character
  SUB '0'          ; convert ASCII digit to value (0-9)
  CP 10            ; valid digit? (anything else, including null, gives >= 10)
  JP NC, STR_DONE  ; no: stop
  LD C, A          ; C = digit
  ; result = result * 10 + digit, using result*10 = result*8 + result*2
  LD A, B
  ADD A, A         ; A = result*2
  LD B, A          ; keep result*2
  ADD A, A         ; A = result*4
  ADD A, A         ; A = result*8
  ADD A, B         ; A = result*8 + result*2 = result*10
  ADD A, C         ; add the new digit
  LD B, A          ; store updated result
  INC HL
  JP STR_LOOP
STR_DONE:
  LD A, B          ; return result in A
  RET

String Buffers and Safety

The most dangerous aspect of string handling is buffer overruns: writing more characters into a buffer than it has space for. This overwrites adjacent memory, corrupting data or return addresses.

Always:

  • Know the maximum size of your output buffer before writing to it
  • Check that each write will not exceed the buffer before writing
  • Null-terminate strings after building them
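  • When reading user input, enforce a maximum length

; Safe string read: store at most B-1 characters, then terminate
READ_LINE:
  ; HL = buffer, B = buffer size in bytes including the null terminator (1-255)
  ; assumes READ_CHAR returns the next character in A and preserves HL and B
  DEC B            ; reserve space for null terminator
  JP Z, READ_DONE  ; size 1: room for the terminator only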
READ_LOOP:
  CALL READ_CHAR   ; get next character into A
  CP 0x0D          ; carriage return = end of line?
  JP Z, READ_DONE
  CP 0x0A          ; newline also ends
  JP Z, READ_DONE
  LD (HL), A       ; store character
  INC HL
  DJNZ READ_LOOP   ; loop until B = 0 (buffer full)
READ_DONE:
  LD (HL), 0       ; null terminate
  RET
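
The same bound applies when copying between buffers. A possible bounded copy, sketched under the assumption that the caller passes the destination size in B; the name STRCPY_N is hypothetical. Unlike STRCPY, it truncates the source rather than writing past the end of the destination.

STRCPY_N:
  ; HL = source, DE = destination, B = destination size in bytes (1-255)
  ; copies at most B-1 characters and always writes a terminator
  DEC B            ; reserve room for the terminator
  JP Z, COPY_END   ; size 1: room for the terminator only
COPY_LOOP:
  LD A, (HL)
  OR A
  JP Z, COPY_END   ; end of source
  LD (DE), A
  INC HL
  INC DE
  DJNZ COPY_LOOP   ; stop when the buffer is full
COPY_END:
  XOR A
  LD (DE), A       ; terminate (the source is truncated if it was too long)
  RET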

Practical Notes for Rebuilders

Choose one string representation and use it consistently throughout a codebase. Mixing null-terminated and length-prefixed strings in the same system is a constant source of bugs.

For record-keeping systems, fixed-length string fields simplify memory layout — each record is the same size — but require careful handling at boundaries (don’t forget to pad or truncate to exactly the field width). Document the field width for every string field in every record structure.
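
A sketch of the pad-or-truncate step, assuming space padding and a field width passed in B. FIELD_STORE is an illustrative name. No terminator is stored, because the field width itself marks the end.

FIELD_STORE:
  ; HL = null-terminated source, DE = field, B = field width (1-255)
  ; writes exactly B bytes: the source characters, then spaces
FS_COPY:
  LD A, (HL)
  OR A
  JP Z, FS_PAD     ; source ended: pad the rest of the field
  LD (DE), A
  INC HL
  INC DE
  DJNZ FS_COPY     ; field full: any remaining source is truncated
  RET
FS_PAD:
  LD A, ' '
FS_PAD_LOOP:
  LD (DE), A
  INC DE
  DJNZ FS_PAD_LOOP
  RET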

Invest in a small library of well-tested string routines early: copy, compare, length, number-to-string, string-to-number, find-in-string. These routines are used everywhere and must be completely reliable. A bug in STRCMP used in a lookup table corrupts all lookups.