Strings
Part of Programming Fundamentals
Strings are sequences of characters stored in memory — the fundamental data type for text processing, user interfaces, serial communication, and record keeping.
Why This Matters
Any program that interacts with humans uses strings. Command-line input is a string. Output messages are strings. Records in a database have string fields. Data transmitted over serial lines is typically encoded as text strings. Even programs that primarily process numbers need strings for labels, error messages, and display formatting.
For a rebuilding civilization’s software, strings are everywhere: patient name fields, crop variety names, error messages in monitoring systems, radio message text, instruction manuals stored in program memory. Understanding how strings are represented, stored, and manipulated is essential for writing any practical software.
The challenge of strings is that they vary in length — a city name might be 3 characters, a medical description 300. This variability requires design decisions about representation and memory management that do not arise for fixed-size types like integers.
String Representations
Null-terminated strings (C style): A sequence of bytes followed by a zero byte (the null terminator). The string ends wherever the null byte is. “HELLO” is stored as 5 bytes 48 45 4C 4C 4F followed by one zero byte 00 — 6 bytes total.
Advantages: simple, only one byte of overhead per string (the terminator), works naturally with character-by-character processing. Disadvantages: finding the length requires scanning to the null byte (O(N)); the string content cannot contain null bytes; buffer overruns are easy to cause by omitting the terminator or by copying without checking the destination size.
Length-prefixed strings (Pascal style): A byte or word of length followed by that many character bytes. “HELLO” is stored as 05 48 45 4C 4C 4F — 6 bytes total (1 byte length + 5 bytes content). The length byte allows strings up to 255 characters (for a single-byte length field) or 65535 characters (for a 16-bit length field).
Advantages: O(1) length access; can contain any byte value including null; consistent structure simplifies memory management. Disadvantages: 1-2 bytes of overhead per string; maximum length is bounded by the length field size.
Fixed-length strings: Strings always occupy a fixed number of bytes, padded with spaces or null bytes if shorter. Used in database record fields where consistent record sizes matter. A 16-byte name field is always 16 bytes regardless of whether the name is 3 or 16 characters.
Advantages: simple memory layout, O(1) access to any string in an array. Disadvantages: wastes memory for short strings, truncates long strings.
For most rebuilding scenarios, null-terminated strings (for C-style code) or length-prefixed strings (for assembly code) are the right choice. Fixed-length strings are appropriate for database record fields.
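To make the difference between the first two layouts concrete, here is a minimal C sketch that converts a null-terminated string into a length-prefixed one. The names `struct pstring` and `pstring_from_cstr` are illustrative assumptions, not part of any standard library, and the single-byte length field is one possible choice.

```c
#include <stddef.h>
#include <string.h>

/* Length-prefixed ("Pascal style") string: one length byte, then content.
   The 255-byte capacity follows from the single-byte length field.
   (struct pstring is an illustrative name, not a standard type.) */
struct pstring {
    unsigned char len;
    char data[255];
};

/* Build a length-prefixed string from a null-terminated one.
   Returns 0 on success, -1 if the text is longer than 255 bytes. */
int pstring_from_cstr(struct pstring *dst, const char *src)
{
    size_t n = strlen(src);        /* O(N) scan for the terminator */
    if (n > 255) return -1;        /* does not fit the length field */
    dst->len = (unsigned char)n;   /* from here on, length is an O(1) read */
    memcpy(dst->data, src, n);     /* content only; no terminator is stored */
    return 0;
}
```

After converting “HELLO”, `len` holds 5 and `data` begins with the bytes 48 45 4C 4C 4F, matching the layout described above.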
Core String Operations
Finding length (null-terminated):
STRLEN:
; HL = start of string, returns length in BC
PUSH HL
LD BC, 0 ; counter
STRLEN_LOOP:
LD A, (HL) ; load byte
CP 0 ; is it null?
JP Z, STRLEN_DONE
INC HL
INC BC
JP STRLEN_LOOP
STRLEN_DONE:
POP HL
RET
Copying strings (null-terminated):
STRCPY:
; HL = source, DE = destination
STRCPY_LOOP:
LD A, (HL) ; load byte from source
LD (DE), A ; store to destination
CP 0 ; was it null?
RET Z ; if null, done
INC HL
INC DE
JP STRCPY_LOOP
Comparing strings:
STRCMP:
; HL = string 1, DE = string 2
; returns: zero flag set if equal, carry set if str1 < str2
STRCMP_LOOP:
LD A, (DE) ; char from string 2
LD B, A
LD A, (HL) ; char from string 1
CP B ; computes str1 char - str2 char
JP NZ, STRCMP_DONE ; chars differ: zero flag clear, carry set if str1 < str2
CP 0 ; chars are equal; is it the terminator? (both strings ended)
JP Z, STRCMP_EQUAL ; yes, strings are equal
INC HL
INC DE
JP STRCMP_LOOP
STRCMP_EQUAL:
XOR A ; zero flag set, carry clear = equal
STRCMP_DONE:
RET
Concatenation: Copy the first string to the destination, then copy the second string starting where the first ended (overwriting the null terminator of the first).
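The concatenation step can be sketched in C with an explicit capacity check, so the destination is never overrun. The function name `concat` and its return convention are assumptions made for illustration; the destination is assumed not to overlap either source string.

```c
#include <stddef.h>
#include <string.h>

/* Concatenate a and b into dst, which has room for cap bytes
   (terminator included). Returns 0 on success, or -1 if the result
   would not fit, in which case dst is left unmodified.
   dst must not overlap a or b. */
int concat(char *dst, size_t cap, const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    /* need la + lb + 1 <= cap, written to avoid arithmetic overflow */
    if (la >= cap || lb >= cap - la) return -1;
    memcpy(dst, a, la);           /* first string, without its terminator */
    memcpy(dst + la, b, lb + 1);  /* second string starts where the first
                                     ended and brings its own terminator */
    return 0;
}
```

Joining “HELLO” and “WORLD” needs an 11-byte buffer: ten characters plus the terminator. With a 10-byte buffer the sketch refuses and returns -1 rather than writing past the end.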
Searching for a substring: Scan the main string, and at each position compare the pattern against the characters starting there. If every pattern character matches, the substring has been found at that position. If any character differs, advance one position in the main string and try again. This is O(N×M), where N is the main string length and M is the pattern length, which is acceptable for short patterns.
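A minimal C sketch of this position-by-position search follows. The name `find_substring` and the choice to return an index (or -1) are illustrative assumptions; both arguments are assumed to be null-terminated.

```c
#include <stddef.h>

/* Return the index of the first occurrence of pat in text, or -1 if
   it does not occur. An empty pattern matches at index 0. */
long find_substring(const char *text, const char *pat)
{
    for (size_t i = 0; ; i++) {
        size_t j = 0;
        /* compare the pattern against the text starting at position i */
        while (pat[j] != '\0' && text[i + j] == pat[j]) j++;
        if (pat[j] == '\0') return (long)i;  /* every pattern byte matched */
        if (text[i + j] == '\0') return -1;  /* text ran out: no later
                                                position can hold the pattern */
        /* mismatch: advance one position in the text and try again */
    }
}
```

For example, searching “AAAB” for “AAB” fails at position 0 on the third character, then succeeds at position 1.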
Converting Numbers to Strings
Displaying a number as text requires converting the binary integer to a sequence of ASCII digit characters.
Integer to decimal string:
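- Repeat: divide the value by 10 and collect the remainder as a digit; stop when the quotient reaches 0 (running the step at least once ensures that 0 produces the digit “0”)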
- Digits emerge in reverse order (least significant first), so build in a temporary buffer and reverse, or use the stack
INT_TO_STR:
; A = value (0-99 in this simplified version), HL = output buffer
LD B, 0 ; tens count (number of times 10 has been subtracted)
DIV_LOOP:
; repeatedly subtract 10 until value < 10
CP 10
JP C, LAST_DIGIT
SUB 10
INC B ; count 10s
JP DIV_LOOP
LAST_DIGIT:
; A = ones digit (the last remainder), B = tens digit
; This works only for 0-99; full division is needed for larger numbers
ADD A, '0' ; convert ones digit to ASCII
LD (HL), A
...
Full integer-to-string for arbitrary integers requires division, which on 8-bit processors without a divide instruction is itself a subroutine. The general method: repeatedly divide by 10, collect remainders as digits, then reverse the collected digit sequence (since remainders come out least-significant first).
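The general divide-and-reverse method might look like this in C, where the `/` and `%` operators stand in for the division subroutine. The name `uint_to_str` and its return convention are assumptions for illustration.

```c
#include <stddef.h>

/* Write the decimal form of value into buf (cap bytes, terminator
   included). Returns the number of digits written, or 0 if buf is
   too small to hold the digits plus the terminator. */
size_t uint_to_str(unsigned long value, char *buf, size_t cap)
{
    char tmp[20];   /* 20 digits is enough for a 64-bit unsigned long */
    size_t n = 0;
    do {            /* do-while, so that 0 still yields one digit */
        tmp[n++] = (char)('0' + value % 10);  /* least significant first */
        value /= 10;
    } while (value > 0);
    if (n + 1 > cap) return 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];              /* reverse into the output */
    buf[n] = '\0';
    return n;
}
```

The temporary buffer plays the role of the stack mentioned above: digits go in least-significant first and come out in display order.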
Decimal string to integer:
STR_TO_INT:
; HL = string, returns value in A (0-255; overflow is not detected)
LD C, 0 ; running result
STR_LOOP:
LD A, (HL) ; load character
SUB '0' ; convert ASCII digit to value (0-9)
CP 10 ; valid digit?
JP NC, STR_DONE ; no: stop (this also stops at the null terminator)
LD B, A ; save the digit
; result = result * 10 + digit
; (result * 10 = result*8 + result*2)
LD A, C
ADD A, A ; A = result*2
LD C, A ; C = result*2
ADD A, A ; A = result*4
ADD A, A ; A = result*8
ADD A, C ; A = result*8 + result*2 = result*10
ADD A, B ; add the new digit
LD C, A ; store the updated result
INC HL
JP STR_LOOP
STR_DONE:
LD A, C ; return the result in A
RET
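The same multiply-by-10-and-add loop can be sketched in C with an overflow check added, which the 8-bit routine above omits. The name `str_to_uint` and its return codes are illustrative assumptions.

```c
#include <limits.h>

/* Parse the leading decimal digits of s into *out.
   Returns the number of digits consumed; 0 means no digits were found
   (*out is set to 0), and -1 means the value would not fit in an
   unsigned int (*out is left unchanged). */
int str_to_uint(const char *s, unsigned int *out)
{
    unsigned int result = 0;
    int n = 0;
    while (s[n] >= '0' && s[n] <= '9') {
        unsigned int digit = (unsigned int)(s[n] - '0');
        /* result * 10 + digit must not exceed UINT_MAX */
        if (result > (UINT_MAX - digit) / 10) return -1;
        result = result * 10 + digit;
        n++;
    }
    *out = result;
    return n;
}
```

Parsing stops at the first non-digit, so “42 kg” yields 42 with two characters consumed, and the caller can inspect what follows.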
String Buffers and Safety
The most dangerous aspect of string handling is buffer overruns: writing more characters into a buffer than it has space for. This overwrites adjacent memory, corrupting data or return addresses.
Always:
- Know the maximum size of your output buffer before writing to it
- Check that each write will not exceed the buffer before writing
- Null-terminate strings after building them
- When reading user input, enforce a maximum length
; Safe string read: read up to MAX_LEN characters, then stop
READ_LINE:
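; HL = buffer, B = buffer size including null terminator (must be at least 1)
DEC B ; reserve space for null terminator
JP Z, READ_DONE ; size 1 leaves no room for characters: store only the terminator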
READ_LOOP:
CALL READ_CHAR ; get next character into A
CP 0x0D ; carriage return = end of line?
JP Z, READ_DONE
CP 0x0A ; newline also ends
JP Z, READ_DONE
LD (HL), A ; store character
INC HL
DJNZ READ_LOOP ; loop until B = 0 (buffer full)
READ_DONE:
LD (HL), 0 ; null terminate
RET
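A C sketch of the same bounded read is shown below. The character source is passed in as a function pointer; `demo_next_char` is a hypothetical stand-in for READ_CHAR that serves bytes from a string, so the behavior can be exercised without real input hardware. All names here are assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for READ_CHAR: serves bytes from a string and
   returns -1 once the string is exhausted. */
static const char *demo_input = "";
static int demo_next_char(void)
{
    if (*demo_input == '\0') return -1;
    return (unsigned char)*demo_input++;
}

/* Read characters from next_char() into buf until CR, LF, end of input
   (a negative return value), or until cap - 1 characters are stored.
   Always null-terminates when cap > 0. Returns the stored length. */
size_t read_line(char *buf, size_t cap, int (*next_char)(void))
{
    size_t n = 0;
    if (cap == 0) return 0;        /* no room even for the terminator */
    while (n + 1 < cap) {          /* keep one byte for the terminator */
        int c = next_char();
        if (c < 0 || c == '\r' || c == '\n') break;
        buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```

With a 4-byte buffer and the input “WHEAT”, only “WHE” is stored; the remaining characters stay unread in the source, so the caller decides whether to discard them or treat the line as too long.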
Practical Notes for Rebuilders
Choose one string representation and use it consistently throughout a codebase. Mixing null-terminated and length-prefixed strings in the same system is a constant source of bugs.
For record-keeping systems, fixed-length string fields simplify memory layout — each record is the same size — but require careful handling at boundaries (don’t forget to pad or truncate to exactly the field width). Document the field width for every string field in every record structure.
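The pad-or-truncate rule for fixed-length fields can be sketched in C as a pair of helpers. The names `field_set` and `field_get`, the space padding, and the 16-byte `NAME_WIDTH` (taken from the name-field example earlier) are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

#define NAME_WIDTH 16   /* illustrative width for a name field */

/* Store src into a fixed-width field: truncate if it is too long,
   pad with spaces if it is too short. The field has no terminator. */
void field_set(char *field, size_t width, const char *src)
{
    size_t n = strlen(src);
    if (n > width) n = width;           /* truncate to the field width */
    memcpy(field, src, n);
    memset(field + n, ' ', width - n);  /* pad the remainder */
}

/* Copy a field out as a null-terminated string with trailing padding
   removed. out must have room for width + 1 bytes.
   Returns the trimmed length. */
size_t field_get(const char *field, size_t width, char *out)
{
    size_t n = width;
    while (n > 0 && field[n - 1] == ' ') n--;  /* skip trailing spaces */
    memcpy(out, field, n);
    out[n] = '\0';
    return n;
}
```

One consequence of space padding is that a value with genuine trailing spaces cannot be told apart from padding when read back; if that matters, a separate length byte per field is an alternative.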
Invest in a small library of well-tested string routines early: copy, compare, length, number-to-string, string-to-number, find-in-string. These routines are used everywhere and must be completely reliable. A bug in STRCMP used in a lookup table corrupts all lookups.