Data Structures

Data structures are organized arrangements of data in memory that enable efficient storage, retrieval, and modification of information.

Why This Matters

The right data structure makes a program simple and fast. The wrong one makes it complicated and slow. A grain inventory that stores records in an unsorted array, scanned linearly from the beginning on every lookup, works fine for 50 records but becomes unusably slow at 5,000. The same program with records in a sorted array using binary search, or in a hash table, handles 50,000 records as easily as 50.

For a rebuilding civilization’s computing infrastructure, data structures determine the practical scale of record-keeping systems. A medical record system must look up patient records quickly by name or ID. A seed inventory must list all varieties of a given crop. A parts database must find everything with a specific property. These operations require more than arrays — they require thoughtfully organized data.

Understanding the fundamental data structures — and knowing when to use each — is the single biggest factor in whether programs remain usable as data volumes grow.

Arrays (Review)

The array is the most fundamental structure: a fixed-size, contiguous block of same-type elements. O(1) random access by index, O(N) search, O(N) insertion (must shift elements). Best when: you know the size in advance, you access by index frequently, and ordered linear traversal is common.

A sorted array enables binary search: O(log N) lookup. For a read-heavy dataset that changes rarely, keeping an array sorted is worthwhile.
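
Binary search over a sorted array can be sketched in C as follows (the function name `binary_search` and the half-open `[lo, hi)` convention are illustrative choices, not taken from the text above):

```c
#include <stddef.h>

/* Binary search over a sorted int array.
   Returns the index of target, or -1 if absent. O(log N) comparisons. */
int binary_search(const int *a, size_t n, int target) {
    size_t lo = 0, hi = n;               /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2; /* avoids overflow of (lo + hi) */
        if (a[mid] == target) return (int)mid;
        if (a[mid] < target)  lo = mid + 1;  /* target is in the upper half */
        else                  hi = mid;      /* target is in the lower half */
    }
    return -1;
}
```

Each iteration halves the remaining range, which is where the O(log N) bound comes from.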

Records and Structs

A record (struct in C terminology) groups related fields of different types into a single named unit. A patient record might contain a name (string), an ID number (integer), a date of birth (3 integers), and a blood type (byte). Without records, you must maintain parallel arrays: one for names, one for IDs, one for dates. With records, all fields for one patient travel together.

In assembly language, a record is a convention: you reserve consecutive memory locations for the fields and document which offset holds which field.

; Patient record layout (per record):
; Offset 0: ID (2 bytes, unsigned)
; Offset 2: Name (16 bytes, null-terminated string)
; Offset 18: Age (1 byte)
; Offset 19: Blood type (1 byte) — 0=A, 1=B, 2=AB, 3=O
; Total: 20 bytes per record
PATIENT_SIZE EQU 20

; Access field of record at address in HL:
; HL = base of record
; HL + 0 = ID
; HL + 2 = Name
; etc.

An array of records: allocate PATIENT_SIZE × MAX_PATIENTS bytes. To access record N, compute BASE + N × PATIENT_SIZE.
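
In a higher-level language the same layout is a struct declaration, and the compiler performs the BASE + N × PATIENT_SIZE arithmetic automatically. A C sketch of the patient record above (note that a real compiler may insert padding, so the struct is not guaranteed to be exactly 20 bytes on every platform):

```c
#include <stdint.h>
#include <string.h>

/* C equivalent of the assembly record layout above. */
struct patient {
    uint16_t id;        /* offset 0: ID */
    char     name[16];  /* offset 2: null-terminated name */
    uint8_t  age;       /* offset 18 */
    uint8_t  blood;     /* offset 19: 0=A, 1=B, 2=AB, 3=O */
};

#define MAX_PATIENTS 100
struct patient patients[MAX_PATIENTS];

/* Record N is simply patients[N]; the compiler computes
   BASE + N * sizeof(struct patient) behind the scenes. */
```

The parallel-arrays alternative (one array of names, one of IDs, one of ages) still works, but every insertion and deletion must touch all the arrays in lockstep, which is a common source of bugs.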

Stacks (as Data Structures)

The stack data structure (distinct from the hardware call stack) implements last-in-first-out (LIFO) order. The two operations are PUSH (add to top) and POP (remove from top). A stack can be implemented as an array with a top-of-stack index.

; Stack using array A with top pointer TOP
PUSH(value):
  if TOP == MAX - 1: error (full)
  TOP = TOP + 1
  A(TOP) = value

POP():
  if TOP < 0: error (empty)
  value = A(TOP)
  TOP = TOP - 1
  return value

Stacks are useful for: expression evaluation, undo/redo history, depth-first traversal, temporary storage during recursive algorithms implemented iteratively.
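
The array-backed stack above can be sketched in C; the names `push`/`pop` and the fixed capacity `STACK_MAX` are illustrative, and the overflow/underflow checks are reported through the return value rather than an error routine:

```c
#include <stdbool.h>

#define STACK_MAX 64

static int stack_a[STACK_MAX];
static int stack_top = -1;       /* index of the top element; -1 = empty */

/* Returns false if the stack is full. */
bool push(int value) {
    if (stack_top + 1 == STACK_MAX) return false;  /* overflow */
    stack_a[++stack_top] = value;
    return true;
}

/* Returns false if the stack is empty; otherwise fills *value. */
bool pop(int *value) {
    if (stack_top < 0) return false;               /* underflow */
    *value = stack_a[stack_top--];
    return true;
}
```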

Queues

A queue implements first-in-first-out (FIFO) order. Add at the back (enqueue), remove from the front (dequeue). Implement as a circular buffer: an array with HEAD (next to read) and TAIL (next to write) indices that wrap around.

ENQUEUE(value):
  if (TAIL + 1) mod SIZE == HEAD: error (full)
  A(TAIL) = value
  TAIL = (TAIL + 1) mod SIZE

DEQUEUE():
  if HEAD == TAIL: error (empty)
  value = A(HEAD)
  HEAD = (HEAD + 1) mod SIZE
  return value

Queues are essential for communication buffers (serial receive queues, task queues) and any situation where items must be processed in the order they arrived.
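
A C sketch of the circular buffer above. One slot is deliberately kept empty so that HEAD == TAIL unambiguously means "empty" (this is the same convention the pseudocode's full test relies on); the size `QSIZE` is an arbitrary illustrative choice:

```c
#include <stdbool.h>

#define QSIZE 8   /* holds at most QSIZE - 1 items; one slot stays empty */

static int q[QSIZE];
static int q_head = 0;   /* next index to read */
static int q_tail = 0;   /* next index to write */

/* Returns false if the queue is full. */
bool enqueue(int value) {
    if ((q_tail + 1) % QSIZE == q_head) return false;  /* full */
    q[q_tail] = value;
    q_tail = (q_tail + 1) % QSIZE;                     /* wrap around */
    return true;
}

/* Returns false if the queue is empty; otherwise fills *value. */
bool dequeue(int *value) {
    if (q_head == q_tail) return false;                /* empty */
    *value = q[q_head];
    q_head = (q_head + 1) % QSIZE;                     /* wrap around */
    return true;
}
```

On processors without a fast modulo, making QSIZE a power of two lets the wrap-around be a bitwise AND with QSIZE − 1.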

Hash Tables

A hash table provides O(1) average-case lookup, insertion, and deletion by key — dramatically faster than array search for large datasets. The key insight: compute a number (the hash) from the key, use it as an index into an array, and store the record there.

A simple hash function for a string key: sum the ASCII values of all characters, take the result modulo the table size.

HASH(key, table_size):
  sum = 0
  for each character C in key:
    sum = sum + ASCII(C)
  return sum mod table_size

Collision handling: Two different keys may produce the same hash value — a collision. The simplest resolution is chaining: each table slot holds a linked list of all records with that hash. Lookup searches the (usually short) chain at the hashed slot.

Alternative: open addressing, where a collision causes a search for the next empty slot. Simpler to implement but degrades more sharply when the table is more than 70-80% full.

A good hash table with load factor under 0.75 (entries / slots) gives near-constant-time operations regardless of table size. For record databases, patient lookups, inventory systems — anywhere you look up by key — a hash table is the right structure.
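
A chaining hash table can be sketched in C as follows. This sketch departs from the plain character sum above in one respect: it multiplies the running hash by 31 before each addition, because a plain sum maps anagrams ("stop", "pots") to the same slot. The names `put`/`get` and the prime table size are illustrative assumptions:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 101            /* a prime table size helps spread keys */

struct entry {
    char         *key;
    int           value;
    struct entry *next;           /* chain of entries that collided here */
};

static struct entry *table[TABLE_SIZE];

static unsigned hash(const char *key) {
    unsigned h = 0;
    for (; *key; key++)
        h = h * 31 + (unsigned char)*key;   /* *31 separates anagrams */
    return h % TABLE_SIZE;
}

/* Insert or update; returns false on allocation failure. */
bool put(const char *key, int value) {
    unsigned slot = hash(key);
    for (struct entry *e = table[slot]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { e->value = value; return true; }
    struct entry *e = malloc(sizeof *e);
    if (!e) return false;
    e->key = malloc(strlen(key) + 1);
    if (!e->key) { free(e); return false; }
    strcpy(e->key, key);
    e->value = value;
    e->next = table[slot];        /* prepend to this slot's chain */
    table[slot] = e;
    return true;
}

/* Returns true and fills *value if the key is present. */
bool get(const char *key, int *value) {
    for (struct entry *e = table[hash(key)]; e; e = e->next)
        if (strcmp(e->key, key) == 0) { *value = e->value; return true; }
    return false;
}
```

As long as the load factor stays below about 0.75, each chain holds roughly one entry on average and lookups stay near constant time.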

Linked Lists

A linked list stores elements as nodes, where each node contains a value and a pointer (address) to the next node. Unlike arrays, linked lists do not require contiguous memory and can grow or shrink dynamically.

Each node:

; Node layout:
; Offset 0: value (2 bytes)
; Offset 2: next pointer (2 bytes, address of next node, or 0 if last)
; Total: 4 bytes per node
NODE_SIZE EQU 4

Linked list operations:

  • Traversal: Follow the chain of next pointers from HEAD until next = 0
  • Insertion: Allocate a new node, set its next pointer to the current next, update the previous node’s pointer
  • Deletion: Update the previous node’s next pointer to skip the deleted node; free the deleted node’s memory

Linked lists are O(N) for lookup (must traverse from the head) but O(1) for insertion and deletion at a known position (just adjust pointers). Best used when: the list size changes frequently, insertions/deletions in the middle are common, and lookup by index is rare.

The doubly linked list adds a previous pointer to each node, enabling O(1) traversal in either direction and easier deletion without knowing the predecessor.
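
The insertion and deletion steps above can be sketched in C (the helper names `insert_after` and `delete_after` are illustrative; both take the predecessor node, which is exactly the information O(1) insertion and deletion require):

```c
#include <stdlib.h>

struct node {
    int          value;
    struct node *next;   /* NULL marks the end of the list */
};

/* Insert a new node holding value directly after prev.
   Returns the new node, or NULL on allocation failure. */
struct node *insert_after(struct node *prev, int value) {
    struct node *n = malloc(sizeof *n);
    if (!n) return NULL;
    n->value = value;
    n->next  = prev->next;       /* new node points at the old successor */
    prev->next = n;              /* predecessor now points at the new node */
    return n;
}

/* Unlink and free the node directly after prev, if any. */
void delete_after(struct node *prev) {
    struct node *victim = prev->next;
    if (!victim) return;
    prev->next = victim->next;   /* skip over the deleted node */
    free(victim);
}
```

Using a permanent dummy head node (as the test below does) removes the special case of inserting or deleting at the front of the list.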

Binary Search Trees

A binary search tree (BST) organizes records for O(log N) lookup, insertion, and deletion. Each node has a value, a left child pointer, and a right child pointer. Invariant: every value in the left subtree is less than the node’s value; every value in the right subtree is greater.

To search: compare target to current node. If equal, found. If less, go left. If greater, go right. Repeat until found or null pointer reached.
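
The search loop and a matching insertion can be sketched in C (function names are illustrative; duplicates are ignored on insert, one of several reasonable conventions):

```c
#include <stdlib.h>

struct tnode {
    int           value;
    struct tnode *left;    /* subtree of values less than value */
    struct tnode *right;   /* subtree of values greater than value */
};

/* Returns the node holding target, or NULL if absent. */
struct tnode *bst_search(struct tnode *root, int target) {
    while (root) {
        if (target == root->value) return root;
        root = (target < root->value) ? root->left : root->right;
    }
    return NULL;
}

/* Insert value, preserving the BST invariant. Returns the (possibly new) root. */
struct tnode *bst_insert(struct tnode *root, int value) {
    if (!root) {
        struct tnode *n = malloc(sizeof *n);
        if (!n) return NULL;            /* allocation failed */
        n->value = value;
        n->left = n->right = NULL;
        return n;
    }
    if (value < root->value)      root->left  = bst_insert(root->left, value);
    else if (value > root->value) root->right = bst_insert(root->right, value);
    return root;                        /* duplicates fall through unchanged */
}
```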

A balanced BST (where left and right subtrees have approximately equal depth) guarantees O(log N) operations. An unbalanced BST (all insertions in order, creating a linear chain) degrades to O(N). For a rebuilding civilization, a sorted array with binary search is simpler to implement and performs similarly for read-heavy data; a BST is preferable when insertions and deletions are frequent.

Choosing the Right Structure

Use case                              Best structure
------------------------------------  ----------------------------
Fixed-size sequence, access by index  Array
Sorted lookup, read-heavy             Sorted array + binary search
Lookup by string/arbitrary key        Hash table
Frequent insertion/deletion, ordered  Linked list or BST
LIFO (undo, call stack)               Stack
FIFO (communication, task queue)      Queue / circular buffer
Grouped fields of different types     Record/struct

Practical Notes for Rebuilders

Implement each of these structures once in your primary language. Write a small test program for each that exercises all operations and verifies correctness. Having tested, documented implementations ready to copy into new programs eliminates the need to reinvent them each time.

For early computing infrastructure, a well-implemented hash table for record lookup transforms the scale of data that is practical to work with. The difference between a lookup that takes half a second (linear search, 1,000 records) and one that is effectively instantaneous (hash table) determines whether a tool is actually used.

Memory allocation (getting and returning memory for dynamic structures) deserves its own treatment. For simple systems, a bump allocator — a pointer that advances through available memory, never freeing — is often sufficient.
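
A bump allocator can be sketched in a few lines of C. This is a sketch under the stated assumptions (a fixed static pool, 4-byte alignment, no individual frees); the names `bump_alloc`, `bump_reset`, and `POOL_SIZE` are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4096

static uint8_t pool[POOL_SIZE];
static size_t  pool_used = 0;

/* Hand out n bytes from the pool by advancing a pointer.
   Returns NULL when the pool is exhausted. */
void *bump_alloc(size_t n) {
    n = (n + 3) & ~(size_t)3;                 /* round up to 4-byte alignment */
    if (pool_used + n > POOL_SIZE) return NULL;
    void *p = &pool[pool_used];
    pool_used += n;
    return p;
}

/* Reclaim the entire pool at once; individual frees are not supported. */
void bump_reset(void) { pool_used = 0; }
```

The limitation is the point: because nothing is freed individually, there is no bookkeeping, no fragmentation, and almost no code, which suits short-lived programs that build a structure, use it, and exit (or reset wholesale between tasks).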