Storage Concepts
Part of Data Storage
The foundational vocabulary and principles that underpin every storage technology from punch cards to solid-state drives.
Why This Matters
Before you can build or repair any storage device, you need a mental model of what storage actually does. Storage is the art of preserving information across time: keeping a pattern intact even when the machine is off, the operator walks away, or decades pass. Every civilization that has rebuilt from collapse has done so partly by finding ways to write things down and read them back reliably.
In computing, "storage" encompasses everything from the tiny registers inside a processor that hold one number for one nanosecond, to archival magnetic tape that should last thirty years. Understanding the hierarchy of storage (why some memory is fast but temporary, while other storage is slow but permanent) lets you make smart engineering decisions when rebuilding computational infrastructure from scratch.
These concepts are not tied to any particular technology. Whether you are working with magnetic drums from the 1950s, punched cards from the 1890s, or semiconductor chips from the 2020s, the same vocabulary and tradeoffs apply.
Bits, Bytes, and Capacity Units
The atom of digital information is the bit: a value that is either 0 or 1, off or on, magnetized or not, hole or no hole. Everything a computer stores is ultimately a sequence of bits.
Bits are grouped for convenience. Eight bits form a byte, which can represent 256 distinct values (2^8 = 256). A byte is large enough to hold one ASCII character, one small integer, or one pixel's intensity in a grayscale image.
Capacity scales in powers of two (or approximately so in commercial practice):
- Kilobyte (KB): 1,024 bytes, enough for a short text document
- Megabyte (MB): 1,048,576 bytes, a small photograph or several minutes of audio at low quality
- Gigabyte (GB): approximately one billion bytes, a feature film at moderate quality
- Terabyte (TB): approximately one trillion bytes, tens of thousands of books
When rebuilding from scratch, you will likely work in kilobytes and megabytes for a long time. Early computers managed with kilobytes of storage and produced enormous practical value. Do not underestimate what you can accomplish with a few megabytes of well-organized information.
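The unit boundaries above are simple powers of two, so they can be checked with a few lines of arithmetic. A quick sketch in Python (note that commercial labeling often rounds these to decimal powers of ten):

```python
# Each binary unit step is a factor of 2**10 = 1,024.
KB = 2**10   # 1,024 bytes
MB = 2**20   # 1,048,576 bytes
GB = 2**30   # roughly one billion bytes (1,073,741,824)
TB = 2**40   # roughly one trillion bytes (1,099,511,627,776)

# A byte of eight bits holds 2**8 = 256 distinct values.
assert 2**8 == 256

print(KB, MB)  # 1024 1048576
```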
Volatile vs Persistent Storage
The most important distinction in storage is whether information survives when power is removed.
Volatile storage loses its contents the moment power is cut. Most semiconductor RAM is volatile: the transistors and capacitors that hold data require a continuous trickle of electricity to maintain their state. Volatile storage is typically fast and easy to write, which is why processors use it as working memory during computation.
Persistent storage (also called non-volatile storage) retains information indefinitely without power. Magnetic media, optical discs, paper, ROM chips, and flash memory are all persistent. The physical mechanism (magnetic alignment, pits in plastic, holes in paper, trapped electrons) remains stable without an energy supply.
This distinction drives the basic architecture of any computing system. You need volatile memory for fast working calculations, and persistent storage for keeping results, programs, and data across power cycles. A system with only volatile memory loses everything when it is shut down. A system with only persistent storage would be far too slow to be useful.
In a survival context, persistent storage becomes especially critical. You cannot afford to lose work because of a power interruption. When designing any computing infrastructure, always have a clear answer to the question: "If power fails right now, what do we lose?"
Access Patterns: Sequential vs Random
Different storage technologies have fundamentally different access patterns, and choosing the wrong technology for a task creates severe performance problems.
Sequential access means you can only read data in order, from beginning to end (or from the current position forward). Magnetic tape is the classic example: to reach a file at the end of a tape, you must fast-forward past everything before it. This is fine for archival backup (you write everything once and rarely need random retrieval) but disastrous for a database where you need the tenth record, then the thousandth, then the fifth.
Random access means you can jump directly to any location without reading intermediate data. Magnetic disks offer near-random access: the read head can seek to any sector within milliseconds. Semiconductor RAM offers true random access: any byte is equally fast to reach, regardless of location.
Block access is a middle ground. Most storage devices read and write in fixed-sized chunks called blocks or sectors, typically 512 bytes to 4 kilobytes. Even if you need only one byte, you read the whole block it lives in. Block access simplifies hardware design and amortizes the overhead of seeking.
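The block arithmetic comes down to one division and one remainder. A minimal sketch, assuming a hypothetical 512-byte block size:

```python
BLOCK_SIZE = 512  # bytes per block; 512 is a common sector size (assumed here)

def locate(byte_address):
    """Map a linear byte address to (block number, offset within block)."""
    return divmod(byte_address, BLOCK_SIZE)

# To read byte 1300, the device must fetch all of block 2,
# even though only one byte of it is needed.
block, offset = locate(1300)
print(block, offset)  # 2 276
```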
When rebuilding storage infrastructure, match technology to access pattern. Use sequential media (tape, punched cards) for write-once archival. Use block-random media (disk, drum) for active databases and programs. Use true-random memory (RAM) for working calculations.
Addressing: Locating Any Stored Byte
To retrieve stored information, you need a way to say exactly where it lives. This is addressing.
The simplest addressing scheme numbers every byte or block sequentially from zero. Address 0 is the first byte, address 1 is the second, and so on up to the last address. This is called a linear address space.
Physical addresses refer to actual hardware locations: track 3, sector 7 of a disk; row 42, column 18 of a RAM chip. Physical addressing is efficient but fragile: if hardware changes, all addresses become wrong.
Logical addresses are an abstraction layer. Programs use logical addresses, and a translation layer (the operating system, a memory management unit, or a file system) maps them to physical locations. This lets you move data between devices, add more storage, or replace faulty hardware without rewriting every program that uses stored data.
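A translation layer can be as small as a few lines of arithmetic. The sketch below maps a linear logical block number onto a made-up disk geometry; the side, track, and sector counts are illustrative assumptions, not taken from any real drive:

```python
# Hypothetical geometry for a small two-sided disk (illustrative numbers).
SECTORS_PER_TRACK = 16
TRACKS_PER_SIDE = 40
SIDES = 2

def logical_to_physical(lbn):
    """Translate a logical block number into a (side, track, sector) triple."""
    sector = lbn % SECTORS_PER_TRACK
    track_index = lbn // SECTORS_PER_TRACK
    side = track_index % SIDES
    track = track_index // SIDES
    if track >= TRACKS_PER_SIDE:
        raise ValueError("address beyond end of disk")
    return side, track, sector

print(logical_to_physical(0))   # (0, 0, 0)
print(logical_to_physical(35))  # (0, 1, 3)
```

Because programs deal only in logical block numbers, swapping in a disk with a different geometry means changing these constants, not rewriting the programs.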
In early computers with limited storage, programmers often worked directly with physical addresses and knew exactly which memory locations held which variables. As systems grew, the complexity became unmanageable, driving the development of file systems, virtual memory, and other addressing abstractions.
Data Integrity: The Constant Enemy
Storage fails. Magnetic domains flip. Cosmic rays flip bits in RAM. Punch cards tear. Discs accumulate scratches. Any storage technology that lasts long enough will eventually produce an error.
Data integrity is the practice of detecting and correcting these errors. The core technique is redundancy: storing more information than strictly necessary so that errors can be identified.
The simplest redundancy is a parity bit: one extra bit added to each byte that makes the total count of 1-bits even (or odd). If a single bit flips due to noise, the parity will now be wrong, signaling an error. Parity cannot fix the error, only detect it, but detection is often enough to trigger a retry or raise an alarm.
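An even-parity scheme fits in a few lines of Python (the helper names here are illustrative):

```python
def even_parity_bit(byte):
    """Return the extra bit that makes the total count of 1-bits even."""
    return bin(byte).count("1") % 2

def parity_ok(byte, stored_parity):
    """True if the stored parity bit still matches the byte."""
    return even_parity_bit(byte) == stored_parity

b = 0b10110010              # four 1-bits, so the parity bit is 0
p = even_parity_bit(b)
assert parity_ok(b, p)

corrupted = b ^ 0b00001000  # a single bit flips in storage
assert not parity_ok(corrupted, p)  # detected, but we can't tell which bit
```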
More sophisticated codes add multiple redundant bits and can both detect and correct errors. Hamming codes, for instance, can correct any single-bit error in a word by encoding the location of the error in the redundant bits.
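A Hamming(7,4) code, one standard single-error-correcting scheme, is small enough to sketch in full. The layout below is the conventional one, with parity bits at positions 1, 2, and 4; the syndrome computed on decode directly names the position of any single flipped bit:

```python
def hamming74_encode(d):
    """Encode 4 data bits (a list of 0/1) into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(word):
    """Return (corrected codeword, error position or 0 if none)."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]   # recheck positions 1, 3, 5, 7
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]   # recheck positions 2, 3, 6, 7
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]   # recheck positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3       # syndrome = 1-based position of the error
    if pos:
        w[pos - 1] ^= 1              # flip the single bad bit back
    return w, pos

code = hamming74_encode([1, 0, 1, 1])
damaged = list(code)
damaged[4] ^= 1                      # flip the bit at position 5
fixed, where = hamming74_correct(damaged)
assert fixed == code and where == 5
```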
At the file level, checksums verify that a stored file matches what was originally written. A checksum is a number computed from all the bytes in a file; re-computing the checksum after reading and comparing it to the stored value catches corruption.
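A checksum round trip can be sketched with Python's standard `zlib.crc32`; the `store`/`verify` helpers are hypothetical names for illustration, not a real API:

```python
import zlib

def store(payload):
    """Hypothetical write path: keep the bytes together with their CRC-32."""
    return payload, zlib.crc32(payload)

def verify(payload, stored_crc):
    """Re-compute the checksum after reading and compare to the stored value."""
    return zlib.crc32(payload) == stored_crc

data, crc = store(b"survival knowledge archive, volume 1")
assert verify(data, crc)

corrupted = b"x" + data[1:]   # a single corrupted byte changes the checksum
assert not verify(corrupted, crc)
```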
Never assume storage is reliable. Every system that persists data should include some form of integrity verification.
Practical Hierarchy for Rebuilders
When reconstructing computing from scratch, think in terms of tiers matched to available technology:
Tier 0 - Human memory and oral tradition: No hardware needed, no machine-readable capacity, perfectly distributed. Insufficient for technical knowledge at scale.
Tier 1 - Paper and writing: Pencil, pen, or typewriter on durable paper. Decades of persistence. Excellent for programs, tables, reference data. Slow to write, slow to read by machine.
Tier 2 - Punched paper: Cards or tape with holes. Machine-readable. Thousands of characters per metre of tape. Requires mechanical reader/writer. Good for programs and small datasets.
Tier 3 - Magnetic tape: Hundreds of megabytes per reel. Sequential only. Long archival life if stored cool and dry. Requires precision mechanics and magnetic coating, so it is harder to make from scratch but salvageable from existing infrastructure.
Tier 4 - Magnetic disk: Random access. Megabytes to terabytes. Requires precision mechanics and electronics. Excellent for active databases and operating systems.
Tier 5 - Semiconductor: Fastest, most compact, most complex to manufacture. Work with existing chips as long as they last; do not expect to fabricate new ones without advanced industrial infrastructure.
A realistic rebuild path starts at Tier 1 and works upward as industrial capacity permits. Even staying at Tier 2 (punched paper cards or tape) is sufficient to run useful programs and store significant bodies of knowledge in machine-readable form.