Hierarchical Directories

Part of Data Storage

Folders within folders — how tree-structured directory systems organize thousands of files without chaos.

Why This Matters

Once a computing system accumulates more than a few dozen files, a flat directory becomes unmanageable. Which of these 200 programs is the text editor? Which data file belongs to which project? Hierarchical directories — folders within folders — solve this by allowing files to be grouped logically, named locally (so you can have both index.txt in /agriculture and index.txt in /medicine without conflict), and navigated by path.

Nearly every modern operating system uses hierarchical directories. Multics introduced the hierarchical file system in the mid-1960s; Unix, begun in 1969, popularized the single-rooted tree; MS-DOS adopted subdirectories with version 2.0 in 1983, and Windows inherited them. The mental model (a tree with a root at the top, directories as branches, and files as leaves) is one of the most productive abstractions in computing.

For rebuilders, implementing hierarchical directories is the natural next step after a flat file system works. The extension is modest in complexity: a directory entry that points to another directory rather than a file. Understanding both the design and the implementation lets you build a usable, scalable storage system from scratch.

The Tree Model

A hierarchical file system is a tree:

  • The root directory is the tree’s root. On Unix, it is /; on DOS/Windows, each drive has its own root, such as C:\.
  • Directories (also called folders) are nodes that can contain other directories or files.
  • Files are leaf nodes — they contain data, not other entries.

Any file can be uniquely identified by its path: the sequence of directory names from the root to the file, separated by a delimiter character (typically / on Unix/Linux/Mac, \ on DOS/Windows).

Example paths:

  • /agriculture/irrigation/channel-design.txt (Unix)
  • C:\Agriculture\Irrigation\ChannelDesign.txt (DOS)

Each identifies a file (channel-design.txt in the Unix example, ChannelDesign.txt in the DOS example) inside an irrigation directory, inside an agriculture directory, inside the root.

Working directory: Programs and users operate with a current working directory. Paths that start from the root are absolute paths (they begin with / or C:\). Paths that start from the current directory are relative paths (they do not begin with a delimiter or drive letter). Relative paths allow portability: code that opens data/log.txt works wherever it is installed on the tree, provided its working directory is the directory that contains data/.
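The distinction can be sketched in a few lines of Python. This is only an illustration for Unix-style paths; the function name to_absolute and the directory /opt/app are made up for the example.

```python
import posixpath


def to_absolute(path: str, cwd: str) -> str:
    """Interpret a Unix-style path against a current working directory."""
    if posixpath.isabs(path):
        # Absolute paths start at the root and ignore the working directory.
        return path
    # Relative paths are appended to the working directory.
    return posixpath.join(cwd, path)


print(to_absolute("data/log.txt", "/opt/app"))       # relative: depends on cwd
print(to_absolute("/etc/settings.txt", "/opt/app"))  # absolute: cwd is ignored
```

The same relative path resolves to a different file when the working directory changes, which is exactly what makes it portable.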

How Directories Are Stored

At the physical level, a directory is just a file with a special format: it contains a list of directory entries, each mapping a name to either a file (data blocks) or a subdirectory (another directory block).

The crucial difference from a flat system: In a flat system, all directory entries live in a fixed reserved region at the start of the disk. In a hierarchical system, each directory is a file on the disk that can grow dynamically as entries are added.

Directory entry: Similar to flat file system entries, but with an additional flag indicating whether the entry is a subdirectory:

Name:     8.3 format (FAT) or variable-length (Unix)
Type:     FILE or DIRECTORY
Location: starting block number (or inode number in Unix)
Size:     file size in bytes (FAT stores 0 here for directories; a directory's length is found by following its own block chain)
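As a rough illustration, the four fields above could be packed into a fixed 32-byte on-disk record. The layout below (23-byte name, 1-byte type, two 4-byte integers) is an assumption chosen for the sketch, not the FAT or Unix format, and the names DirEntry, ENTRY_FORMAT, TYPE_FILE and TYPE_DIRECTORY are hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: 23-byte name, 1-byte type flag,
# 4-byte starting block, 4-byte size; little-endian, no padding.
ENTRY_FORMAT = "<23sBII"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 32 bytes
TYPE_FILE, TYPE_DIRECTORY = 0, 1


@dataclass
class DirEntry:
    name: str
    type: int
    location: int
    size: int

    def pack(self) -> bytes:
        """Serialize the entry into its fixed-size on-disk form."""
        raw = self.name.encode("ascii")
        if len(raw) > 23:
            raise ValueError("name too long for this layout")
        # struct pads the name field with zero bytes automatically.
        return struct.pack(ENTRY_FORMAT, raw, self.type, self.location, self.size)

    @classmethod
    def unpack(cls, data: bytes) -> "DirEntry":
        """Rebuild an entry from ENTRY_SIZE bytes read from a directory block."""
        raw, type_, location, size = struct.unpack(ENTRY_FORMAT, data)
        return cls(raw.rstrip(b"\x00").decode("ascii"), type_, location, size)


entry = DirEntry("agri", TYPE_DIRECTORY, 17, 0)
assert DirEntry.unpack(entry.pack()) == entry
```

A fixed record size keeps directory scanning simple: a 512-byte block holds exactly 16 such entries.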

How navigation works:

  1. Start at the root directory (located at a known block whose number is recorded in the superblock).

  2. For path /agri/irrigation/channel.txt:

    • Read root directory, find entry named agri with type DIRECTORY.
    • Follow to the block containing the agri directory.
    • Read agri directory, find entry named irrigation with type DIRECTORY.
    • Follow to the irrigation directory block.
    • Read irrigation directory, find entry named channel.txt with type FILE.
    • Follow to the file’s starting block and read data.
  3. This chain of directory reads is called path resolution or name resolution. It is performed by the file system driver on every file open operation.
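The walk above can be sketched against a toy in-memory "disk". Everything here is an assumption made for illustration: the block numbers, the dict-based directory format, and the names DISK, ROOT_BLOCK and resolve.

```python
FILE, DIRECTORY = "FILE", "DIRECTORY"
ROOT_BLOCK = 4  # assumed root location, as a superblock would record it

# Toy disk: block number -> contents.
# A directory block maps names to (type, block); a file block holds bytes.
DISK = {
    4: {"agri": (DIRECTORY, 10)},
    10: {"irrigation": (DIRECTORY, 11)},
    11: {"channel.txt": (FILE, 20)},
    20: b"Channel width: 1.2 m\n",
}


def resolve(path, disk=DISK, root_block=ROOT_BLOCK):
    """Return (type, block) for an absolute path by walking one directory per component."""
    if not path.startswith("/"):
        raise ValueError("this sketch handles absolute paths only")
    entry_type, block = DIRECTORY, root_block
    for name in path.split("/"):
        if not name:
            continue  # empty components come from the leading or doubled slashes
        if entry_type != DIRECTORY:
            raise NotADirectoryError(path)  # tried to descend into a file
        directory = disk[block]  # one directory read per path component
        if name not in directory:
            raise FileNotFoundError(path)
        entry_type, block = directory[name]
    return entry_type, block


kind, block = resolve("/agri/irrigation/channel.txt")
print(kind, DISK[block])
```

Note that the cost is one directory read per path component, which is why real drivers usually cache recently resolved directories.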

Unix Inode Architecture

Unix separates file metadata from directory structure through the inode (index node):

Inode: A fixed-size data structure (typically 128 or 256 bytes) stored in a reserved inode table (at the beginning of the partition in classic Unix file systems). Contains:

  • Mode (file type + permission bits: read/write/execute for owner, group, others)
  • Link count (number of directory entries pointing to this inode)
  • Owner UID and GID
  • File size in bytes
  • Timestamps (access, modification, change)
  • Block pointers (direct, indirect, double-indirect, triple-indirect)

Directory entry in Unix: Conceptually contains only two fields: a name (variable length, up to 255 bytes on most modern file systems) and an inode number. The directory does not contain file size, permissions, or any other metadata; all of that is in the inode.

Advantages of inode separation:

  1. Hard links: Two directory entries, in the same or different directories, can point to the same inode number. The file has one set of data but two names. Either entry can be deleted; the inode (and its data) is freed only when the link count reaches zero and no process still has the file open.
  2. Rename: Moving a file within the same partition only rewrites directory entries, so its cost is independent of file size (O(1) in the amount of data). Nothing is copied.
  3. Directory independence: Directories hold only names and inode numbers, so even a very large file can be renamed, moved, or have its metadata changed without touching its data blocks or any unrelated directory.
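A small simulation may make the link-count behaviour concrete. This is a sketch under simplifying assumptions (directories are plain dicts, there are no open file handles), and the class name TinyFS and its methods are hypothetical.

```python
class TinyFS:
    """Toy model of inode separation: directories map names to inode numbers."""

    def __init__(self):
        self.inodes = {}       # inode number -> {"links": int, "data": bytes}
        self.directories = {}  # directory path -> {entry name: inode number}
        self._next_inode = 1

    def create(self, directory, name, data):
        ino = self._next_inode
        self._next_inode += 1
        self.inodes[ino] = {"links": 1, "data": data}
        self.directories.setdefault(directory, {})[name] = ino
        return ino

    def link(self, src_dir, src_name, dst_dir, dst_name):
        """Hard link: a second name for the same inode; no data is copied."""
        ino = self.directories[src_dir][src_name]
        self.directories.setdefault(dst_dir, {})[dst_name] = ino
        self.inodes[ino]["links"] += 1

    def unlink(self, directory, name):
        """Remove one name; free the inode only when no names remain."""
        ino = self.directories[directory].pop(name)
        self.inodes[ino]["links"] -= 1
        if self.inodes[ino]["links"] == 0:
            del self.inodes[ino]

    def rename(self, src_dir, src_name, dst_dir, dst_name):
        """Move an entry between directories; the inode and data are untouched."""
        ino = self.directories[src_dir].pop(src_name)
        self.directories.setdefault(dst_dir, {})[dst_name] = ino

    def read(self, directory, name):
        return self.inodes[self.directories[directory][name]]["data"]


fs = TinyFS()
ino = fs.create("/agriculture", "index.txt", b"crop notes")
fs.link("/agriculture", "index.txt", "/medicine", "crops.txt")
fs.unlink("/agriculture", "index.txt")    # data survives: one link remains
print(fs.read("/medicine", "crops.txt"))  # b'crop notes'
```

Because rename only moves an inode number between dicts, its cost does not depend on how much data the file holds.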

Block pointers in inodes: In the classic layout (as in ext2), an inode holds 12 direct block pointers, so the first 12 blocks of a file are reached without any extra reads. For larger files, a 13th pointer refers to an indirect block: a block filled with block pointers (with 512-byte blocks and 4-byte pointers, 128 additional pointers). For even larger files, a 14th pointer refers to a double-indirect block: a block of pointers to indirect blocks (128 × 128 = 16,384 more blocks). A 15th, triple-indirect pointer adds one more level (128 × 128 × 128 = 2,097,152 blocks). Maximum file size with 512-byte blocks: (12 + 128 + 16,384 + 2,097,152) × 512 bytes = 1,082,202,112 bytes, or roughly 1 GB.
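The arithmetic can be checked with a short script. It assumes the layout described above (12 direct pointers plus one single-, one double-, and one triple-indirect pointer); the function names max_file_size and pointer_level are made up for this sketch.

```python
def max_file_size(block_size=512, pointer_size=4, direct=12):
    """Largest file, in bytes, addressable through direct and indirect pointers."""
    per_block = block_size // pointer_size  # pointers that fit in one block
    blocks = direct + per_block + per_block**2 + per_block**3
    return blocks * block_size


def pointer_level(block_index, block_size=512, pointer_size=4, direct=12):
    """Name the pointer level that reaches the given zero-based file block."""
    per_block = block_size // pointer_size
    levels = [
        ("direct", direct),
        ("single-indirect", per_block),
        ("double-indirect", per_block**2),
        ("triple-indirect", per_block**3),
    ]
    for name, count in levels:
        if block_index < count:
            return name
        block_index -= count  # skip past the blocks covered by this level
    raise ValueError("block index beyond the maximum file size")


print(max_file_size())    # 1082202112 bytes with 512-byte blocks
print(pointer_level(11))  # direct
print(pointer_level(12))  # single-indirect
```

Changing block_size shows why larger blocks matter: each level grows with the cube of the pointers per block, so the limit rises very quickly.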

The Dot Entries: . and ..

Every Unix directory contains two special entries:

. (dot): The current directory. Points to the same inode as the directory itself. Allows referring to the current directory as a path component: ./myfile is equivalent to just myfile. Useful in commands like ./configure (run configure in current directory, not a system program named configure).

.. (dot-dot): The parent directory. Points to the inode of the directory that contains the current directory. Allows traversal upward: ../sibling refers to sibling in the parent directory.

The .. entry in the root directory (/) points back to the root itself: the root is its own parent. This is the conventional way to terminate the chain of parent lookups.

Path canonicalization: A path like /agri/../medicine/herbs.txt is equivalent to /medicine/herbs.txt. Path canonicalization processes .. entries by removing the preceding path component, producing the shortest equivalent absolute path.
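One common way to do this is with a stack of path components. The sketch below is purely lexical (it never consults the disk) and handles absolute Unix-style paths only; the function name canonicalize is an assumption for the example.

```python
def canonicalize(path):
    """Collapse '.', '..' and repeated slashes in an absolute Unix-style path."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    stack = []
    for part in path.split("/"):
        if part in ("", "."):
            continue  # empty and '.' components do not move anywhere
        if part == "..":
            if stack:
                stack.pop()  # step up by dropping the preceding component
            # at the root, '..' stays at the root
        else:
            stack.append(part)
    return "/" + "/".join(stack)


print(canonicalize("/agri/../medicine/herbs.txt"))  # /medicine/herbs.txt
print(canonicalize("/../.."))                       # /
```

The empty-stack case mirrors the rule that the root is its own parent.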

Implementing a Simple Hierarchical File System

Starting from a working flat file system, the minimal extension needed for hierarchy:

1. Add directory type to entries: Add a one-byte flag (0=file, 1=directory) to each directory entry.

2. Allocate directory blocks dynamically: Instead of a fixed directory region, directories are stored in data blocks like files. The root directory is at a fixed block (say, block 4, recorded in the superblock). Other directories are allocated in the data region as needed.

3. Implement directory creation: mkdir allocates a new directory block, initializes it with . and .. entries, and adds an entry in the parent directory pointing to it.

4. Implement path resolution: Split the path on /, then follow each component through directory lookups starting at root (for absolute paths) or the current working directory (for relative paths).

5. Implement working directory tracking: The operating system kernel maintains a current working directory for each running process (or for the single running process in a simple system). Commands like chdir update this value.
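The five steps can be sketched together in memory. This is a minimal model under stated assumptions, not a disk driver: blocks are Python dicts, the root sits at the example block 4, a single process owns the working directory, and the names TinyTreeFS, resolve_dir, mkdir and chdir are hypothetical.

```python
FILE, DIRECTORY = 0, 1  # one-byte type flag (step 1)
ROOT_BLOCK = 4          # assumed root location (step 2)


class TinyTreeFS:
    def __init__(self):
        # block number -> directory contents {name: (type flag, block)}
        root = {".": (DIRECTORY, ROOT_BLOCK), "..": (DIRECTORY, ROOT_BLOCK)}
        self.blocks = {ROOT_BLOCK: root}
        self.next_free = ROOT_BLOCK + 1
        self.cwd = ROOT_BLOCK  # working directory of the single process (step 5)

    def resolve_dir(self, path):
        """Step 4: follow each component from the root or the working directory."""
        block = ROOT_BLOCK if path.startswith("/") else self.cwd
        for name in path.split("/"):
            if not name:
                continue
            entries = self.blocks[block]
            if name not in entries:
                raise FileNotFoundError(path)
            kind, target = entries[name]
            if kind != DIRECTORY:
                raise NotADirectoryError(path)
            block = target
        return block

    def mkdir(self, path):
        """Step 3: allocate a block, add '.' and '..', then link it into the parent."""
        head, sep, name = path.rpartition("/")
        if not name:
            raise ValueError("missing directory name")
        parent = self.resolve_dir(head + sep)
        if name in self.blocks[parent]:
            raise FileExistsError(path)
        new_block = self.next_free
        self.next_free += 1
        self.blocks[new_block] = {".": (DIRECTORY, new_block), "..": (DIRECTORY, parent)}
        self.blocks[parent][name] = (DIRECTORY, new_block)
        return new_block

    def chdir(self, path):
        self.cwd = self.resolve_dir(path)


fs = TinyTreeFS()
fs.mkdir("/agri")
fs.chdir("/agri")
fs.mkdir("irrigation")  # relative: created inside /agri
assert fs.resolve_dir("irrigation/..") == fs.cwd
```

Because . and .. are stored as ordinary entries, the resolver needs no special case for them; upward traversal falls out of the normal lookup.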

This extension adds perhaps 200–300 lines of code to a working flat file system implementation — not a large jump in complexity but a large jump in usability.

Directory Conventions for a Rebuilt System

Standard tree structure for a general-purpose system:

/
  bin/          — executable programs
  data/         — persistent data files
  lib/          — shared libraries and modules
  tmp/          — temporary files (not backed up)
  etc/          — configuration files
  log/          — log files
  user/         — user home directories
    alice/
    bob/
  archive/      — long-term archived data
    2026/
    2027/

Naming rules: Use lowercase letters, digits, and hyphens. No spaces in names (spaces in path components create enormous difficulties in scripting and command-line use). Dates in ISO 8601 format (YYYY-MM-DD) in filenames allow correct lexicographic sorting.
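A naming rule is easier to keep when it is checked mechanically. The sketch below assumes one optional dot-separated extension is also allowed (as in channel-design.txt); the pattern NAME_RE and the function is_valid_name are illustrative, not a standard.

```python
import re

# Lowercase letters and digits, hyphen-separated, with one optional extension.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*(?:\.[a-z0-9]+)?")


def is_valid_name(name):
    """True if the name follows the lowercase/digits/hyphens convention."""
    return NAME_RE.fullmatch(name) is not None


print(is_valid_name("channel-design.txt"))  # True
print(is_valid_name("Channel Design.txt"))  # False: uppercase and a space

# ISO 8601 dates sort chronologically under plain string comparison.
logs = ["2026-10-01-log.txt", "2026-02-15-log.txt", "2025-12-31-log.txt"]
print(sorted(logs))
```

The sort works because year, month, and day are zero-padded and ordered from most to least significant.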

Permissions (if implementing Unix-style access control): Programs in /bin should be executable by all users (mode 755). Configuration in /etc should be readable by all, writable only by root (mode 644 for files, 755 for directories). User data in /user/alice should be accessible only by alice (mode 700 for the home directory, which denies read, write, and search access to everyone else).
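The octal modes are easier to read once decoded into the familiar rwx form. This is a small sketch covering only the nine basic permission bits; the function names mode_string and may_write are made up for the example.

```python
def mode_string(mode):
    """Render the nine permission bits of an octal mode as 'rwxr-xr-x' style text."""
    out = []
    for shift in (6, 3, 0):  # owner, group, others
        bits = (mode >> shift) & 0o7
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)


def may_write(mode, who):
    """True if the given class ('owner', 'group' or 'others') has write permission."""
    shift = {"owner": 6, "group": 3, "others": 0}[who]
    return bool((mode >> shift) & 2)


for mode in (0o755, 0o644, 0o700):
    print(oct(mode), mode_string(mode))
```

Each octal digit is just the sum of read (4), write (2), and execute (1) for one class of user.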

Even in a single-user system, implementing rudimentary permissions prevents accidental deletion of system files and establishes habits that will scale when the system grows to multiple users.