Data Access Rules

How to govern access to population data — balancing the public value of shared information against individual privacy and the misuse risks of detailed personal records.

Why This Matters

Population data is powerful. It can guide food distribution, identify health risks, and inform infrastructure planning. It can also be used to target individuals, identify minorities, track dissidents, and facilitate persecution. The history of censuses includes both indispensable contributions to human welfare and catastrophic misuses — from Nazi Germany using census records to locate Jews, to colonial governments using demographic data to enforce racial laws.

Data access rules determine which outcome prevails. Getting them right requires thinking through who might have access to what data, for what purposes, and what prevents misuse. Done well, access rules build public trust (making people more willing to provide accurate information) and prevent the specific harms that make census data dangerous.

These rules must be designed explicitly, enforced consistently, and communicated publicly. They cannot be improvised after the data is collected.

The Fundamental Distinction: Individual vs. Aggregate

The core protective principle: individual-level data (records identifying specific people) and aggregate data (statistics about groups) carry vastly different risks and should have different access rules.

Individual-level data: A record showing that household 347 at a specific location consists of a 45-year-old man, his 40-year-old wife, three children under 15, and an elderly mother can be used to identify, locate, and target that specific family. This level of detail should be accessible only to the enumerators who collected it, the data processing staff, and analysts working under controlled conditions. It should never be available for general government use.

Aggregate data: “The western district has 4,200 residents, of whom 35% are under 15 and 8% are over 65” identifies no individual and poses minimal privacy risk. This level of summary information should be freely published and available to anyone — that is the entire point of conducting a census.

The transition from individual to aggregate — how small a group is too small to publish statistics about — is a key design question. Publishing “the 3 households of ethnic group X in district Y” effectively identifies the individuals.

Minimum Aggregation Standards

Before publishing any table, apply a minimum aggregation threshold: do not publish cell values below 5 (or below 10, if the risk assessment calls for it). If a cell is too small, suppress it (mark it as “-” or “n<5”) or combine it with an adjacent cell.
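The threshold rule can be sketched as follows. This is a minimal sketch that assumes a one-way table held as a Python dict mapping category labels to counts. The function name `suppress_small_cells` and the marker format are illustrative, not a standard API. The sketch applies the rule literally, so zero counts are suppressed too.

```python
from typing import Dict, Union

def suppress_small_cells(table: Dict[str, int], threshold: int = 5) -> Dict[str, Union[int, str]]:
    """Replace each count below `threshold` with a marker such as "n<5".

    Hypothetical helper: the table is assumed to map category labels to counts.
    """
    marker = f"n<{threshold}"
    return {label: (marker if count < threshold else count)
            for label, count in table.items()}


# Usage with made-up counts for one district:
published = suppress_small_cells({"Group A": 412, "Group B": 3, "Group C": 57})
print(published)  # {'Group A': 412, 'Group B': 'n<5', 'Group C': 57}
```

This hides only the small values themselves. If the table also publishes totals, a hidden value can still be recovered by subtraction, which is what the suppression rule addresses.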

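Suppression rule: When a cell is suppressed, also suppress related cells that would allow the suppressed value to be calculated by subtraction. If the row total and all other cells in the row are published, the suppressed cell simply equals the total minus the published cells. Suppressing at least one additional cell in the same row (and likewise in the same column) prevents this; this is known as “complementary suppression”.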
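Complementary suppression is easiest to see on a single row with a published total. In this simplified sketch, a lone primary suppression is paired with the smallest other cell in the row. Real tables need the check across rows and columns at once, and that check can cascade; disclosure-control software typically solves it as an optimization problem. The `suppress_row` function and the smallest-cell heuristic are assumptions for illustration only.

```python
from typing import Dict, Set

def suppress_row(row: Dict[str, int], threshold: int = 5) -> Set[str]:
    """Return the labels of cells to suppress in one row whose total is published.

    Simplified heuristic (an assumption, not a standard algorithm):
    - primary suppression: every cell below `threshold`;
    - complementary suppression: if exactly one cell is suppressed, the total
      would reveal it by subtraction, so also hide the smallest remaining cell.
    """
    primary = {label for label, n in row.items() if n < threshold}
    if len(primary) != 1:
        return primary
    others = [label for label in row if label not in primary]
    if not others:
        # A one-cell row: the total itself would have to be suppressed (not handled here).
        return primary
    complementary = min(others, key=lambda label: row[label])
    return primary | {complementary}


row = {"0-14": 120, "15-64": 340, "65+": 3}  # made-up counts; published total 463
print(sorted(suppress_row(row)))  # ['0-14', '65+']
```

Without the complementary cell, anyone could recover the hidden value as 463 - 120 - 340 = 3. With both cells hidden, only their sum of 123 can be derived.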

Small area problem: The more geographic detail you publish, the greater the risk that small-area cells identify individuals. Balance the planning value of fine geographic detail against the privacy cost. One approach is to pair fine geography with coarse tabulations: for example, publish age-group totals at village level, but not age × occupation × ethnicity cross-tabulations.
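Here is one way to follow this advice, assuming counts are held at (district, village, category) level. The sketch publishes only population totals for each village and keeps the category breakdown at district level. The key layout and the `split_by_geography` name are hypothetical. Both outputs would still need to pass the minimum-threshold check before release.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical layout: counts keyed by (district, village, category).
Counts = Dict[Tuple[str, str, str], int]

def split_by_geography(counts: Counts) -> Tuple[Dict[Tuple[str, str], int], Dict[Tuple[str, str], int]]:
    """Return (village totals, district-by-category cross-tab).

    Fine geography carries only totals; category detail is kept at the
    coarser district level, where cells are larger.
    """
    village_totals: Dict[Tuple[str, str], int] = defaultdict(int)
    district_detail: Dict[Tuple[str, str], int] = defaultdict(int)
    for (district, village, category), n in counts.items():
        village_totals[(district, village)] += n
        district_detail[(district, category)] += n
    return dict(village_totals), dict(district_detail)


counts = {
    ("West", "V1", "65+"): 2,
    ("West", "V1", "0-14"): 30,
    ("West", "V2", "65+"): 6,
    ("West", "V2", "0-14"): 25,
}
villages, districts = split_by_geography(counts)
print(villages)   # {('West', 'V1'): 32, ('West', 'V2'): 31}
print(districts)  # {('West', '65+'): 8, ('West', '0-14'): 55}
```

In these made-up counts, the 65+ cell of 2 in village V1 would fail a threshold of 5 at village level. The district-level 65+ count of 8 passes.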

Categories of Users and Their Access Levels

General public: Published aggregate tables and summary reports only. No access to individual records or small-area cross-tabulations that could enable identification.

Government planners: Access to detailed geographic breakdowns of population totals and key characteristics (age distribution, household size) for planning purposes. No access to individual records. Access subject to stated purpose and documentation.

Research users: Access to anonymized microdata — individual records with all identifying information removed and geographic information generalized (district level, not village). Must sign data use agreement specifying approved uses and prohibiting re-identification attempts. Files returned or destroyed after project completion.

Census statistical office staff: Full access to all data for processing and analysis. Subject to employment agreements, training on confidentiality, and access logs.

Law enforcement / courts: Access only under specific legal order, for specific named individuals, with documented justification. Census data should not be routinely available to law enforcement — the principle of statistical confidentiality is fundamental to participation.
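The user categories above can be written down as an explicit policy check. This minimal sketch assumes hypothetical role names, four access levels, and simple flags for a stated purpose, a signed agreement, and a legal order naming specific individuals. A real system would also need identity verification and logging. The check deliberately has no override for the requester's seniority: anything not explicitly allowed is refused.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet

class Level(Enum):
    AGGREGATE = 1             # published tables and reports
    DETAILED_AGGREGATE = 2    # detailed geographic breakdowns
    ANONYMIZED_MICRODATA = 3  # de-identified records, generalized geography
    FULL_RECORDS = 4          # identifiable individual records

@dataclass(frozen=True)
class Request:
    role: str                                      # hypothetical role names, see is_permitted
    level: Level
    purpose: str = ""                              # stated purpose / documented justification
    signed_agreement: bool = False                 # data use or employment agreement
    subjects: FrozenSet[str] = frozenset()         # individuals whose records are sought
    legal_order_for: FrozenSet[str] = frozenset()  # individuals named in a legal order

def is_permitted(req: Request) -> bool:
    """Apply the access categories; anything not explicitly allowed is refused."""
    has_purpose = bool(req.purpose.strip())
    if req.level is Level.AGGREGATE:
        return True  # published aggregates are open to everyone
    if req.role == "planner":
        return req.level is Level.DETAILED_AGGREGATE and has_purpose
    if req.role == "researcher":
        return (req.level is Level.ANONYMIZED_MICRODATA
                and req.signed_agreement and has_purpose)
    if req.role == "census_staff":
        return req.signed_agreement  # stands in for employment agreement and training
    if req.role == "law_enforcement":
        # Only named individuals, only under a legal order, only with justification.
        return (req.level is Level.FULL_RECORDS and has_purpose
                and bool(req.subjects) and req.subjects <= req.legal_order_for)
    return False


print(is_permitted(Request("planner", Level.FULL_RECORDS, purpose="road plan")))  # False
```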

Creating Access Rules That Work

Document the rules: Write the rules down and make them publicly available, so that everyone who might request access, or be asked to grant it, knows what is permitted.

Enforce consistently: Rules that are bent for powerful requesters teach the public that the rules are not real. Consistency — including refusing requests from high-status officials that would be refused from ordinary ones — is essential for credibility.

Audit access: Log who accessed what data, when, and for what stated purpose. Review logs periodically. Unexplained access patterns get investigated.
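An access log with a periodic review can be sketched in a few lines. The sketch assumes each access records who, what, when, and the stated purpose, and that reviewers hold a list of purposes approved for each user. The `AuditLog` class and its review rule (flag empty or unapproved purposes) are illustrative assumptions. In practice the log should be stored where the people being logged cannot edit it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Set

@dataclass(frozen=True)
class AccessEvent:
    user: str
    dataset: str
    when: datetime
    purpose: str

class AuditLog:
    """In-memory log exposing only appends (a real one needs tamper-resistant storage)."""

    def __init__(self) -> None:
        self._events: List[AccessEvent] = []

    def record(self, user: str, dataset: str, purpose: str,
               when: Optional[datetime] = None) -> None:
        self._events.append(AccessEvent(user, dataset, when or datetime.now(), purpose))

    def flag_unexplained(self, approved: Dict[str, Set[str]]) -> List[AccessEvent]:
        """Return events with no stated purpose, or a purpose not approved for that user."""
        return [e for e in self._events
                if not e.purpose.strip()
                or e.purpose not in approved.get(e.user, set())]


log = AuditLog()
t = datetime(2024, 1, 5, 9, 30)
log.record("analyst1", "microdata_2023", "fertility study", when=t)
log.record("analyst1", "microdata_2023", "", when=t)
log.record("analyst2", "full_records", "checking a neighbour", when=t)
for event in log.flag_unexplained({"analyst1": {"fertility study"}}):
    print(event.user, event.dataset, repr(event.purpose))
# analyst1 microdata_2023 ''
# analyst2 full_records 'checking a neighbour'
```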

Time-limited raw data: Individual-level data is most sensitive immediately after collection, when individuals are living and identifiable. Many statistical systems have an embargo period (30–100 years) after which historical individual records may be released for genealogical and historical research. This balances privacy protection with historical scholarship.

Destruction schedule: Data that is no longer needed for its stated purpose should be destroyed according to a documented schedule. Accumulating indefinite archives of sensitive data creates risk without benefit.
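Once each holding records its collection date and the rule that applies to it, the embargo and destruction schedules reduce to date arithmetic. This sketch uses assumed, hypothetical names (`DataHolding`, `embargo_years`, `purpose_ends`, `retention_years`). The example embargo is 72 years because that is the period applied to US census records; any value in the 30–100 year range works the same way.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

def add_years(d: date, years: int) -> date:
    """Add whole years; 29 February maps to 28 February in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

@dataclass
class DataHolding:
    name: str
    collected: date
    embargo_years: Optional[int] = None  # archived individual records: release after this
    purpose_ends: Optional[date] = None  # working copies tied to one stated purpose
    retention_years: int = 0             # documented grace period before destruction

    def release_date(self) -> Optional[date]:
        return None if self.embargo_years is None else add_years(self.collected, self.embargo_years)

    def destroy_date(self) -> Optional[date]:
        return None if self.purpose_ends is None else add_years(self.purpose_ends, self.retention_years)

def action_on(h: DataHolding, today: date) -> str:
    """Decide what the schedule requires for a holding on a given day."""
    destroy = h.destroy_date()
    if destroy is not None and today >= destroy:
        return "destroy"
    release = h.release_date()
    if release is not None and today >= release:
        return "release for historical research"
    return "keep restricted"


archive = DataHolding("1950 census records", date(1950, 4, 1), embargo_years=72)
working = DataHolding("project extract", date(2020, 3, 1),
                      purpose_ends=date(2023, 6, 30), retention_years=1)
print(action_on(archive, date(2022, 4, 1)))  # release for historical research
print(action_on(working, date(2024, 7, 1)))  # destroy
```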

Communicating Rules to Respondents

Respondents are entitled to know, at the time of the census interview, what data will be collected, who will see it, how it will be used, and what protections are in place. This information:

  • Must be communicated before the interview, not buried in fine print
  • Must be accurate: stated protections must actually be implemented
  • Must address the most common concerns: will this be used for taxation, conscription, persecution?

If stated protections are strong and genuine, communicating them clearly increases participation. If they are weak, communicating them accurately is counterproductive because it reminds people of the risks, and overstating them violates the accuracy requirement above. The data access design and the participation strategy are therefore inseparable.