Census Design
Part of Census and Demographics
How to design a census questionnaire and methodology that collects reliable, useful population data while remaining feasible to execute.
Why This Matters
A poorly designed census wastes the effort of every person who conducts it and every person who answers it. Questions that are ambiguous produce inconsistent answers. Questions that are too sensitive reduce participation. Questions that collect data with no clear use waste everyone’s time. A census that asks too much exhausts both enumerators and respondents; one that asks too little produces insufficient data for the decisions it is meant to inform.
Census design is the upstream decision that determines everything downstream. Good design makes field work faster, coding easier, analysis more reliable, and results more defensible. Bad design creates problems at every step that cannot be fixed after the fact.
The design process should start with the end: what decisions will this census inform? Work backward from those decisions to determine what data is needed, then design questions that collect that data accurately and efficiently.
Defining the Purpose
Before writing a single question, articulate exactly what the census is for.
Planning purposes: Food distribution, school planning, medical supply allocation — these require population counts by age group and location. A basic count with age, sex, and location is sufficient.
Economic planning: Understanding labor availability, skills distribution, and household economic status. Requires occupational questions, household assets, and land holdings.
Social policy: Understanding family structure, dependency ratios (how many workers support how many dependents), migration patterns. Requires household composition, ages, relationships.
Vital statistics: Births and deaths over the past year, infant mortality rates. Requires questions about recent vital events within each household.
Resource allocation to minorities: Ensuring specific groups receive proportional access to services. Requires sensitive questions about group identity.
Rank these purposes by importance and by feasibility. A first census in a resource-constrained setting should focus relentlessly on the most critical purposes. Additional questions can be added in later censuses once the system is working.
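The dependency ratio mentioned under social policy needs nothing more than counts by age group, which is one reason a basic count with age is so valuable. A minimal sketch, assuming the conventional working-age band of 15 to 64; the function name and the sample counts are illustrative, not real census data.

```python
def dependency_ratio(young, working_age, old):
    """Dependents (under 15 plus 65 and over) per 100 people aged 15-64."""
    if working_age <= 0:
        raise ValueError("working-age population must be positive")
    return 100 * (young + old) / working_age


# Illustrative counts for a hypothetical settlement, not real data.
ratio = dependency_ratio(young=420, working_age=600, old=60)
print(f"{ratio:.0f} dependents per 100 working-age people")  # 80
```

If a purpose cannot be tied to a calculation like this one, that is a sign its questions may not belong in a first census.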
Question Design Principles
One concept per question: “How many people live in this house, including children and elderly?” is better than “Who lives here?” (too open-ended) or “How many adults, children, and elderly live here?” (three questions in one, creating reporting confusion).
Clear categories: If ages will be recorded in groups, define the groupings explicitly in advance (0-4, 5-14, 15-64, 65+) so that every enumerator uses the same boundaries; groupings invented after the fact will be applied inconsistently. Where resources allow, it is better to record actual age in completed years (where known) and group during analysis, because exact ages can be regrouped later while pre-grouped answers cannot.
Exhaustive and mutually exclusive answer options: Every respondent should be able to find their correct answer, and no answer should fit two categories. “Male / Female / Not stated” covers the full range. “Farmer / Craftsperson / Other” forces “Other” to cover everything not listed, so list all important categories explicitly and keep “Other” as a small residual.
Reference period: Make the time period for each question explicit. “How many people slept in this household last night?” captures the de facto population (everyone present). “How many people usually live here?” captures the de jure population (usual residents). Both have valid uses; mixing them within a census creates incomparable data.
Avoid leading questions: “Do you have enough food?” leads toward a particular answer. “How many days in the last week did your household have less food than needed?” is more neutral and specific.
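The "clear categories" and "exhaustive and mutually exclusive" principles can be checked mechanically: every possible answer should land in exactly one category. A minimal sketch using the age bands given above, assuming ages are recorded in completed years and that a missing age is tabulated as "Not stated" (that extra category and the function names are assumptions for illustration).

```python
# Age bands from the "Clear categories" principle: (lower bound, label).
AGE_BANDS = [(0, "0-4"), (5, "5-14"), (15, "15-64"), (65, "65+")]


def age_band(age):
    """Return the single band an age in completed years belongs to."""
    if age is None:
        return "Not stated"  # assumed catch-all so the scheme stays exhaustive
    if age < 0:
        raise ValueError("age cannot be negative")
    label = AGE_BANDS[0][1]
    for lower, name in AGE_BANDS:
        if age >= lower:  # the last band whose lower bound is reached wins
            label = name
    return label


def tabulate(ages):
    """Count people per band; every input lands in exactly one band."""
    counts = {name: 0 for _, name in AGE_BANDS}
    counts["Not stated"] = 0
    for age in ages:
        counts[age_band(age)] += 1
    return counts


print(tabulate([0, 4, 5, 30, 70, None]))
```

Because each band is defined only by its lower bound, the boundaries cannot overlap or leave gaps, which is the property the principle asks for.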
Core Question Set
For a basic population census, these questions cover the most essential information:
Household level:
1. Location (settlement name, household identifier)
2. Total number of people who slept in the household last night
Individual level (for each person):
3. Name
4. Relationship to household head (head, spouse, child, parent, other relative, non-relative)
5. Sex
6. Age (last birthday) or year of birth
7. Marital status (if applicable to your policy purposes)
8. Primary occupation or activity (farmer, herder, craftsperson, trader, student, elderly/retired, child under 15, other)
Optional additions (add only if resources allow and purpose is defined):
9. Literacy (can read and write)
10. Disability or chronic illness limiting work
11. Births in household in past 12 months (name, sex, survival status)
12. Deaths in household in past 12 months (name, sex, estimated age, cause if known)
13. Land holdings (area farmed, owned vs. rented)
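If the forms are later transcribed for analysis, the core question set maps naturally onto one record per household with a list of person records. The sketch below is one possible layout, not a prescribed schema: the class and field names are assumptions, and the consistency check assumes that the persons listed are the same people counted in question 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    relationship: str                      # e.g. "head", "spouse", "child"
    sex: str
    age: Optional[int] = None              # completed years; None if not stated
    marital_status: Optional[str] = None
    occupation: Optional[str] = None


@dataclass
class Household:
    settlement: str
    household_id: str
    slept_here_last_night: int             # question 2
    members: List[Person] = field(default_factory=list)

    def count_mismatch(self):
        """Reported count minus persons listed; nonzero flags the form for review."""
        return self.slept_here_last_night - len(self.members)


# Hypothetical household, for illustration only.
home = Household("Riverside", "RS-014", slept_here_last_night=3)
home.members.append(Person("A. Example", "head", "female", age=41, occupation="farmer"))
home.members.append(Person("B. Example", "child", "male", age=9, occupation="child under 15"))
print(home.count_mismatch())  # 1: one person counted but not yet listed
```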
Questionnaire Layout
Paper forms: Pre-print forms with clear fields for each answer. Each household gets one form. Columns for up to 8-10 individuals per page (with continuation for larger households). Fields should be large enough to write legibly in field conditions.
Coding conventions: For common responses, use numeric codes rather than writing out answers in full. Coding 1 = male and 2 = female saves time and avoids spelling variations. The code book must be identical for all enumerators and accessible to analysts.
Instructions on the form: Brief instructions should appear next to each question for enumerators who cannot remember the training verbatim. “Record age in completed years. If unknown, estimate and mark ‘E’.”
Space for notes: Include a free-text area on each form for enumerators to record issues, refusals, unusual situations, or follow-up items. These notes are invaluable for quality review.
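A code book is easiest to keep consistent when it exists in exactly one place and every lookup goes through it. A minimal sketch: the sex codes 1 and 2 come from the text above, while code 9 for "not stated", the relationship code numbers, and the convention of writing an estimated age as a number followed by "E" (for example "60E") are assumptions made for illustration.

```python
# Hypothetical code book: question -> {numeric code: meaning}.
CODE_BOOK = {
    "sex": {1: "male", 2: "female", 9: "not stated"},
    "relationship": {1: "head", 2: "spouse", 3: "child", 4: "parent",
                     5: "other relative", 6: "non-relative"},
}


def decode(question, code):
    """Translate a numeric code back to its meaning, rejecting unknown codes."""
    try:
        return CODE_BOOK[question][code]
    except KeyError:
        raise ValueError(f"unknown code {code!r} for question {question!r}") from None


def parse_age(entry):
    """Parse an age field such as '34' or '60E' into (years, estimated)."""
    text = entry.strip().upper()
    estimated = text.endswith("E")
    if estimated:
        text = text[:-1].strip()
    if not text.isdecimal():
        raise ValueError(f"unreadable age entry: {entry!r}")
    return int(text), estimated


print(decode("sex", 2))    # female
print(parse_age("60E"))    # (60, True)
```

Rejecting unknown codes instead of guessing keeps miscoded forms visible, so they can be checked against the enumerator's notes.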
Piloting the Design
Before deploying, test the questionnaire on a small sample (10–20 households representative of the population’s diversity).
What to look for:
- Questions that confuse respondents
- Questions with unexpectedly high non-response rates
- Questions where enumerators code answers inconsistently
- Time per interview (if too long, shorten the questionnaire)
- Questions that produce resentment or refusals
Revise the design based on pilot findings. If resources allow, test the revised questionnaire in a second small pilot; it almost always shows improvements. Questionnaires that skip piloting almost always contain fixable problems that are then discovered in the worst possible way: across thousands of interviews.
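Two of the pilot checks listed above, non-response per question and time per interview, are simple enough to tally by hand or with a few lines of code. The sketch below assumes each pilot interview has been transcribed as a dictionary with the interview length in minutes and the answers given, with `None` standing for no response; that layout and the sample figures are assumptions for illustration.

```python
def pilot_summary(interviews, questions):
    """Return mean interview length and the non-response rate per question."""
    n = len(interviews)
    if n == 0:
        raise ValueError("no pilot interviews to summarise")
    non_response = {}
    for q in questions:
        # A question counts as unanswered if it is absent or recorded as None.
        missing = sum(1 for i in interviews if i["answers"].get(q) is None)
        non_response[q] = missing / n
    mean_minutes = sum(i["minutes"] for i in interviews) / n
    return {"mean_minutes": mean_minutes, "non_response": non_response}


# Made-up pilot records, not real results.
pilot = [
    {"minutes": 20, "answers": {"age": 34, "land": 2.5}},
    {"minutes": 30, "answers": {"age": None, "land": None}},
    {"minutes": 25, "answers": {"age": 61, "land": None}},
    {"minutes": 25, "answers": {"age": 8, "land": 1.0}},
]
print(pilot_summary(pilot, ["age", "land"]))
```

A question whose non-response rate stands well above the others is a candidate for rewording or removal before the full census.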
From Design to Implementation
The final questionnaire design should be locked before training begins. Changes during data collection destroy comparability between different parts of the census. Once a design is committed, the only acceptable changes are those required to fix questions that are genuinely unworkable, and even these must be documented carefully with the date of the change and a record of which forms used which version.
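One way to keep that documentation usable is a short change log with one entry per questionnaire version, so that any form can be matched to the version in force when it was collected. The sketch below is an assumption about how such a log might look: the version labels, dates, and change descriptions are invented placeholders, and printing the version on each form remains the more reliable record where that is possible.

```python
from datetime import date

# Hypothetical change log, oldest first. All values are placeholders.
VERSIONS = [
    {"version": "1.0", "effective": date(2024, 3, 1), "change": "locked design"},
    {"version": "1.1", "effective": date(2024, 3, 9), "change": "reworded one unworkable question"},
]


def version_on(collected, versions=VERSIONS):
    """Return the questionnaire version in force on a collection date."""
    current = None
    for entry in versions:  # entries are assumed sorted by effective date
        if entry["effective"] <= collected:
            current = entry["version"]
    if current is None:
        raise ValueError("collection date precedes the first version")
    return current


print(version_on(date(2024, 3, 5)))   # 1.0
print(version_on(date(2024, 3, 9)))   # 1.1
```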