Sound Wave Basics

Part of Telephony

The physics of sound — pressure waves, frequency, amplitude, and how they relate to speech intelligibility in telephone systems.

Why This Matters

Telephone engineering is applied acoustic physics. Every design decision (microphone type, diaphragm dimensions, bandwidth limits, amplifier gain) flows from an understanding of what sound is, how it behaves, and which of its properties are essential for speech intelligibility. Engineers who do not understand sound physics cannot reason about why certain telephone designs sound better than others, or why standard values such as the 300-3,400 Hz voice band and the 90V, 20 Hz ringing signal were chosen.

Understanding sound waves also prevents common misconceptions. Many people assume that “more bandwidth means better voice quality,” but this is only conditionally true. The telephone bandwidth of 300-3,400 Hz was chosen because it carries enough of the acoustic features humans use to identify phonemes and distinguish words; wider bandwidth adds mainly naturalness and presence, with a much smaller gain in intelligibility. Knowing why this is true requires understanding formants, harmonics, and frequency-selective hearing, and that knowledge prevents over-engineering voice systems at the cost of bandwidth, hardware complexity, or noise.

What Sound Is

Sound is a mechanical wave — a propagating disturbance of pressure in a medium (gas, liquid, or solid). In air, sound waves consist of alternating zones of compression (air molecules pushed closer together, above atmospheric pressure) and rarefaction (molecules pulled apart, below atmospheric pressure). These pressure zones propagate outward from the source at the speed of sound.

The speed of sound in air at 20°C is 343 m/s (approximately 346 m/s at 25°C; it increases with temperature by about 0.6 m/s per degree Celsius). Speed is independent of frequency: all frequencies travel at the same speed in homogeneous air, which is why the high and low notes of a distant band arrive together and the music sounds in tempo rather than smeared in time.
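The temperature dependence can be checked numerically. The sketch below assumes dry air and the commonly used ideal-gas approximation c = 331.3 √(1 + T/273.15) m/s, and compares it with the linear rule of thumb of about 0.6 m/s per degree; humidity and pressure effects are ignored.

```python
import math


def speed_of_sound(temp_c: float) -> float:
    """Speed of sound in dry air (m/s), ideal-gas approximation.

    Assumes 331.3 m/s at 0 °C; humidity and pressure effects are ignored.
    """
    return 331.3 * math.sqrt(1.0 + temp_c / 273.15)


def speed_of_sound_linear(temp_c: float) -> float:
    """Linear rule of thumb: about 0.6 m/s per degree Celsius above 0 °C."""
    return 331.3 + 0.606 * temp_c


for t in (0, 20, 25):
    print(f"{t:>3} °C: {speed_of_sound(t):.1f} m/s "
          f"(linear rule: {speed_of_sound_linear(t):.1f} m/s)")
```

Over ordinary room temperatures the two agree to within about half a meter per second, so the linear rule is adequate for acoustic design work.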

Sound waves are characterized by four parameters: frequency, amplitude, phase, and waveshape.

Frequency is the number of complete cycles (compression-rarefaction pairs) per second, measured in hertz (Hz). The human auditory system detects frequencies from approximately 20 Hz to 20,000 Hz. Below 20 Hz, individual pressure variations are felt as separate pulses rather than heard as a continuous tone. Above 20,000 Hz (ultrasound), the ear’s hair cells cannot respond.

Amplitude is the magnitude of the pressure variation above and below atmospheric pressure, measured in pascals (Pa). Sound pressure level (SPL) is usually expressed in decibels (dB) relative to a reference RMS pressure of 20 µPa (roughly the threshold of human hearing at 1 kHz). Ordinary conversation at one meter produces approximately 60 dB SPL; shouting at one meter, about 85 dB; the threshold of pain lies around 120-130 dB.
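The decibel figures above follow from SPL = 20 log10(p / p_ref), where p is the RMS sound pressure and p_ref = 20 µPa. A minimal sketch of the conversion in both directions (the function names are illustrative):

```python
import math

P_REF = 20e-6  # reference RMS pressure, 20 µPa


def spl_db(p_rms_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20.0 * math.log10(p_rms_pa / P_REF)


def pressure_pa(level_db: float) -> float:
    """Inverse conversion: RMS pressure in pascals for a level in dB SPL."""
    return P_REF * 10.0 ** (level_db / 20.0)


for level in (60, 85, 120):
    print(f"{level} dB SPL is about {pressure_pa(level):.3g} Pa")
```

A tenfold increase in pressure adds 20 dB, so the 120 dB pain threshold corresponds to a pressure a million times larger than the hearing threshold.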

Voice Frequency Structure

The human voice is a complex sound generated by the vibration of the vocal cords (a fundamental frequency plus harmonics) and shaped by the resonances of the vocal tract (pharynx, mouth, nasal passages). The fundamental frequency of adult male speech is approximately 85-180 Hz; that of adult female speech, 165-255 Hz.
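Because the voice is a harmonic series, the effect of a band limit can be illustrated by listing which harmonics survive it. This sketch assumes an idealized voice (exact integer multiples of the fundamental) and a brick-wall 300-3,400 Hz channel; the fundamentals of 120 Hz and 210 Hz are illustrative values inside the ranges given above.

```python
def harmonics_in_band(f0_hz: float, low_hz: float = 300.0,
                      high_hz: float = 3400.0) -> list[float]:
    """Return the harmonics k * f0 (k = 1, 2, ...) lying inside [low_hz, high_hz].

    Idealized: a perfect harmonic series through a brick-wall band-pass channel.
    """
    harmonics = []
    k = 1
    while k * f0_hz <= high_hz:
        f = k * f0_hz
        if f >= low_hz:
            harmonics.append(f)
        k += 1
    return harmonics


male = harmonics_in_band(120.0)    # illustrative adult male fundamental
female = harmonics_in_band(210.0)  # illustrative adult female fundamental
print(len(male), male[0], male[-1])
print(len(female), female[0], female[-1])
```

In both cases the fundamental itself falls below 300 Hz, yet more than a dozen harmonics pass through the channel.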

The vocal tract resonances (formants) shape the spectrum of the voice. The first two formants (F1 and F2) are the primary cues for vowel identification: F1 lies roughly in the range 200-900 Hz and F2 in the range 700-2,500 Hz. Consonants rely more on high-frequency energy (2,000-8,000 Hz) because many of them are noise-like fricatives (S, F, Sh, Th), which are distinguished mainly by their spectral tilt and cutoff frequencies.

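This frequency structure explains the telephone bandwidth decision. The 300-3,400 Hz band captures the whole F2 range and nearly all of the F1 range, and it includes enough high-frequency consonant information to identify most words accurately, although fricatives whose energy lies mostly above 3,400 Hz (S versus F, for example) are more easily confused. The 300 Hz low cutoff removes the fundamental of most voices, but listeners still perceive the correct pitch from the spacing of the remaining harmonics (the “missing fundamental” effect); the cutoff also rejects hum and other low-frequency noise from power systems. The 3,400 Hz high cutoff rejects hiss and high-frequency noise. Word intelligibility tests consistently show that this band provides 90-95% word recognition for English and most other languages.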
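One rough way to quantify how much of each range the telephone band covers is the fraction of the range, in hertz, that lies inside 300-3,400 Hz. This is a crude linear-frequency measure (hearing is closer to logarithmic, and speech energy is not spread evenly across each range), offered only as an illustration using the ranges quoted above:

```python
TELEPHONE_BAND = (300.0, 3400.0)


def band_coverage(lo: float, hi: float,
                  band: tuple[float, float] = TELEPHONE_BAND) -> float:
    """Fraction of the frequency range [lo, hi] (linear Hz) lying inside `band`."""
    overlap = max(0.0, min(hi, band[1]) - max(lo, band[0]))
    return overlap / (hi - lo)


ranges = {
    "F1 (200-900 Hz)": (200.0, 900.0),
    "F2 (700-2,500 Hz)": (700.0, 2500.0),
    "Consonant energy (2,000-8,000 Hz)": (2000.0, 8000.0),
}
for name, (lo, hi) in ranges.items():
    print(f"{name}: {band_coverage(lo, hi):.0%} inside 300-3,400 Hz")
```

With these ranges the band covers all of F2, about 86% of F1, and only about 23% of the consonant region, consistent with fricatives being the sounds most often confused on narrowband lines.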

Wavelength and Room Acoustics

Wavelength (the physical distance between successive compressions) determines how sound interacts with objects and boundaries. Wavelength = speed / frequency. At 340 Hz, wavelength = 343/340 = 1.0 meter. At 3,400 Hz, wavelength = 343/3400 = 0.1 meter.
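The wavelength formula is a one-line calculation; this sketch assumes c = 343 m/s (air at 20°C) and tabulates a few frequencies across the voice band:

```python
def wavelength_m(freq_hz: float, c: float = 343.0) -> float:
    """Wavelength in meters: speed / frequency (c defaults to air at 20 °C)."""
    return c / freq_hz


for f in (300, 340, 1000, 3400):
    print(f"{f:>5} Hz -> {wavelength_m(f):.2f} m")
```

Across the telephone band, wavelengths span roughly 1.1 m down to 0.1 m.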

Objects much smaller than the wavelength barely disturb the sound: the wave diffracts around them, and what little is scattered spreads in all directions. Objects much larger than the wavelength reflect sound specularly (like light off a mirror). At telephone audio frequencies (300-3,400 Hz), wavelengths run from about 1.1 m down to 0.1 m, so most room furnishings are comparable in size to the wavelength; they cause the complex diffraction and partial absorption that characterize the acoustic behavior of rooms.

Reverberation (the gradual decay of sound energy after a source stops) is a key parameter of the acoustic environment. In highly reverberant rooms (hard walls, little absorption), earlier sounds are still decaying when later ones arrive, smearing rapid consonants and reducing intelligibility. Carpeted rooms with soft furnishings absorb sound quickly and have low reverberation: better intelligibility, at the cost of acoustic “liveness.” Telephone conversations from highly reverberant spaces (bathrooms, empty concrete rooms) sound obviously different and are less intelligible.
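Reverberation time can be estimated with Sabine's formula, RT60 = 0.161 V / A, where V is the room volume in cubic meters and A is the total absorption in square meters (each surface area times its absorption coefficient, summed). The formula is not part of the discussion above; it is included as a standard estimate, and the room dimensions and absorption coefficients in the sketch are hypothetical values chosen for illustration:

```python
def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Reverberation time RT60 (seconds) from Sabine's formula.

    surfaces: (area in m^2, absorption coefficient alpha) pairs.
    RT60 = 0.161 * V / A, with A = sum(area * alpha).
    """
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption


# Hypothetical 4 m x 5 m x 2.5 m room: 50 m^3, 85 m^2 of surface in total.
volume = 4 * 5 * 2.5
hard_room = [(85.0, 0.05)]                              # assumed: bare tile and concrete
furnished = [(20.0, 0.30), (20.0, 0.10), (45.0, 0.15)]  # assumed: carpet, ceiling, walls
print(f"hard room: {sabine_rt60(volume, hard_room):.2f} s")
print(f"furnished room: {sabine_rt60(volume, furnished):.2f} s")
```

With these assumed values the hard room rings for nearly two seconds while the furnished one decays in about half a second. Sabine's formula becomes inaccurate in very absorbent rooms, where Eyring's variant is usually preferred.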

Implications for Telephone Design

Microphone placement: Position the microphone at the correct distance from the mouth (typically 2-5 cm for carbon and dynamic types) to maximize signal level while avoiding the proximity effect (the low-frequency boost that directional, pressure-gradient microphones show at very close range). If the microphone is too far from the mouth, the signal-to-noise ratio drops, because speech level falls with distance while background noise stays roughly constant.
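The distance trade-off can be approximated with the inverse-square law: in a free field, speech level falls by about 6 dB for each doubling of distance. The sketch assumes the mouth behaves as a point source (only approximately true at a few centimeters) and that background noise is unchanged, so the signal-to-noise ratio changes by the same number of decibels:

```python
import math


def level_change_db(d_ref_m: float, d_new_m: float) -> float:
    """Change in speech level (dB) when the mouth-to-microphone distance moves
    from d_ref_m to d_new_m, assuming inverse-square spreading from a point
    source. Negative values mean the speech arrives quieter."""
    return 20.0 * math.log10(d_ref_m / d_new_m)


# Background noise assumed constant, so the SNR shifts by the same amount.
for d_cm in (2.5, 5.0, 10.0, 20.0):
    change = level_change_db(0.025, d_cm / 100)
    print(f"{d_cm:>4} cm: {change:+.1f} dB relative to 2.5 cm")
```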

Housing resonance: Telephone instrument housings act as small acoustic resonators. The air enclosed between the microphone and the housing aperture forms a Helmholtz resonator whose resonant frequency is set by the cavity volume together with the area and effective length of the aperture; a smaller volume raises the resonance. Keep this cavity small (under 5 cc) so that its resonance stays above the voice band.
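The Helmholtz resonance is f = (c / 2π) √(A / (V L)), with A the aperture area, L its effective length (physical length plus end corrections), and V the cavity volume. The aperture area (60 mm²) and effective length (2 mm) below are hypothetical values, not figures from the text; with other apertures the numbers shift considerably:

```python
import math


def helmholtz_hz(volume_m3: float, neck_area_m2: float, neck_length_m: float,
                 c: float = 343.0) -> float:
    """Resonant frequency of a Helmholtz resonator: (c / 2*pi) * sqrt(A / (V * L)).

    neck_length_m is the effective length (physical length plus end
    corrections); this sketch takes it as given rather than computing it.
    """
    return (c / (2 * math.pi)) * math.sqrt(neck_area_m2 / (volume_m3 * neck_length_m))


# Hypothetical transmitter aperture: 60 mm^2 total area, 2 mm effective length.
area = 60e-6
length = 2e-3
for cc in (10.0, 5.0, 2.0):
    f = helmholtz_hz(cc * 1e-6, area, length)
    print(f"{cc:>4.0f} cc -> {f:.0f} Hz")
```

Because f scales as 1/√V, quartering the volume doubles the resonant frequency. With these assumed aperture dimensions, a 10 cc cavity resonates near 3 kHz, inside the voice band, while 5 cc and smaller push the resonance above 3,400 Hz.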

Microphone directionality: Omnidirectional microphones (carbon type) pick up sound equally from all directions. This is appropriate for hand-held telephone use where the microphone position is controlled. For desk telephones with hands-free operation, directional microphones that attenuate sound from behind reduce room noise pickup.
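The text does not specify a directional pattern; the first-order cardioid is one common choice, and its idealized sensitivity, 0.5 (1 + cos θ), shows how much rear pickup is suppressed. Real capsules deviate from this pattern, especially at high frequencies:

```python
import math


def cardioid_gain_db(angle_deg: float) -> float:
    """Relative sensitivity (dB) of an ideal first-order cardioid microphone.

    Sensitivity = 0.5 * (1 + cos(theta)); 0 degrees is on-axis.
    Returns -inf at the rear null.
    """
    s = 0.5 * (1.0 + math.cos(math.radians(angle_deg)))
    return 20.0 * math.log10(s) if s > 1e-12 else float("-inf")


for angle in (0, 90, 135, 180):
    print(f"{angle:>3} deg: {cardioid_gain_db(angle):.1f} dB")
```

On axis the sensitivity is 0 dB, at 90° about -6 dB, and directly behind the ideal cardioid has a null; an omnidirectional carbon transmitter, by contrast, is 0 dB in every direction.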

Echo considerations: A telephone receiver held near the microphone radiates sound that couples back into the microphone — an acoustic feedback path in addition to the electrical sidetone path. This requires that the receiver and microphone be well isolated in the handset housing. A handset that presses firmly against the ear reduces receiver leakage into the room and correspondingly reduces acoustic coupling to the microphone.