Arithmetic Operations

How computers perform mathematical operations — from basic addition to division and floating point.

Why This Matters

Every useful computation involves arithmetic. Even programs that seem unrelated to mathematics — text editors, databases, network routers — perform countless additions, subtractions, comparisons, and bit manipulations internally. Understanding how computers perform arithmetic means understanding the foundation of all computation: what operations the hardware provides, how they behave at the binary level, and what the limits and edge cases are.

Understanding arithmetic is also essential for security and correctness. Integer overflow — when arithmetic produces a result too large to represent — has caused serious security vulnerabilities in major software. Floating-point errors have caused financial miscalculations, navigation failures, and scientific errors. These problems typically occur because the programmer did not understand how computer arithmetic actually works.

Integer Arithmetic: Addition and Subtraction

Modern processors represent integers in binary using two’s complement notation. In two’s complement, positive numbers are represented straightforwardly (42 is 00101010 in 8-bit binary). Negative numbers are represented by inverting all bits and adding 1 to the result: -42 is 11010110.
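The encoding is easy to reproduce in a few lines of Python (a sketch; the `twos_complement` helper below is illustrative, not a standard function — masking to 8 bits plays the role of a fixed-width register):

```python
def twos_complement(value, bits=8):
    """Return the two's-complement bit pattern of an integer as an unsigned value."""
    return value & ((1 << bits) - 1)  # masking maps -42 to 256 - 42 = 214

print(format(twos_complement(42), "08b"))   # 00101010
print(format(twos_complement(-42), "08b"))  # 11010110

# Negation = invert all bits, then add 1 (within the 8-bit mask):
assert ((~42 + 1) & 0xFF) == twos_complement(-42)
```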

The advantage of two’s complement is that addition works correctly for both positive and negative numbers without any special cases. The processor performs the same binary addition whether the operands are positive or negative.

Binary addition proceeds from the least significant bit, adding corresponding bits and propagating the carry. This is identical to the addition process you learned for decimal numbers, but in base 2 rather than base 10.

Subtraction is implemented as addition of the two’s complement negative: A - B = A + (-B). The processor negates B (flip all bits and add 1) and adds it to A. This means the processor only needs one adder circuit — subtraction is free.
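The negate-and-add trick can be sketched in Python, again simulating an 8-bit register with a mask:

```python
def sub8(a, b):
    """8-bit subtraction the way hardware does it: a + (~b + 1), i.e. a + (-b)."""
    return (a + ((~b + 1) & 0xFF)) & 0xFF

assert sub8(100, 58) == 42
assert sub8(10, 20) == (10 - 20) & 0xFF  # wraps to 246, the encoding of -10
```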

The carry flag is set when an addition produces a result that is too large to fit in the destination register (the result exceeds the register’s maximum value). For 8-bit unsigned arithmetic, adding 200 + 100 = 300 overflows an 8-bit value (maximum 255), setting the carry flag.

The overflow flag indicates signed arithmetic overflow — when the result is too large or too small for the signed representation. For 8-bit signed arithmetic, 120 + 120 = 240 overflows (maximum signed 8-bit value is 127), setting the overflow flag.
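Both flags can be simulated by checking the raw sum against the 8-bit range; the sign-bit test below mirrors the standard hardware rule that signed overflow occurs when both operands share a sign but the result's sign differs:

```python
def add8_flags(a, b):
    """Add two 8-bit values, returning (result, carry_flag, overflow_flag)."""
    total = a + b
    result = total & 0xFF
    carry = total > 0xFF                                 # unsigned overflow
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0  # signed overflow
    return result, carry, overflow

assert add8_flags(200, 100) == (44, True, False)   # 300 wraps; unsigned overflow only
assert add8_flags(120, 120) == (240, False, True)  # 240 reads as -16 signed
```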

Multi-precision arithmetic (adding numbers larger than a single register) uses the carry flag: add the low words normally, then add the high words with the carry flag included (ADC, Add with Carry instruction). This chain of additions handles integers of arbitrary size.
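The carry chain can be sketched by building a 16-bit add out of two 8-bit adds, with the second add taking the carry from the first (the ADC step):

```python
def add16_via_8(a, b):
    """Add two 16-bit numbers using only 8-bit adds, chaining the carry."""
    lo = (a & 0xFF) + (b & 0xFF)
    carry = lo >> 8                        # 0 or 1, playing the role of the carry flag
    hi = (a >> 8) + (b >> 8) + carry       # ADC: add with carry in
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

assert add16_via_8(0x12FF, 0x0001) == 0x1300   # carry ripples into the high byte
assert add16_via_8(40000, 20000) == (40000 + 20000) & 0xFFFF
```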

Multiplication

Multiplication is conceptually simple but computationally expensive. The standard algorithm multiplies each bit of the multiplier by the multiplicand and sums the partial products — the same algorithm you learned in school, but in binary.
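The shift-and-add algorithm looks like this in Python (a sketch of the idea, not of how a hardware multiplier is wired):

```python
def shift_add_multiply(multiplicand, multiplier):
    """Binary long multiplication: one shifted partial product per set bit."""
    product = 0
    shift = 0
    while multiplier:
        if multiplier & 1:                       # this bit contributes a partial product
            product += multiplicand << shift
        multiplier >>= 1
        shift += 1
    return product

assert shift_add_multiply(13, 11) == 143
assert shift_add_multiply(0, 99) == 0
```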

For two N-bit numbers, multiplication produces a 2N-bit result. Two 32-bit numbers multiplied can produce a 64-bit result. Processors typically provide both forms: truncating multiply (keeps only the lower N bits of the result, which is correct when the product fits in N bits) and extended multiply (stores the full 2N-bit result in two registers or a single wider register).

Modern processors perform multiplication in a few clock cycles using dedicated hardware multiplier circuits. Earlier processors did not have hardware multipliers and performed multiplication in software through repeated addition and bit shifting — much slower.

Multiplication by powers of two is a special and important case. Multiplying by 2 is the same as a left shift by one bit; multiplying by 4 is two left shifts; in general, multiplying by 2^k is a left shift by k bits. Shifts are much cheaper than general multiplication, especially on older hardware. Compilers automatically replace multiplication by constants with appropriate shift-and-add sequences when this is faster.

For example, x * 10 can be computed as (x << 3) + (x << 1) — shift left by 3 (multiply by 8) plus shift left by 1 (multiply by 2), summing to multiply by 10. No multiplication instruction needed.
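The same strength reduction, written out:

```python
def times_10(x):
    """Strength reduction: x * 10 == x*8 + x*2, i.e. two shifts and an add."""
    return (x << 3) + (x << 1)

assert times_10(7) == 70
assert all(times_10(x) == x * 10 for x in range(1000))
```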

Division

Division is the most expensive basic arithmetic operation. A general integer division requires 20-90 clock cycles on modern processors, compared to 1-4 for addition and 3-10 for multiplication.

Division of two N-bit numbers typically produces an N-bit quotient and an N-bit remainder. The remainder from integer division is the result of the modulo operation (the % operator in most languages). On many processors, a single division instruction computes the quotient and the remainder simultaneously.
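Python exposes the paired results directly through the built-in divmod (note that for negative operands Python floors the quotient, while most hardware, and C, truncate toward zero):

```python
quotient, remainder = divmod(47, 5)     # both results from one "operation"
assert (quotient, remainder) == (9, 2)
assert quotient * 5 + remainder == 47   # the invariant division guarantees
```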

Division by zero is undefined and causes a hardware exception (trap) on most processors. The operating system catches this exception and typically terminates the program with an error. Programs must explicitly check for division by zero before performing division when the divisor might be zero.

Division by powers of two (for positive integers) is a right shift: dividing by 2^k is a right shift by k bits. For signed integers, right shift must be arithmetic (sign-extending) rather than logical (zero-filling) to handle negative dividends correctly. Compilers generate appropriate shift instructions for division by constant powers of two.
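Python's >> on its unbounded integers is already arithmetic, so a sketch that simulates both shift flavors on an 8-bit register makes the difference visible:

```python
def logical_shr8(value, k):
    """Logical right shift: vacated bits fill with zero (wrong for signed values)."""
    return (value & 0xFF) >> k

def arithmetic_shr8(value, k):
    """Arithmetic right shift: vacated bits replicate the sign bit."""
    value &= 0xFF
    if value & 0x80:          # negative: extend the sign bit upward
        value |= ~0xFF        # Python ints are unbounded, so set all higher bits
    return (value >> k) & 0xFF

# -8 is 0xF8 in 8-bit two's complement; -8 >> 1 should give -4 (0xFC).
assert arithmetic_shr8(0xF8, 1) == 0xFC
assert logical_shr8(0xF8, 1) == 0x7C    # zero fill yields a large positive value
```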

For division by other constants, compilers use a clever technique: multiply by the reciprocal. Division by 7 is equivalent to multiplication by 1/7. Representing 1/7 as a fixed-point constant (e.g., multiplied by 2^32 to get an integer approximation) allows the division to be computed as a multiplication and a right shift — much faster than a division instruction.
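A sketch of this technique for division by 7, using the fixed-point constant m = ceil(2^35 / 7). This is the same style of magic constant compilers emit (the Granlund–Montgomery scheme); this particular constant-and-shift pair happens to be valid for all 32-bit unsigned values:

```python
M = (2**35 + 6) // 7          # ceil(2**35 / 7) = 4908534053

def div7(x):
    """x // 7 computed as a multiply and a right shift - no divide instruction."""
    return (x * M) >> 35

assert all(div7(x) == x // 7 for x in range(100000))
assert div7(2**32 - 1) == (2**32 - 1) // 7   # holds up to the 32-bit limit
```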

Comparison and Boolean Operations

Comparisons (is A equal to B? is A less than B?) are fundamental to program control flow. At the machine level, comparisons are performed with a subtract operation that discards the result but sets the condition flags (zero flag, carry flag, sign flag, overflow flag) based on the result.

If A - B = 0, the zero flag is set, indicating A equals B. If A - B is negative (sign flag) with no overflow, A is less than B for signed comparison. Unsigned comparison uses the carry flag. Conditional branch instructions check these flags to decide whether to jump.

Bitwise operations manipulate individual bits:

  • AND: sets each result bit to 1 only if both input bits are 1. Used for masking (isolating specific bits).
  • OR: sets each result bit to 1 if either input bit is 1. Used for combining bit fields.
  • XOR: sets each result bit to 1 if the input bits differ. Used for toggling bits and detecting differences.
  • NOT: inverts all bits.
  • Shift left/right: moves all bits by a specified count, filling with 0 (or sign bit for arithmetic right shift).

These operations appear constantly in low-level code: testing a specific bit (value & (1 << bit_position)), setting a bit (value | (1 << bit_position)), clearing a bit (value & ~(1 << bit_position)), and extracting a bit field ((value >> field_start) & field_mask).

Floating-Point Arithmetic

Floating-point numbers represent real numbers approximately using a format based on scientific notation. A 32-bit floating-point number (single precision, IEEE 754) stores: 1 sign bit, 8 exponent bits, and 23 mantissa bits. The value represented is: (-1)^sign × 1.mantissa × 2^(exponent - 127).
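The three fields can be inspected directly by reinterpreting a float's bits, here via Python's struct module:

```python
import struct

def decode_float32(x):
    """Split a float's IEEE 754 single-precision bits into (sign, exponent, mantissa)."""
    bits, = struct.unpack(">I", struct.pack(">f", x))   # reinterpret float32 as uint32
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

assert decode_float32(1.0) == (0, 127, 0)    # 1.0 = +1.0 x 2^(127-127)
sign, exponent, mantissa = decode_float32(-2.5)
assert sign == 1 and exponent == 128         # -2.5 = -1.25 x 2^(128-127)
```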

Floating-point arithmetic is performed by dedicated floating-point units (FPUs). The FPU handles the alignment of exponents, addition of mantissas, normalization of results, and rounding — operations too complex for the integer arithmetic unit.

The critical property of floating-point arithmetic is that it is not exact. Most decimal fractions (0.1, 0.2, 0.3…) cannot be represented exactly in binary floating point, just as 1/3 cannot be represented exactly in decimal. The stored value is the closest representable value, which differs from the true value by a small rounding error.

These errors accumulate. 0.1 + 0.2 in floating point does not equal exactly 0.3 — it equals 0.30000000000000004. For financial calculations, this is unacceptable. Financial software uses decimal arithmetic (integers representing cents or thousandths) rather than floating point to avoid these errors.
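A quick demonstration, including the integer-cents and decimal alternatives:

```python
from decimal import Decimal

assert 0.1 + 0.2 != 0.3                       # binary floating point rounds
assert 0.1 + 0.2 == 0.30000000000000004

# Financial style: represent money as integer cents, so arithmetic is exact.
total_cents = 10 + 20                          # 10 cents + 20 cents
assert total_cents == 30

# Or use decimal arithmetic directly:
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
```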

Special values: IEEE 754 defines special values that arise from exceptional operations. Infinity (∞) results from dividing by zero or overflow. NaN (Not a Number) results from invalid operations (0/0, sqrt(-1)). These values propagate through calculations — any operation involving NaN produces NaN — which allows programs to detect that a calculation went wrong.
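These behaviors are easy to observe (one caveat: Python raises ZeroDivisionError for float division by zero instead of returning infinity, so overflow is used to produce it here):

```python
import math

inf = float("inf")
nan = float("nan")

assert 1e308 * 10 == inf        # overflow produces infinity
assert math.isnan(inf - inf)    # invalid operation produces NaN
assert math.isnan(nan + 1)      # NaN propagates through arithmetic
assert nan != nan               # NaN compares unequal, even to itself
```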

Comparing floating-point values for equality (a == b) is usually wrong because of rounding errors. Instead, test whether the difference is small: abs(a - b) < epsilon for some appropriate small tolerance.
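The standard-library helper math.isclose implements this kind of test, using a relative tolerance by default (which scales better across magnitudes than a fixed epsilon):

```python
import math

a = 0.1 + 0.2
b = 0.3

assert a != b                              # exact equality fails
assert abs(a - b) < 1e-9                   # absolute-epsilon test from the text
assert math.isclose(a, b, rel_tol=1e-9)    # relative-tolerance comparison
```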