IEEE 754 Machine Numbers and Machine Arithmetic

In order to make numerical programs more portable between different machines, the IEEE 754 standard defines machine numbers and how arithmetic operations should be performed. Virtually all current computers comply with this standard.

William Kahan and the History of IEEE 754
Soon after its conception in 1977, this standard was implemented by virtually all numerical processors.

Machine Numbers

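Machine numbers are stored as a sequence of k + n bits (each of which is 0 or 1): one sign bit s, k exponent bits e₁ ... e_k, and the n - 1 digits d₂ ... d_n. The leading digit d₁ is not stored, since it is determined by the exponent: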

s e₁ ... e_k d₂ ... d_n

For single precision numbers we have n=24, k=8.

For double precision numbers we have n=53, k=11.
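As a quick check of this layout, the following Python sketch reinterprets the bytes of a float with the standard struct module and splits off the sign bit s, the k exponent bits and the n - 1 stored digits d₂ ... d_n. The helper names double_fields and single_fields are our own, and the sketch assumes that Python floats are IEEE 754 double precision numbers, which holds on virtually all platforms.

```python
import struct

def double_fields(x: float) -> tuple[int, int, int]:
    """Split a double into (s, exponent field (e1...ek)_2, digits d2...dn as an integer).

    Double precision: k = 11 exponent bits and n - 1 = 52 stored digits.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # same 64 bits, read as an unsigned int
    s = bits >> 63                      # 1 sign bit
    exp_field = (bits >> 52) & 0x7FF    # k = 11 bits
    frac = bits & ((1 << 52) - 1)       # n - 1 = 52 bits
    return s, exp_field, frac

def single_fields(x: float) -> tuple[int, int, int]:
    """Same for single precision: k = 8, n - 1 = 23 (x is first rounded to single)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & ((1 << 23) - 1)

print(double_fields(1.0))    # (0, 1023, 0)
print(single_fields(0.75))   # (0, 126, 4194304), i.e. d2 = 1 and all other digits 0
```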

The sign is "+" for s=0 and "-" for s=1.

The exponent is obtained as e = (e₁ ... e_k)₂ - b, where the bias is b = 2^{k-1} - 2. The smallest and largest exponent fields, (0...0)₂ and (1...1)₂, are reserved for special values. Hence the smallest remaining value is e_min = 1 - b = 3 - 2^{k-1}, and the largest remaining value is e_max = (2^k - 2) - b = 2^{k-1}.
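Plugging k = 8 and k = 11 into these formulas gives the concrete exponent ranges. The small helper below (the name exponent_range is hypothetical) simply evaluates them; for double precision the result can be cross-checked against Python's sys.float_info, whose min_exp and max_exp happen to use the same (.1d₂...d_n)₂ · 2^e convention.

```python
import sys

def exponent_range(k: int) -> tuple[int, int, int]:
    """Return (b, e_min, e_max) for k exponent bits, with x = (.1d2...dn)_2 * 2^e."""
    b = 2 ** (k - 1) - 2          # bias
    e_min = 1 - b                 # exponent field (0...0)_2 is reserved
    e_max = (2 ** k - 2) - b      # exponent field (1...1)_2 is reserved
    return b, e_min, e_max

print(exponent_range(8))    # (126, -125, 128)    single precision
print(exponent_range(11))   # (1022, -1021, 1024) double precision

# Cross-check for doubles against the platform's float parameters.
assert exponent_range(11)[1:] == (sys.float_info.min_exp, sys.float_info.max_exp)
```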

For e_min <= e <= e_max we have: x = ±(.1d₂...d_n)₂ · 2^e, representing normalized numbers.
For e = e_min - 1 we have: x = ±(.0d₂...d_n)₂ · 2^{e_min}, representing ±0 and subnormal numbers (aka denormalized numbers).
For e = e_max + 1 we have: x = ±Infinity if all d_j = 0, and x = NaN otherwise.
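The three cases can be turned directly into a decoder. The following is a minimal sketch for double precision (k = 11, n = 53); the function name decode is our own. It returns the exact value of a finite machine number as a fractions.Fraction, so the result can be compared against Python's exact conversion Fraction(x).

```python
import math
import struct
from fractions import Fraction

K, N = 11, 53                      # double precision
B = 2 ** (K - 1) - 2               # bias b = 1022
E_MIN, E_MAX = 1 - B, 2 ** K - 2 - B

def decode(x: float):
    """Rebuild the value of a double from its bits, following the three cases above.

    Returns an exact Fraction for finite numbers (Fraction has no -0, so -0.0 decodes
    to 0), otherwise math.inf, -math.inf or math.nan.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = -1 if bits >> 63 else 1
    field = (bits >> (N - 1)) & ((1 << K) - 1)   # (e1...ek)_2
    d = bits & ((1 << (N - 1)) - 1)              # digits d2...dn as an integer
    e = field - B
    if E_MIN <= e <= E_MAX:                      # normalized: (.1 d2...dn)_2 * 2^e
        mantissa = Fraction((1 << (N - 1)) + d, 1 << N)
        return sign * mantissa * Fraction(2) ** e
    if e == E_MIN - 1:                           # ±0 and subnormals: (.0 d2...dn)_2 * 2^e_min
        mantissa = Fraction(d, 1 << N)
        return sign * mantissa * Fraction(2) ** E_MIN
    # Remaining case: e == E_MAX + 1
    return sign * math.inf if d == 0 else math.nan

print(decode(5e-324) == Fraction(1, 2 ** 1074))   # True: smallest subnormal
```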

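Note: All numbers with sign "+", arranged by size from +0 up to +Infinity, correspond to the bit sequences (0 0...0 0...0) up to (0 1...1 0...0) arranged in increasing order as binary integers. Therefore it is easy to compare two machine numbers (for negative numbers the order of the bit patterns is reversed), or to find the next larger or smaller machine number: for positive numbers this amounts to adding or subtracting 1 in the bit pattern.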
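This ordering property can be tried out in Python by reading a double's bits as an unsigned 64-bit integer. The helpers to_bits, from_bits and next_up below are hypothetical names for this sketch; next_up is only meant for finite x with sign bit 0 (it would give the wrong neighbour for -0.0 and negative numbers). Python 3.9+ also provides math.nextafter, which is used here only as a cross-check.

```python
import math
import struct

def to_bits(x: float) -> int:
    """Bit pattern of a double, read as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits(u: int) -> float:
    """Inverse of to_bits."""
    return struct.unpack(">d", struct.pack(">Q", u))[0]

def next_up(x: float) -> float:
    """Next larger machine number for finite x with sign "+", by adding 1 to the bits."""
    return from_bits(to_bits(x) + 1)

positives = [1.0, math.inf, 0.1, 0.0, 1e308, 5e-324, 2.2250738585072014e-308]
# Sorting by value and sorting by bit pattern give the same order.
assert sorted(positives) == sorted(positives, key=to_bits)

print(next_up(1.0) == 1.0 + 2.0 ** -52)               # True
print(next_up(0.0))                                   # 5e-324, the smallest subnormal
print(next_up(0.1) == math.nextafter(0.1, math.inf))  # True
```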

Rounding

Normally rounding "to nearest" is enabled. Let x_max be the largest finite machine number and x be an arbitrary real number.

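For |x| >= (1 - 2^{-n-1}) · 2^{e_max}, i.e. at or beyond the midpoint between x_max = (1 - 2^{-n}) · 2^{e_max} and 2^{e_max}: fl(x) = ±Infinity
otherwise: fl(x) is the machine number nearest to x. In the case of a tie, the number with d_n = 0 is chosen (round half to even).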
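Both the tie rule and the overflow threshold can be observed with ordinary double arithmetic, assuming the default round-to-nearest mode (plain Python offers no way to change it). Near 1 the machine numbers are spaced eps = 2^{-52} apart, and near x_max they are spaced 2^{971} apart, so 2^{970} is exactly half a spacing.

```python
import sys

eps = 2.0 ** -52          # spacing of machine numbers in [1, 2)
# 1 + eps/2 is exactly halfway between 1 (d_n = 0) and 1 + eps (d_n = 1): rounds down.
print(1.0 + eps / 2 == 1.0)                   # True
# 1 + 3*eps/2 is halfway between 1 + eps (d_n = 1) and 1 + 2*eps (d_n = 0): rounds up.
print(1.0 + 3 * eps / 2 == 1.0 + 2 * eps)     # True

x_max = sys.float_info.max    # (1 - 2^-53) * 2^1024
print(x_max + 2.0 ** 969 == x_max)   # True: below the midpoint, rounds back to x_max
print(x_max + 2.0 ** 970)            # inf: a tie, and x_max has d_n = 1, so it overflows
```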

Other rounding modes are "towards +Infinity", "towards -Infinity", and "towards 0" (chopping).
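Since Python cannot switch the hardware rounding mode, the four modes are easiest to compare with exact rational arithmetic. The function round_to_bits below is a hypothetical helper, and it deliberately ignores the exponent range (no overflow, no subnormals): it writes a rational x as ±(.1d₂...d_n...)₂ · 2^e and keeps n digits according to the chosen mode.

```python
import math
from fractions import Fraction

MODES = ("nearest", "up", "down", "zero")

def round_to_bits(x: Fraction, n: int, mode: str = "nearest") -> Fraction:
    """Round a rational x to n significant bits, x ~ ±(.1d2...dn)_2 * 2^e.

    Sketch only: the exponent range is assumed unbounded.
    mode: "nearest" (ties to d_n = 0), "up" (towards +Infinity),
          "down" (towards -Infinity) or "zero" (chopping).
    """
    if mode not in MODES:
        raise ValueError(f"unknown rounding mode {mode!r}")
    if x == 0:
        return Fraction(0)
    negative = x < 0
    a = abs(x)
    # Find e with 2^(e-1) <= a < 2^e, i.e. a = (.1d2...)_2 * 2^e.
    e = a.numerator.bit_length() - a.denominator.bit_length()
    while Fraction(2) ** (e - 1) > a:
        e -= 1
    while Fraction(2) ** e <= a:
        e += 1
    m = a * Fraction(2) ** (n - e)      # 2^(n-1) <= m < 2^n
    lo = math.floor(m)                  # chopped digits 1d2...dn as an integer
    rest = m - lo                       # discarded part, 0 <= rest < 1
    if rest == 0 or mode == "zero":
        q = lo
    elif mode == "up":                  # towards +Infinity
        q = lo if negative else lo + 1
    elif mode == "down":                # towards -Infinity
        q = lo + 1 if negative else lo
    else:                               # nearest; on a tie pick the even last digit
        half = Fraction(1, 2)
        q = lo + 1 if rest > half or (rest == half and lo % 2 == 1) else lo
    result = q * Fraction(2) ** (e - n)
    return -result if negative else result

# 1/3 = (.0101...)_2 rounded to n = 2 bits in each mode:
for mode in MODES:
    print(mode, round_to_bits(Fraction(1, 3), 2, mode))
```

With n = 53 and mode "nearest" the helper should agree with Python's own decimal-to-double conversion for values in the normal range, e.g. round_to_bits(Fraction(1, 10), 53) == Fraction(0.1).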

Machine Arithmetic

For addition, subtraction, multiplication, division and square root of machine numbers, the exact result, rounded, must be returned. E.g., adding two machine numbers x, y returns the machine number fl(x+y). For all combinations of machine numbers (including ±0, ±Infinity, NaN) the result of the operation is well defined, e.g., 1/±0 = ±Infinity, Infinity + Infinity = Infinity, Infinity - Infinity = NaN, 0/0 = NaN, 0*Infinity = NaN. Any arithmetic operation involving NaN returns NaN.
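The requirement that x op y equals fl(exact result) can be spot-checked in Python: Fraction(x) is the exact value of a double, and in CPython converting a Fraction back with float() performs a single correctly rounded division of integers. The helper is_correctly_rounded and the sample values are our own choices; the samples are kept well inside the normal range so that no result overflows. The special values follow the rules above, with one Python-specific deviation noted in the comments.

```python
import math
import operator
from fractions import Fraction

def is_correctly_rounded(op, x: float, y: float) -> bool:
    """Compare the machine result of x op y with the exact result rounded to nearest."""
    exact = op(Fraction(x), Fraction(y))
    return op(x, y) == float(exact)

samples = [0.1, 0.2, 1.0 / 3.0, 3.0, -7.5, 2.0 ** -40, 12345.678]
ops = [operator.add, operator.sub, operator.mul, operator.truediv]
print(all(is_correctly_rounded(op, x, y) for op in ops for x in samples for y in samples))  # True

inf, nan = math.inf, math.nan
print(inf + inf)      # inf
print(inf - inf)      # nan
print(0.0 * inf)      # nan
print(nan + 1.0)      # nan
# Python deviates for division by zero: 1.0/0.0 and 0.0/0.0 raise ZeroDivisionError
# instead of returning the IEEE 754 default results ±inf and nan.
```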

Note that there are two distinct machine numbers +0 and -0, which behave differently in expressions such as 1/0: 1/(+0) = +Infinity, but 1/(-0) = -Infinity. However, IEEE 754 defines the comparison operator "==" such that +0==-0 is true. Conversely, every comparison involving NaN is unordered, so NaN==NaN is false.
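Because Python raises an exception for 1.0/0.0, the different behaviour of +0 and -0 is shown here with math.copysign and math.atan2 instead, both of which see the sign bit. The sketch also shows the NaN comparison rules; x != x being true only for NaN is the classic way to test for it.

```python
import math

pz, nz = 0.0, -0.0
print(pz == nz)                                    # True: +0 and -0 compare equal
print(math.copysign(1.0, nz))                      # -1.0: yet the sign bit differs
print(math.atan2(pz, -1.0), math.atan2(nz, -1.0))  # 3.141592653589793 -3.141592653589793

nan = math.nan
print(nan == nan, nan != nan)                      # False True
print(math.isnan(nan))                             # True
```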