Overflow. There is no way to represent a number larger than the largest positive floating-point number. In our first example (β = 10, t = 3, e ∈ [-4, +4]), there is no way to represent any number larger than 9.99 × 10⁴ = 99,900.

Underflow. There is no way to represent a positive number smaller than the smallest positive floating-point number. In our first example, there is no way to represent 0.00001 (or any smaller positive number).
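As a concrete illustration (separate from the course's examples, and using Java's binary float type rather than the decimal toy system above), here is how overflow and underflow show up in practice; the class name is made up for illustration:

    public class OverflowUnderflowSketch {
        // Overflow and underflow with Java's float type (binary, about seven
        // significant decimal digits) -- an illustrative sketch only.
        public static void main(String[] args) {
            float big = Float.MAX_VALUE;        // about 3.4e38, the largest float
            System.out.println(big * 2.0f);     // Infinity: the product overflows

            float tiny = Float.MIN_VALUE;       // about 1.4e-45, smallest positive float
            System.out.println(tiny / 2.0f);    // 0.0: the quotient underflows to zero
        }
    }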
Absolute rounding error
We can calculate the difference between the true value we're trying to represent and the value of its floating-point representation. For example, the absolute error in the representation of e is |2.71 − 2.718281828…| = 0.008281828….
Intuitively, 100 and 100.1 are closer than 1 and 1.1, even though the absolute difference is 0.1 in both cases. This suggests looking at the size of the error in terms of the size of the value being approximated.

Relative error. For x ≠ 0, the relative error between the approximate value x′ and the "real" value x is |x′ − x| / |x|. For example, |1 − 1.1| / |1.1| = 0.0909 ≈ 9%, while |100 − 100.1| / |100.1| = 0.000999 ≈ 0.1%.
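A small sketch (not part of the posted course code; the class name is illustrative) that computes both kinds of error for the three-digit representation of e used above:

    public class RelativeErrorSketch {
        // Absolute and relative error of the beta = 10, t = 3 representation 2.71
        // of e = 2.718281828..., as in the example above.
        public static void main(String[] args) {
            double trueValue = Math.E;     // 2.718281828...
            double stored    = 2.71;       // the three-digit representation from the text

            double absErr = Math.abs(stored - trueValue);    // about 0.0083
            double relErr = absErr / Math.abs(trueValue);    // about 0.003, i.e. 0.3%

            System.out.println("absolute error: " + absErr);
            System.out.println("relative error: " + relErr);
        }
    }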
Relative error in round-to-nearest
When we round numbers to represent them in a floating-point system, can we bound the relative error, for positive numbers, assuming no overflow or underflow?
In general, a positive number of the form

d₀.d₁d₂ … dₜ₋₁dₜdₜ₊₁ … × βᵉ

gets rounded either up or down, to one of

d₀.d₁d₂ … dₜ₋₁ × βᵉ    or    (d₀.d₁d₂ … dₜ₋₁ + β^(−(t−1))) × βᵉ.

(The written representation of the rounded-up number may look different, since the +1 in the last digit may cause a chain of carries in the addition, but the value is the same.) The difference between these two numbers is simply a 1 in the position occupied by dₜ₋₁, for a difference of β^(−(t−1)) × βᵉ = β^(e−t+1). Since we round to nearest, our error is at most half this value, (1/2) β^(e−t+1).

The relative error can now be bounded: since we use the convention that the leading digit, d₀, is non-zero, the value being rounded is at least 1 × βᵉ, so the smallest possible denominator (hence the largest bound) is βᵉ, giving a relative error of at most

(1/2) β^(e−t+1) / βᵉ = (1/2) β^(1−t).
This matches our intuition that by increasing t (the number of digits) we get more accurate results.

In our example of a floating-point system with β = 2, t = 3, this gives a bound on the relative error of round-to-nearest of (1/2) · 2^(1−3) = 1/8. This is also clear from looking at the 24 values in this representation.
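Here is a sketch (not part of the course code; names are illustrative) that checks the bound empirically for the β = 10, t = 3 system, where (1/2) · 10^(1−3) = 0.005. BigDecimal rounding to three significant digits stands in for round-to-nearest in that system:

    import java.math.BigDecimal;
    import java.math.MathContext;
    import java.util.Random;

    public class RoundingBoundSketch {
        // Empirical check of the bound (1/2) * beta^(1 - t) for beta = 10, t = 3.
        public static void main(String[] args) {
            Random rng = new Random(0);
            double bound = 0.5 * Math.pow(10, 1 - 3);       // 0.005
            double worst = 0.0;
            for (int i = 0; i < 100000; i++) {
                double x = 1.0 + rng.nextDouble() * 999.0;  // positive, no overflow/underflow
                // Round to 3 significant decimal digits (round-to-nearest).
                double rounded = new BigDecimal(x).round(new MathContext(3)).doubleValue();
                worst = Math.max(worst, Math.abs(rounded - x) / x);
            }
            System.out.println("largest observed relative error: " + worst);  // a bit below 0.005
            System.out.println("bound (1/2) * 10^(1-3):          " + bound);
        }
    }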
Accumulation of error
Since we can't represent all real numbers exactly in a floating-point system, errors can accumulate when we repeat operations. For example, take β = 10, t = 3, e ∈ [-2, +2], and consider the sum

100 + 0.1 + 0.1 + … + 0.1   (with n terms of 0.1).

The first part of the sum, 100 + 0.1, is represented by 1.00 × 10², since 100.1 cannot be represented with only three significant digits. As we keep adding terms with value 0.1, the result is still 1.00 × 10², since our representation can never capture that additional 0.1. The relative error can become arbitrarily large for large enough n. If you play around with FloatExample.cumulativeError (on the web page) with t1 = 1.0 and n ≥ 10, you'll see this demonstrated.
The easy way to work around this particular problem is to add the 0.1 terms first and then add the 100. In other words, addition is not associative for floating-point numbers:

(a + b) + c ≠ a + (b + c).
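Here is a minimal sketch of the same effect using Java's float type (about seven significant decimal digits) instead of the β = 10, t = 3 system; it is separate from the FloatExample class on the web page, and the class name is made up for illustration:

    public class SummationOrderSketch {
        // Adding 0.1f to 1.0e8f one term at a time loses every contribution,
        // because 0.1 is below the spacing between floats near 1e8.
        // Summing the small terms first, then adding the large term, keeps them.
        public static void main(String[] args) {
            int n = 1000;

            float naive = 1.0e8f;
            for (int i = 0; i < n; i++) {
                naive += 0.1f;          // each 0.1f is rounded away
            }

            float smallFirst = 0.0f;
            for (int i = 0; i < n; i++) {
                smallFirst += 0.1f;     // accumulate the small terms first
            }
            smallFirst += 1.0e8f;       // then add the large term once

            System.out.println("naive order:       " + naive);       // 1.0E8 (the 100 is lost)
            System.out.println("small terms first: " + smallFirst);  // about 1.000001E8
        }
    }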
Use the same floating-point system as in the previous example to compute b² − 4ac with b = 3.34, a = 1.22, and c = 2.28. The exact value is 0.0292 = 2.92 × 10⁻², and this value is representable in our floating-point system. Look at how the value is actually computed, though:

b² = 3.34² = 11.1556 ≈ 1.12 × 10¹
4ac = 4 × 1.22 × 2.28 = 11.1264 ≈ 1.11 × 10¹
b² − 4ac ≈ 1.12 × 10¹ − 1.11 × 10¹ = 1.00 × 10⁻¹

Compared to our exact answer of 2.92 × 10⁻², this has a relative error of |0.1 − 0.0292| / 0.0292 ≈ 2.42, i.e. about 242%.
Subtracting two floating-point numbers that are very close together leaves very few significant digits, so a great deal of information is lost. Since the true value is very small, the round-off error becomes much more significant, and sometimes becomes much larger than the value being computed (as it did above). This effect is called catastrophic cancellation.
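A sketch reproducing the computation above, using BigDecimal's MathContext(3) rounding as a stand-in for the β = 10, t = 3 system; the class name and the helper fl are made up for illustration:

    import java.math.BigDecimal;
    import java.math.MathContext;

    public class CancellationSketch {
        // Round to 3 significant decimal digits, mimicking the beta = 10, t = 3 system.
        static double fl(double x) {
            return new BigDecimal(x).round(new MathContext(3)).doubleValue();
        }

        public static void main(String[] args) {
            double a = 1.22, b = 3.34, c = 2.28;

            double bb   = fl(b * b);          // 11.1556 -> 11.2
            double ac4  = fl(4 * fl(a * c));  // 2.7816 -> 2.78, then 11.12 -> 11.1
            double disc = fl(bb - ac4);       // 11.2 - 11.1 = 0.1

            double exact = b * b - 4 * a * c;               // about 0.0292
            System.out.println("3-digit result: " + disc);  // 0.1
            System.out.println("exact result:   " + exact);
            // relative error is about 2.42, i.e. roughly 242%
            System.out.println("relative error: " + Math.abs(disc - exact) / Math.abs(exact));
        }
    }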
The expression b² − 4ac crops up in the solution to the quadratic equation ax² + bx + c = 0. The general form of the solution, for the two roots, is

x = (−b ± √(b² − 4ac)) / (2a).

We may not have to worry about the large relative error in b² − 4ac, since √(b² − 4ac) may be small in absolute value compared to −b. Here's a case where computing a root with the inaccurate discriminant (which led to catastrophic cancellation above) still gives a fairly acceptable value in our three-digit arithmetic:

x ≈ (−3.34 + √0.1) / 2.44 ≈ (−3.34 + 0.316) / 2.44 ≈ −1.24.

Compare this to the result if there were no error in the computation of b² − 4ac, which is

x ≈ (−3.34 + √0.0292) / 2.44 ≈ (−3.34 + 0.171) / 2.44 ≈ −1.30,

for a relative error of less than 5%.
There is a built-in problem with the formulas used above to compute the roots of a quadratic equation: if b² − 4ac is close to b², then there will be catastrophic cancellation between −b and √(b² − 4ac). This is separate from the catastrophic cancellation that can happen if b² is close to 4ac.
Definition. A formula (or algorithm) is called "unstable" if errors in the input get magnified during the computation (i.e., if the relative error in the final answer can be larger than the relative error in the input values).
Dealing with instability
Our first example (100 + 0.1 + …) was unstable, but there was an easy way to use a different algorithm that is stable: perform the operations in a different order.
Our second example (b² − 4ac) was also unstable, because of the potential for catastrophic cancellation, and there is, unfortunately, no easy fix. You could increase the number of significant digits to make the round-off error smaller, but you will still have the potential for catastrophic cancellation when you subtract numbers that are very close together at your increased precision. The formula is unstable.
Our third example is also unstable (it includes the instability of example 2, plus its own instability), but the possibility of cancellation between −b and √(b² − 4ac) is one we can avoid by changing the formula. Suppose b > 0 (if b < 0, swap the roles of the two roots below). Then avoid the subtraction in the numerator of the quadratic formula by computing

x₁ = (−b − √(b² − 4ac)) / (2a).

Since both −b and −√(b² − 4ac) have the same (negative) sign, there is no cancellation. Now compute the other root using the fact that x₁x₂ = c/a:

x₂ = c / (a x₁).

This formula involves no subtraction, so there is no catastrophic cancellation. A code sketch follows below.
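Here is a sketch of this cancellation-avoiding approach in Java; the class and method names are made up for illustration, and it assumes a ≠ 0, c ≠ 0, and real roots (b² − 4ac ≥ 0):

    public class StableQuadraticSketch {
        // Roots of a*x^2 + b*x + c = 0, avoiding cancellation in the numerator.
        static double[] roots(double a, double b, double c) {
            double sqrtDisc = Math.sqrt(b * b - 4 * a * c);
            // q = a * x1, where x1 is the root whose numerator has no cancellation:
            // for b > 0 use -b - sqrt(...), for b < 0 use -b + sqrt(...).
            double q = (b >= 0) ? -(b + sqrtDisc) / 2.0 : (-b + sqrtDisc) / 2.0;
            double x1 = q / a;       // computed without cancellation
            double x2 = c / q;       // second root from x1 * x2 = c / a
            return new double[] { x1, x2 };
        }

        public static void main(String[] args) {
            // Here b*b is much larger than 4*a*c, so the naive (-b + sqrt(...)) / (2a)
            // root would suffer catastrophic cancellation; this version does not.
            double[] r = roots(1.0, 1.0e8, 1.0);
            System.out.println(r[0] + "  " + r[1]);   // about -1.0E8 and -1.0E-8
        }
    }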
There are two ways, in general, to deal with unstable formulas or algorithms.

1. Increase the precision (the number of significant digits). This does not change the fact that the formula or algorithm is unstable, but it can help minimize the size of the errors, for some inputs.

2. Use a different, stable, algorithm or formula to compute the result. When possible, this is preferred.
In the previous example we examined the error produced in calculating the roots of ax² + bx + c. The numbers a, b, and c may come from measurement and already have some error associated with them. By using a stable algorithm we get a relatively correct answer for slightly incorrect inputs.

But, independently of the particular algorithm used to compute the roots, what can be said about the effect that errors in the measurement of a, b, and c have on the values of the roots? That is, if the values of a, b, or c change slightly, could the roots change a great deal?
To simplify the discussion, let's look at the special case of the quadratic equation

x² − c = 0

(so we're finding √c).
Suppose that c = 0.25, but we use a bad approximation c′ = 0.36 instead of the true value. Then our answer will be 0.6 instead of 0.5. The relative error in the input is 0.11/0.25 = 0.44, while the relative error in the result is 0.1/0.5 = 0.2. Taking the square root makes the relative error smaller!
This isn't something special about the particular case we chose. Let's work out the general case of computing √c using an approximation c′ of c. The ratio of the relative error of the result to the relative error of the input is (assuming c and c′ are close enough to have the same sign)

(|√c′ − √c| / √c) / (|c′ − c| / c) = √c / (√c + √c′).

This ratio is always less than 1 (so the error improves), and when c′ is very close to c, the ratio is close to 1/2. So the error is never increased, and as the measurement gets more accurate, the relative error of the result shrinks to almost half the relative error of the input.
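A quick numerical check of this ratio, using the c = 0.25, c′ = 0.36 example from above (an illustrative sketch; the class name is made up):

    public class SqrtConditioningSketch {
        // Verify that (relative error of result) / (relative error of input)
        // equals sqrt(c) / (sqrt(c) + sqrt(c')) for the square root.
        public static void main(String[] args) {
            double c = 0.25, cApprox = 0.36;

            double relIn  = Math.abs(cApprox - c) / c;                                  // 0.44
            double relOut = Math.abs(Math.sqrt(cApprox) - Math.sqrt(c)) / Math.sqrt(c); // 0.2

            System.out.println("measured ratio: " + relOut / relIn);                    // about 0.4545
            System.out.println("formula:        "
                    + Math.sqrt(c) / (Math.sqrt(c) + Math.sqrt(cApprox)));              // also about 0.4545
        }
    }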
When computing the function f(x) using the approximation x′ instead of the true value x, we say that the condition number is equal to the ratio of the relative error of the result to the relative error of the input, i.e.

(|f(x′) − f(x)| / |f(x)|) / (|x′ − x| / |x|).

If you take the limit as x′ → x (and assume that f is differentiable, and that f(x) ≠ 0), the condition number becomes

|x f′(x) / f(x)|.
For f(x) = √x, the condition number is 1/2 (well, do the derivative)! Not all functions have such good condition numbers. Compute the condition number for cos(x), and notice that you can get a huge condition number by choosing an appropriate x.
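Here is a sketch that estimates the condition number of cos numerically by perturbing the input slightly; the class and method names are made up for illustration:

    public class ConditionNumberSketch {
        // Numerically estimate the relative condition number |x * f'(x) / f(x)|
        // for f = cos, using a small relative perturbation of the input.
        static double condCos(double x) {
            double h = 1e-6 * Math.abs(x);                // small perturbation
            double relIn  = h / Math.abs(x);
            double relOut = Math.abs(Math.cos(x + h) - Math.cos(x)) / Math.abs(Math.cos(x));
            return relOut / relIn;
        }

        public static void main(String[] args) {
            System.out.println(condCos(0.1));      // about 0.01: well-conditioned
            System.out.println(condCos(1.5707));   // huge: x is near pi/2, where cos(x) is near 0
        }
    }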