The Algebra Buster


May 24th










IEEE 754 Number Representation


As you can see in your textbook, the IEEE754 Floating Point representation is composed of three parts:
the Mantissa Sign, S (1 bit), the Signed Exponent, E, and the Mantissa Magnitude, M. In single precision
floating point representation, the Signed Exponent, E, is 8 bits, whereas the Mantissa Magnitude, M,
is composed of the remaining 23 bits. In double precision floating point representation, the Signed
Exponent, E, is 11 bits, whereas the Mantissa Magnitude, M, is composed of the remaining 52 bits.
In both cases, the hidden-1 representation for the Mantissa Magnitude holds, effectively extending its
representational power by one bit.
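
The field layout described above can be inspected directly. The following is a minimal sketch, assuming Python and its standard `struct` module; the helper name `float32_fields` is made up for this illustration. It rounds a number to single precision and splits the 32-bit pattern into S, E and M.

```python
import struct

def float32_fields(x):
    """Split x (rounded to single precision) into its (S, E, M) bit fields."""
    # Pack as a big-endian float32, then reinterpret the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31             # 1 sign bit
    e = (bits >> 23) & 0xFF    # 8 exponent bits, stored with a bias of 127
    m = bits & 0x7FFFFF        # 23 mantissa bits (the hidden 1 is not stored)
    return s, e, m

print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-0.75))  # (1, 126, 4194304)
```

Here -0.75 is -1.5 × 2^(-1), so E = 126 and M holds only the fraction 0.5 (that is, 2^22); the leading 1 is implied.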

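The value of a single precision IEEE754 Floating Point number is typically given by the following
formula:

$$N = (-1)^S \times 2^{E-127} \times (1.M)$$

where 1.M denotes the binary fraction formed by the hidden 1 followed by the 23 bits of M.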

Yet, one of the things to keep in mind is that this interpretation only holds for 0 < E < 255. For
E = 0 (i.e., E being the bit string “00000000”) and for E = 255 (i.e., E being the bit string “11111111”)
alternate value interpretations hold as given below.

Condition              N value
E = 255 and M ≠ 0      NaN
E = 255 and M = 0      $(-1)^S \times \infty$
E = 0 and M ≠ 0        $(-1)^S \times 2^{-126} \times (0.M)$
E = 0 and M = 0        $(-1)^S \times 0$
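
The single precision cases can be collected into one small decoder. This is a sketch only, assuming Python; `decode_float32` is a hypothetical helper name, and it takes the three fields as plain integers rather than a packed bit string.

```python
import math

def decode_float32(s, e, m):
    """Value of a single precision number from its S, E, M fields."""
    sign = -1.0 if s else 1.0
    if e == 255:
        # All-ones exponent: NaN if any mantissa bit is set, else infinity.
        return math.nan if m != 0 else sign * math.inf
    if e == 0:
        # Zero or denormalized: no hidden 1, exponent fixed at -126.
        return sign * (m / 2**23) * 2.0**-126
    # Normalized case, 0 < E < 255: hidden 1 in front of the 23 stored bits.
    return sign * (1 + m / 2**23) * 2.0**(e - 127)

print(decode_float32(0, 127, 0))        # 1.0
print(decode_float32(1, 126, 4194304))  # -0.75
print(decode_float32(0, 255, 0))        # inf
```

Because the sign is applied by multiplication, the E = 0, M = 0 case yields +0.0 or -0.0 depending on S.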

Similarly, the following interpretations hold for the case of double precision IEEE754 Floating Point
numbers:

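Condition              N value
E = 2047 and M ≠ 0     NaN
E = 2047 and M = 0     $(-1)^S \times \infty$
0 < E < 2047           $(-1)^S \times 2^{E-1023} \times (1.M)$
E = 0 and M ≠ 0        $(-1)^S \times 2^{-1022} \times (0.M)$
E = 0 and M = 0        $(-1)^S \times 0$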
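
The double precision conditions can be checked against Python's own floats, which are IEEE754 doubles on common platforms. The sketch below assumes the standard `struct` module; `float64_fields` and `classify_double` are hypothetical names, and the category labels are the conventional ones rather than terms taken from the table.

```python
import struct

def float64_fields(x):
    """Split a Python float into its double precision (S, E, M) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63                  # 1 sign bit
    e = (bits >> 52) & 0x7FF        # 11 exponent bits, stored with a bias of 1023
    m = bits & ((1 << 52) - 1)      # 52 mantissa bits
    return s, e, m

def classify_double(e, m):
    """Name the table row that applies to the given E and M fields."""
    if e == 2047:
        return "NaN" if m != 0 else "infinity"
    if e == 0:
        return "denormalized" if m != 0 else "zero"
    return "normalized"

print(float64_fields(1.0))    # (0, 1023, 0)
print(classify_double(0, 1))  # denormalized
```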

Addition and subtraction are the most difficult of the elementary operations for floating-point operands.
Here, we deal only with addition, since subtraction can be converted to addition by flipping the sign of
the subtrahend. Consider the addition:

$$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = \pm s \times b^{e}$$

Assuming e1 ≥ e2, we begin by aligning the two operands through right-shifting of the significand s2
of the number with the smaller exponent:

$$\pm s_2 \times b^{e_2} = \pm \frac{s_2}{b^{e_1 - e_2}} \times b^{e_1}$$

If the exponent base b and the number representation radix r are the same, we simply shift s2 to the
right by e1 − e2 digits. When b = r^a, the shift amount, which is computed through direct subtraction
of the biased exponents, is multiplied by a. In either case, this step is referred to as alignment shift, or
preshift (in contrast to normalization shift or postshift, which is needed when the resulting significand
is unnormalized). After the alignment shift, the significands of the two operands are added to get the
significand of the sum.
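
The alignment step can be sketched as follows, assuming Python, b = r = 2, operands of the same sign, and significands held as plain integers so that a number is s × 2^e. The name `align_and_add` is hypothetical, and the bits shifted out during alignment are simply dropped here, which is a simplification of this sketch.

```python
def align_and_add(s1, e1, s2, e2):
    """Add s1 * 2**e1 and s2 * 2**e2 (integer significands, same sign).

    Returns (s, e); the significand of the sum is not yet normalized.
    """
    if e1 < e2:
        # Swap so that the first operand has the larger exponent.
        s1, e1, s2, e2 = s2, e2, s1, e1
    # Alignment shift (preshift): bits shifted out on the right are lost.
    s2_aligned = s2 >> (e1 - e2)
    return s1 + s2_aligned, e1

s, e = align_and_add(12, 3, 8, 1)  # 96 + 16
print(s, e)                        # 14 3
print(s * 2**e)                    # 112
```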

When the operand signs are alike, a single-digit normalizing shift is always enough. For example,
with IEEE754 format, we have 1 ≤ s < 4, which may have to be reduced by a factor of 2 through a
single-bit right shift (and adding 1 to the exponent to compensate). However, when the operands have
different signs, the resulting significand may be very close to 0 and left shifting by many positions may
be needed for normalization.
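
Both normalization cases can be sketched in a few lines, assuming Python and a fixed-point significand with 23 fraction bits, so that the integer `ONE` below stands for 1.0. The names `FRAC`, `ONE` and `normalize` are made up for this illustration, and rounding is ignored.

```python
FRAC = 23            # fraction bits, as in single precision
ONE = 1 << FRAC      # fixed-point encoding of 1.0

def normalize(s, e):
    """Bring a nonnegative fixed-point significand s into [1, 2), adjusting e."""
    if s == 0:
        return 0, 0              # zero has no normalized form in this sketch
    while s >= 2 * ONE:          # like signs: one right shift suffices when s < 4
        s >>= 1
        e += 1
    while s < ONE:               # unlike signs: possibly many left shifts
        s <<= 1
        e -= 1
    return s, e

print(normalize(3 * ONE, 0) == (3 * ONE // 2, 1))  # True: 3.0 becomes 1.5 x 2
print(normalize(1, 0) == (ONE, -23))               # True: 23 left shifts needed
```

The second call models a subtraction of nearly equal operands that leaves only the lowest bit set.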

Figure 1 shows a floating-point addition example:

Figure 1: Floating-point addition

Figure 2 shows a floating-point subtraction example:

Figure 2: Floating-point subtraction

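Floating-point multiplication is simpler than floating-point addition; it is performed by multiplying
the significands and adding the exponents:

$$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = \pm (s_1 \times s_2) \times b^{e_1 + e_2}$$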

Postshifting may be needed, since the product s1 × s2 of the two significands can be unnormalized.
For example, with the IEEE format, we have 1 ≤ s1 × s2 < 4, leading to the possible need for a single-bit
right shift. Also, the computed exponent needs adjustment if a normalization shift is performed.
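
A minimal multiplication sketch, assuming Python, normalized operands, unbiased exponents, and fixed-point significands with 23 fraction bits. The names `FRAC`, `ONE` and `fp_multiply` are hypothetical, and the product is truncated rather than rounded. With biased exponents, one bias would also have to be subtracted from the exponent sum.

```python
FRAC = 23            # fraction bits, as in single precision
ONE = 1 << FRAC      # fixed-point encoding of 1.0

def fp_multiply(sign1, s1, e1, sign2, s2, e2):
    """Multiply two normalized numbers given as (sign, significand, exponent).

    Significands are integers in [ONE, 2 * ONE); exponents are unbiased.
    """
    sign = sign1 ^ sign2         # signs combine by exclusive OR
    s = (s1 * s2) >> FRAC        # product lies in [1, 4) in fixed point
    e = e1 + e2
    if s >= 2 * ONE:             # postshift: a single right shift at most
        s >>= 1
        e += 1
    return sign, s, e

# 1.5 x (-1.5) = -2.25 = -(1.125 x 2^1)
sign, s, e = fp_multiply(0, 3 << 22, 0, 1, 3 << 22, 0)
print(sign, s / ONE, e)          # 1 1.125 1
```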

Figure 3 shows a floating-point multiplication example:

Figure 3: Floating-point multiplication

Similarly, floating-point division is performed by dividing the significands and subtracting the
exponents:

$$\frac{\pm s_1 \times b^{e_1}}{\pm s_2 \times b^{e_2}} = \pm \frac{s_1}{s_2} \times b^{e_1 - e_2}$$

The ratio s1/s2 of the significands may have to be normalized. With the IEEE754 format, we have
1/2 < s1/s2 < 2 and a single-bit left shift is always adequate. The computed exponent needs adjustment
if a normalizing shift is performed.
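
Division can be sketched the same way, assuming Python, normalized operands (so the divisor significand is never zero), unbiased exponents, and fixed-point significands with 23 fraction bits. The names `FRAC`, `ONE` and `fp_divide` are hypothetical. The quotient is computed with one extra fraction bit so that the single-bit left shift does not lose precision; the result is truncated rather than rounded.

```python
FRAC = 23            # fraction bits, as in single precision
ONE = 1 << FRAC      # fixed-point encoding of 1.0

def fp_divide(sign1, s1, e1, sign2, s2, e2):
    """Divide two normalized numbers given as (sign, significand, exponent).

    Significands are integers in [ONE, 2 * ONE); exponents are unbiased.
    """
    sign = sign1 ^ sign2
    q = (s1 << (FRAC + 1)) // s2   # ratio in (1/2, 2), one extra fraction bit
    e = e1 - e2
    if q < 2 * ONE:                # ratio below 1: keep the extra bit (left shift)
        s, e = q, e - 1
    else:                          # ratio already in [1, 2): drop the extra bit
        s = q >> 1
    return sign, s, e

# 1.0 / 1.5 = 0.666... = 1.333... x 2^-1
print(fp_divide(0, ONE, 0, 0, 3 << 22, 0))  # (0, 11184810, -1)
```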


Copyright © 2009, algebra-online.com. All rights reserved.