As you can see in your textbook, the IEEE754 Floating Point representation is
composed of three parts: the Mantissa Sign, S, the Signed Exponent, E, and the
Mantissa Magnitude, M. In single precision floating point representation, the
Signed Exponent, E, is 8 bits, whereas the Mantissa Magnitude, M, is composed
of the remaining 23 bits. In double precision floating point representation,
the Signed Exponent, E, is 11 bits, whereas the Mantissa Magnitude, M, is
composed of the remaining 52 bits. In both cases, the hidden-1 representation
for the Mantissa Magnitude holds, effectively extending its representational
power by one bit.
The value of a single precision IEEE754 Floating Point number is typically
given by the following formula:

$$N = (-1)^S \times 2^{E-127} \times (1.M)$$

Keep in mind, however, that this interpretation holds only for 0 < E < 255.
For E = 0 (i.e., E being the bit string “00000000”) and for E = 255 (i.e., E
being the bit string “11111111”), alternate value interpretations hold, as
given below.
| Condition | N value |
| --- | --- |
| E = 255 and M ≠ 0 | NaN |
| E = 255 and M = 0 | $(-1)^S \times \infty$ |
| E = 0 and M ≠ 0 | $(-1)^S \times 2^{-126} \times (0.M)$ |
| E = 0 and M = 0 | $(-1)^S \times 0$ |
Similarly, the following interpretations hold for the case of double precision
IEEE754 Floating Point numbers:
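| Condition | N value |
| --- | --- |
| E = 2047 and M ≠ 0 | NaN |
| E = 2047 and M = 0 | $(-1)^S \times \infty$ |
| 0 < E < 2047 | $(-1)^S \times 2^{E-1023} \times (1.M)$ |
| E = 0 and M ≠ 0 | $(-1)^S \times 2^{-1022} \times (0.M)$ |
| E = 0 and M = 0 | $(-1)^S \times 0$ |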
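The case analysis above can be checked with a short Python sketch. It is only an illustration: the function name `decode_ieee754` and its parameters `exp_bits` and `man_bits` are made up for this example, and the bit pattern is assumed to be supplied as a plain integer with S in the most significant position, followed by E and then M.

```python
import math
import struct

def decode_ieee754(bits: int, exp_bits: int = 8, man_bits: int = 23) -> float:
    """Interpret an integer bit pattern as S | E | M (hypothetical helper)."""
    bias = (1 << (exp_bits - 1)) - 1          # 127 for single, 1023 for double
    e_max = (1 << exp_bits) - 1               # 255 for single, 2047 for double
    s = (bits >> (exp_bits + man_bits)) & 1   # Mantissa Sign, S
    e = (bits >> man_bits) & e_max            # stored exponent field, E
    m = bits & ((1 << man_bits) - 1)          # Mantissa Magnitude field, M
    sign = -1.0 if s else 1.0
    frac = m / (1 << man_bits)                # M read as the fraction 0.M

    if e == e_max:                            # all-ones exponent: NaN or infinity
        return math.nan if m else sign * math.inf
    if e == 0:                                # all-zeros exponent: zero or denormalized (no hidden 1)
        return sign * frac * 2.0 ** (1 - bias)
    return sign * (1.0 + frac) * 2.0 ** (e - bias)   # normalized case with hidden 1

# Cross-check one single precision pattern against Python's own decoding.
pattern = 0x40490FDB
reference = struct.unpack(">f", struct.pack(">I", pattern))[0]
print(decode_ieee754(pattern), reference)
```

For double precision, the same function would be called as `decode_ieee754(bits, 11, 52)`.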
Addition and subtraction are the most difficult of the elementary operations
for floating-point operands. Here, we deal only with addition, since
subtraction can be converted to addition by flipping the sign of the
subtrahend. Consider the addition:

$$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = \pm s \times b^{e}$$

Assuming e1 ≥ e2, we begin by aligning the two operands through right-shifting
of the significand s2 of the number with the smaller exponent:

$$\pm s_2 \times b^{e_2} = \pm \frac{s_2}{b^{e_1 - e_2}} \times b^{e_1}$$

If the exponent base b and the number representation radix r are the same, we
simply shift s2 to the right by e1 − e2 digits. When $b = r^a$, the shift
amount, which is computed through direct subtraction of the biased exponents,
is multiplied by a. In either case, this step is referred to as alignment
shift, or preshift (in contrast to normalization shift, or postshift, which is
needed when the resulting significand is unnormalized). After the alignment
shift, the significands of the two operands are added to get the significand
of the sum.
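The alignment step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a hardware-accurate model: significands are held as nonnegative integers with a fixed number of fraction bits, both operands have the same sign, b = r = 2 (so a = 1), and bits shifted out during alignment are simply dropped. The name `align_and_add` is hypothetical.

```python
def align_and_add(s1: int, e1: int, s2: int, e2: int) -> tuple[int, int]:
    """Add two like-signed operands s * 2**e given as integer significands.

    Returns the (possibly unnormalized) significand sum and the shared exponent.
    """
    if e1 < e2:                       # make operand 1 the one with the larger exponent
        s1, e1, s2, e2 = s2, e2, s1, e1
    shift = e1 - e2                   # alignment shift amount
    s2_aligned = s2 >> shift          # preshift: right-shift the smaller operand's significand
    return s1 + s2_aligned, e1        # the sum still needs a normalization check

# Example with 4 fraction bits: 1.1000 x 2^3 (= 12) plus 1.0100 x 2^1 (= 2.5).
s, e = align_and_add(0b11000, 3, 0b10100, 1)
print(s / 16 * 2 ** e)                # value of the sum as an ordinary number
```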
When the operand signs are alike, a single-digit normalizing shift is always
enough. For example, with IEEE754 format, we have 1 ≤ s < 4, which may have to
be reduced by a factor of 2 through a single-bit right shift (and adding 1 to
the exponent to compensate). However, when the operands have different signs,
the resulting significand may be very close to 0, and left shifting by many
positions may be needed for normalization.
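Both normalization cases can be illustrated with a small Python sketch. It assumes a nonnegative integer significand with `frac_bits` fraction bits, so that the normalized range 1 ≤ s < 2 corresponds to integers from `2**frac_bits` up to, but not including, `2**(frac_bits + 1)`. The function name `normalize` is chosen only for this example, and exponent overflow or underflow is ignored.

```python
def normalize(s: int, e: int, frac_bits: int) -> tuple[int, int]:
    """Postshift a nonnegative integer significand into the range [1, 2)."""
    if s == 0:
        return 0, 0                   # zero has no normalized form; the exponent here is arbitrary
    one = 1 << frac_bits              # integer encoding of 1.0
    while s >= 2 * one:               # too large: right shift and add 1 to the exponent
        s >>= 1
        e += 1
    while s < one:                    # too small: left shift and subtract 1 from the exponent
        s <<= 1
        e -= 1
    return s, e

# Like signs: 1.5 + 1.75 = 3.25 needs a single right shift.
print(normalize(24 + 28, 0, 4))
# Different signs: 1.5 - 1.4375 = 0.0625 needs several left shifts.
print(normalize(24 - 23, 0, 4))
```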
Figure 1 shows a floating-point addition example:

Figure 1: Floating-point addition
Figure 2 shows a floating-point subtraction example:

Figure 2: Floating-point subtraction
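Floating-point multiplication is simpler than floating-point addition; it is
performed by multiplying the significands and adding the exponents:

$$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = \pm (s_1 \times s_2) \times b^{e_1 + e_2}$$
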
Postshifting may be needed, since the product s1 × s2 of the two significands
can be unnormalized. For example, with the IEEE format, we have
1 ≤ s1 × s2 < 4, leading to the possible need for a single-bit right shift.
Also, the computed exponent needs adjustment if a normalization shift is
performed.
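A minimal Python sketch of these steps is given below. It assumes normalized integer significands with `frac_bits` fraction bits, unbiased exponents (with biased exponents, the bias would have to be subtracted once from the sum), and truncation instead of proper rounding. The name `fp_multiply` and its argument layout are hypothetical.

```python
def fp_multiply(sign1: int, s1: int, e1: int,
                sign2: int, s2: int, e2: int,
                frac_bits: int) -> tuple[int, int, int]:
    """Multiply two operands (sign, s, e) representing (-1)**sign * s * 2**e."""
    sign = sign1 ^ sign2              # signs alike give +, signs different give -
    product = s1 * s2                 # exact product, with 2 * frac_bits fraction bits
    s = product >> frac_bits          # truncate back to frac_bits fraction bits
    e = e1 + e2                       # add the (unbiased) exponents
    if s >= (2 << frac_bits):         # product in [2, 4): single-bit right shift
        s >>= 1
        e += 1                        # exponent adjusted to compensate
    return sign, s, e

# Example with 4 fraction bits: (1.5 x 2^2) x (1.5 x 2^1) = 6 x 3 = 18.
sign, s, e = fp_multiply(0, 24, 2, 0, 24, 1, 4)
print((-1) ** sign * s / 16 * 2 ** e)
```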
Figure 3 shows a floating-point multiplication example:

Figure 3: Floating-point multiplication
Similarly, floating-point division is performed by dividing the significands
and subtracting the exponents:

$$(\pm s_1 \times b^{e_1}) / (\pm s_2 \times b^{e_2}) = \pm (s_1 / s_2) \times b^{e_1 - e_2}$$

The ratio s1/s2 of the significands may have to be normalized. With the IEEE754
format, we have 1/2 < s1/s2 < 2 and a single-bit left shift is always adequate.
The computed exponent needs adjustment if a normalizing shift is performed.
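Division can be sketched in the same style. As before, this is an illustration only: it assumes normalized integer significands with `frac_bits` fraction bits, unbiased exponents, and a truncated quotient. The quotient is computed with one extra fraction bit so that the single-bit left shift loses no precision. The name `fp_divide` is hypothetical.

```python
def fp_divide(sign1: int, s1: int, e1: int,
              sign2: int, s2: int, e2: int,
              frac_bits: int) -> tuple[int, int, int]:
    """Divide two operands (sign, s, e) representing (-1)**sign * s * 2**e."""
    if s2 == 0:
        raise ZeroDivisionError("divisor significand is zero")
    sign = sign1 ^ sign2
    q = (s1 << (frac_bits + 1)) // s2     # s1/s2 with one extra fraction bit, truncated
    e = e1 - e2                           # subtract the (unbiased) exponents
    if q >= (2 << frac_bits):             # ratio in [1, 2): already normalized
        s = q >> 1                        # drop the extra fraction bit
    else:                                 # ratio in (1/2, 1): single-bit left shift
        s = q                             # keeping the extra bit is the left shift
        e -= 1                            # exponent adjusted to compensate
    return sign, s, e

# Example with 4 fraction bits: (1.5 x 2^3) / (1.0 x 2^1) = 12 / 2 = 6.
sign, s, e = fp_divide(0, 24, 3, 0, 16, 1, 4)
print((-1) ** sign * s / 16 * 2 ** e)
```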