• Objective:
Let f : S
En → R and f ∈C3(S).
Find the minimum (infimum) of f on S.
• Ideas:
1. Given xk ∈ S, approximate f locally by a
(convex) quadratic function .
2. Minimize the approximate function
exactly.

• Implementation:
1. Taylor theorem says

2. Minimize the r.h.s. =>

• Observations:
1. If f is a strictly convex quadratic function,
then xk+1 = x*.
2. If F(x*) is positive definite at a local min-
imum x*, a nonquadratic function f can be
well approximated by a convex quadratic
function in a neighborhood of x*.
3. If f ∈ C3 and F(x*) is positive definite
at a local minimum x*, then the order of
convergence of Newton's method is at least
two, when the algorithm starts sufficiently
close to x*.
• Theorem (Newton's Method):
Let f ∈ C3(En). As sume that at a local
minimum point x*, H(x*) is positive definite.
Then the points generated by Newton's
Method converges x* with an order of
convergence at least two , if x0 is sufficiently
close to x*.
• Observations:
1. The proof is similar to the one-dimensional
case.
2. Local convergence, but converges
quadratically.
3. Heavily depends on the positive definiteness
of the Hessian Matrix.
• Practice of Newton's Method:
1. To safe guard the nonquadratic terms in f ,
we may consider a modified Newton' s step
by setting

with a 1-dimensional search on α.
2. To safe guard the nonpositive definiteness
or even singularity of F(x*), we may cons-
ider

where I is the identity matrix and
≥ 0.
In this case, we have Newton's method as
is small enough, and gradient method as
is large enough.
Conjugate Direction Method
• Motivation:
- Gradient method converges globally, but
linearly, with the first order information
required.
- Newton's method converges quadratically,
but locally, with the second-order informa-
tion required. In particular, the inversion
of the Hessian takes computational efforts.
- Can we find something intermediate?
- Let us start with something we know to
explore one of the "great creativity" in
non Systems -of-linear-equations-3.html">linear programming .
• Key Document:
- M. R. Hestense and E.L. Stiefel (1952),
"Methods of Conjugate Gradients for
Solving Linear Systems",
J. Res. Nat. Bur. Standards, Section B,
49, 409-436.
- Roger Fletcher and Colin Reeves (1964),
"Function Minimization by Conjugate
Gradients", Computer J., 7, 149-154.
- Roger Fletcher, BS 1960, Cambridge
University, Theoretical Physics. Ph.D
1963, Leeds University, supervised by
Colin Reeves on computing molecules
wave functions.
• Strictly Convex Quadratic Case:

where Q is symmetric positive definite.
It is known that the unique minimum of
f(x) is given by

• Observations:

3. d0 can be set as -
4. To avoid inverting Q, d1 better be
related to -
and d0 through Q.
• Definition (Conjugate Direction):
Let Q be a symmetric n × n matrix and
d1, d2 ∈ En be two vectors . Then d1 and d2
are conjugate with respect to Q
(Q-orthogonal), if (d1)TQd2 = 0.
• Example:
1. For Q = 0, any two vectors are conjugate
directions.
2. For Q = I, two conjugate directions are
orthogonal to each other.
3. For
is
Q-orthogonal to any d2 ∈ E2.
4. When Q is symmetric positive definite,
Q = LLT (Cholesky) and (d1)TQd2 =

• Theorem:
Let Q be positive definite and 
be non zero Q -orthogonal vectors. Then
are linearly independent .
• Proof:
Assume that
s.t.

Multiplying by Q and taking product with
d i yields

The positive definiteness of Q requires that

Hence
= 0.