Abstract
The ensemble Kalman filter (EnKF) is a recursive filter suitable for problems
with a large
number of variables , such as discretizations of partial differential equations
in geophysical
models. The EnKF originated as a version of the Kalman filter for large problems
(essentially,
the covariance matrix is replaced by the sample covariance), and it is now an
important data
assimilation comp onent of ensemble forecasting. EnKF is related to the particle
filter (in
this context, a particle is the same thing as an ensemble member) but the EnKF
makes the
assumption that all probability distributions involved are Gaussian. This
article briefly describes
the derivation and practical implementation of the basic version of EnKF, and
reviews several
extensions.
1 Introduction
The Ensemble Kalman Filter (EnKF) is a Monte-Carlo implementation of the
Bayesian update
problem: Given a probability density function (pdf) of the state of the modeled
system (the prior,
called often the forecast in geosciences) and the data likelihood, the Bayes
theorem is used to to
obtain pdf after the data likelihood has beed taken into account (the posterior,
often called the
analysis). This is called a Bayesian update. The Bayesian update is combined
with advancing
the model in time, incorporating new data from time to time. The original Kalman
Filter [18]
assumes that all pdfs are Gaussian (the Gaussian assumption) and provides
algebraic formulas for
the change of the mean and covariance by the Bayesian update, as well as a
formula for advancing
the covariance matrix in time provided the system is linear. However,
maintaining the covariance
matrix is not feasible computationally for high-dimensional systems. For this
reason, EnKFs were
developed [9, 16]. EnKFs re present the distribution of the system state
using a random sample,
called an ensemble, and replace the covariance matrix by the sample covariance
computed from
the ensemble. One advantage of EnKFs is that advancing the pdf in time is
achieved by simply
advancing each member of the ensemble. For a survey of EnKF and related data
assimilation
techniques, see [12].
2 A derivation of the EnKF
2.1 The Kalman Filter
Let us review first the Kalman filter. Let x denote the n-dimensional state
vector of a model, and
assume that it has Gaussian probability distribution with mean μ and covariance
Q, i.e., its pdf is

Here and below, / means proportional ; a pdf is always scaled so that its
integral over the whole
space is one. This probability distribution, called the prior, was evolved in
time by running the
model and now is to be updated to account for new data. It is natural to assume
that the error
distribution of the data is known; data have to come with an error estimate,
otherwise they are
meaningless. Here, the data d is assumed to have Gaussian pdf with covariance R
and mean Hx,
where H is the so-called the observation matrix. The covariance matrix R
describes the estimate
of the error of the data; if the random errors in the entries of the data vector
d are independent, R
is diagonal and its diagonal entries are the squares of the standard deviation
(“error size”) of the
error of the corresponding entries of the data vector d. The value Hx is what
the value of the data
would be for the state x in the absence of data errors. Then the probability
density p(d|x) of the
the data d conditional of the system state x, called the data likelihood, is

The pdf of the state and the data likelihood are combined to give the new
probability density
of the system state x conditional on the value of the data d (the posterior ) by
the Bayes theorem,

The data d is fixed once it is received, so denote the posterior state by
instead of x|d and the
posterior pdf by p (
). It can be shown by
algebraic manipulations [1] that the posterior pdf is also
Gaussian,

with the posterior mean
and covariance
given by the Kalman update formulas

where

is the so-called Kalman gain matrix.
2.2 The Ensemble Kalman Filter
The EnKF is a Monte Carlo approximation of the Kalman filter, which avoids
evolving the
covariance matrix of the pdf of the state vector x. Instead, the distribution is
represented by
a collection of realizations, called an ensemble. So, let

be an n × N matrix whose columns are a sample from the prior distribution.
The matrix X is
called the prior ensemble. Replicate the data d into an m × N matrix

so that each column di consists of the data vector d plus a random vector from
the n-dimensional
normal distribution N(0,R). Then the columns of

form a random sample from the posterior distribution. The EnKF is now
obtained [17] simply by
replacing the state covariance Q in Kalman gain matrix
by the sample
covariance C computed from the ensemble members (called the ensemble
covariance).
3 Theoretical analysis
It is commonly stated that the ensemble is a sample (that is, independent
identically distributed
random variables, i.i.d.) and its probability distribution is represented by the
mean and covariance,
thus assuming that the ensemble is normally distributed. Although the resulting
analyses, e.g., [7],
played an important role in the development of EnKF, both statements are false.
The ensemble
covariance is computed from all ensemble members together, which introduces
dependence, and
the EnKF formula is a non linear function of the ensemble, which destroys the
normality of the
ensemble distribution. For a mathematical proof of the convergence of the EnKF
in the limit for
large ensembles to the Kalman filer, see [23]. The core of the analysis
is a proof that large ensembles
in the EnKF are, in fact, nearly i.i.d. and nearly normal.
4 Implementation
4.1 Basic formulation
Here we follow [7, 10, 19]. Suppose the ensemble matrix X and the data
matrix D are as above.
The ensemble mean and the covariance are

where

and e denotes the matrix of all ones of the indicated size.
The posterior ensemble Xp is then given by

where the perturbed data matrix D is as above. It can be shown that the
posterior ensemble
consists of linear combinations of members of the prior ensemble.
Note that since R is a covariance matrix, it is always positive semidefinite
and usually
positive definite, so the inverse above exists and the formula can be
implemented by the Choleski
decomposition [19]. In [7, 10], R is replaced by the sample covariance DDT / (N −
1) and the inverse
is replaced by a pseudoinverse, computed using the Singular Values Decomposition
(SVD).
Since these formulas are matrix ope rations with dominant Level 3 operations [14],
they are
suitable for efficient implementation using software packages such as LAPACK (on
serial and
shared memory computers) and ScaLAPACK (on distributed memory computers) [19].
Instead
of computing the inverse of a matrix and multiplying by it , it is much better
(several times
cheaper and also more accurate) to compute the Choleski decomposition of the
matrix and treat
the multiplication by the inverse as solution of a linear system with many
simultaneous right -hand
sides [14].
4.2 Observation matrix-free implementation
It is usually inconvenient to construct and operate with the matrix H
explicitly; instead, a function
h(x) of the form

is more natural to compute. The function h is called the observation function
or, in the inverse
problems context, the forward operator. The value of h(x) is what the value of
the data would be
for the state x assuming the measurement is exact. Then [19, 22] the
posterior ensemble can be
rewritten as

where

and

with

Consequently, the ensemble update can be computed by evaluating the
observation function h on
each ensemble member once and the matrix H does not need to be known explicitly.
This formula
also holds [19] for an observation function h(x) = Hx + f with a fixed
offset f , which also does
not need to be known explicitly. The above formula has been commonly used for a
nonlinear
observation function h, such as the position of a hurricane vortex [8].
In that case, the observation
function is essentially approximated by a linear function from its values at the
ensemble members.
4.3 Implementation for a large number of data points
For a large number m of data points, the multiplication by P−1 becomes a
bottleneck. The following
alternative formula [19, 22] is advantageous when the number of data points m is
large (such as
when assimilating gridded or pixel data) and the data error covariance matrix R
is diagonal (which
is the case when the data errors are uncorrelated), or cheap to decompose (such
as banded due to
limited covariance distance). Using the Sherman-Morrison-Woodbury formula [15]

with

gives

which requires only the solution of systems with the
matrix R (assumed to be cheap) and of a
system of size N with m right-hand sides. See [19] for operation counts.
5 Further extensions
The EnKF version described here involves randomization of
data. For filters without randomization
of data, see [2, 11, 25].
Since the ensemble covariance is rank-deficient (there are
many more state variables, typically
millions, than the ensemble members, typically less than a hundred), it has
large terms for pairs of
points that are spatially distant. Since in reality the values of physical
fields at distant locations
are not that much correlated, the covariance matrix is tapered off artificially
based on the distance,
which results in better approximation of the covariance for small ensembles [13],
such as typically
used in practice. Further development of this idea gives rise to localized EnKF
algorithms [3, 24].
For problems with coherent features, such as firelines,
squall lines, and rain fronts, there is a need
to adjust the simulation state by distorting the state in space as well as by an
additive correction
to the state. The morphing EnKF [5, 20] employs intermediate states,
obtained by techniques
borrowed from image registration and morphing, instead of linear combinations of
states.
EnKFs rely on the Gaussian assumption, though they are of
course used in practice for nonlinear
problems, where the Gaussian assumption is not satisfied. Related filters
attempting to relax the
Gaussian assumption in EnKF while preserving its advantages include filters that
fit the state pdf
with multiple Gaussian kernels [4], filters that approximate the state
pdf by Gaussian mixtures [6],
a variant of the particle filter with computation of particle weights by density
estimation [20, 21],
and a variant of the particle filter with thick tailed pdfs to alleviate
particle filter degeneracy [26].