Variable Time-Constant Low-Pass Filters using Kalman Filter Algorithms

What is a Kalman Filter?

A Kalman Filter (KF) computes the parameters of posterior probability distributions for certain kinds of stochastic process. Such processes are characterized by linear transformations and additive Gaussian `noise'. KFs generalize the kinds of linear filters familiar in signal processing.

We can think of such processes as hidden Markov models in which the hidden state and the observations are continuous random variables. This changes the nature of the computations: Because of the linearity and the Gaussian nature of the randomness, all the posteriors are Gaussian too, so instead of dealing with explicit probability distributions over a finite state space we deal with means and covariances.

In our acoustic-phonetic model we use a simple KF as a fancy kind of
smoother with a variable time-constant. The full KF is a multi-dimensional
system, but we only need one dimension at a time. The stochastic process
that generates our smoothing filter has a scalar state (*x*_{i}) which evolves as
a simple Gaussian random walk with a variance of 1. The observations (*t*_{i})
are produced by adding another zero-mean Gaussian sample to the state. The
variance of this observation noise is time-varying in a known way (*p*_{i}).

x_{i+1} = x_{i} + v_{i}
t_{i} = x_{i} + w_{i}                                   (10.1)

where *v*_{i} is zero-mean with variance *r*_{i}=1, and *w*_{i} is
zero-mean with variance *p*_{i}.
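As an illustration, the generating process of Eq. (10.1) can be simulated directly. The function below is a sketch of our own (the name `simulate_process` is not from the text): it draws the unit-variance random walk and then corrupts each state with observation noise of the given per-frame variance.

```python
import random

def simulate_process(pliancies, seed=0):
    """Simulate the process of Eq. (10.1): a Gaussian random walk x_i
    with state-noise variance r_i = 1, observed through zero-mean
    Gaussian noise w_i whose variance p_i varies per frame."""
    rng = random.Random(seed)
    x = 0.0
    states, observations = [], []
    for p in pliancies:
        x += rng.gauss(0.0, 1.0)          # v_i: state noise, variance 1
        t = x + rng.gauss(0.0, p ** 0.5)  # w_i: observation noise, variance p_i
        states.append(x)
        observations.append(t)
    return states, observations
```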

The KF equations for estimating the state of this hypothetical generating
process are as follows:
The estimated mean m_{i|i-1} and variance s_{i|i-1} of the state at
frame *i*, conditioned on observations up to and including frame *i*-1, are

m_{i|i-1} = m_{i-1|i-1}
s_{i|i-1} = s_{i-1|i-1} + r_{i-1} = s_{i-1|i-1} + 1     (10.2)
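A minimal sketch of the forward recursion in code, under the model of Eq. (10.1): the prediction step simply carries the state estimate over and grows its variance by 1 per frame, and the update step is the standard scalar Gaussian fusion with the observation. The function name and the diffuse initial prior are our own illustrative choices, not taken from the text.

```python
def kalman_forward(targets, pliancies, prior_mean=0.0, prior_var=1e6):
    """Scalar forward Kalman pass for the random-walk model.

    targets:   observations t_i
    pliancies: observation-noise variances p_i
    Returns per-frame posterior means and variances, each conditioned
    on observations up to and including frame i.
    """
    means, variances = [], []
    mean, var = prior_mean, prior_var
    for t, p in zip(targets, pliancies):
        # Update: fuse the prediction with the observation (variance p).
        gain = var / (var + p)
        mean = mean + gain * (t - mean)
        var = (1.0 - gain) * var
        means.append(mean)
        variances.append(var)
        # Predict: the random-walk transition adds state noise of variance 1.
        var = var + 1.0
    return means, variances
```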

Taken together with the reverse-time versions, these equations solve for the equilibrium state of the spring model of acoustic-phonetic dynamics.

In our dynamic phonetic state generator, the sequence of target values is
treated as the observations *t*_{i}, and we also have a pliancy *p*_{i} associated
with each target value.

Figure E.1(a) shows a sequence of phonetic targets and associated `standard deviations' (square roots of the pliancies).

A forward KF pass computes a mean and variance at each frame, conditioned on the observations so far (Figure E.1(b)) and a backward KF pass considers only the future (Figure E.1(c)). To obtain a symmetrical smoother we combine the two estimates (Figure E.1(d)).

(It is important to understand that we are not claiming that the target sequence is generated by the `model process' that the KF corresponds to. We do not even claim that the dynamic phonetic state construction process is a model of the actual speech pattern generation process.)

Figure E.3 shows how the Kalman filter propagates the posterior distribution for the current frame forward to form a prior distribution for the state at the next frame. This prior can then be combined with the observation distribution to form the posterior at the next frame, given all of the observations seen so far.

Figure E.2 shows an example of using a prior and observation distribution to obtain a posterior. The prior here specifies the estimated state value to a much greater precision than the observation distribution (shown by the wide bell-shaped curve) in this example.
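The combination step can be written down directly: the product of two Gaussian densities is (up to normalization) another Gaussian whose precisions (reciprocal variances) add, and whose mean is the precision-weighted average of the two means. A hypothetical helper, with a name of our own choosing:

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Combine two independent Gaussian estimates of the same quantity.
    Precisions add; the mean is the precision-weighted average, so the
    sharper (lower-variance) input dominates the result."""
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    mean = var * (mean_a / var_a + mean_b / var_b)
    return mean, var
```

With a sharp prior at 0.0 (variance 0.1) and a broad observation at 4.0 (variance 10.0), the posterior stays close to the prior, as in the figure's example.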

Figure E.3 shows how the posterior distribution evolves over time given a sequence of 4 observations, all with the same error distribution (solid line). The mean value of this posterior gradually evolves towards the mean of the observation distributions. Figure E.1(b) also shows how these distributions change with incremental observations. The distribution is represented here by the mean, and the plus and minus one standard deviation points.

Figure E.5 shows how forward and backward Kalman filter passes through the data can be combined to give the best estimate at time t using the evidence from all of the data points (not only those before, or only those after, time t). First, a forward pass uses the Kalman filter maths to obtain, at each point, the best estimate given all of the data prior to time t; a backward pass then does the same in the reverse direction, obtaining the best estimate given all of the data following time t.

Then, thanks to these forward and backward recursions, at each time t we have a mean and variance for the estimated position:

- given all the data prior to time t
- given all the data after time t
- given the observation at time t alone (simply the observation mean and variance).

All three of these estimates can be combined easily as (under the generation assumptions) they are all Gaussian. This then gives the best estimate given all of the data points.
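The whole two-pass scheme can be sketched as follows, under the model assumptions above. Each pass records its one-step-ahead *predictions* (so the observation at time t is not counted twice), and the three Gaussian estimates are then combined by adding precisions. The function names and the diffuse end-priors are illustrative choices of ours, not taken from the text.

```python
def smooth(targets, pliancies, init_var=1e6):
    """Two-pass fixed-interval smoother for the random-walk model.
    At each frame, combine three independent Gaussian estimates:
      - the forward prediction from data strictly before t,
      - the backward prediction from data strictly after t,
      - the observation at t itself (mean t_i, variance p_i).
    Returns a list of (mean, variance) pairs."""
    def predictions(ts, ps):
        # One-step-ahead predictive (mean, variance) of a forward KF run.
        preds = []
        mean, var = 0.0, init_var
        for t, p in zip(ts, ps):
            preds.append((mean, var))        # prediction before seeing t
            gain = var / (var + p)           # Kalman gain
            mean += gain * (t - mean)
            var = (1.0 - gain) * var + 1.0   # update, then random-walk predict
        return preds

    fwd = predictions(targets, pliancies)
    bwd = predictions(targets[::-1], pliancies[::-1])[::-1]
    out = []
    for (fm, fv), (bm, bv), t, p in zip(fwd, bwd, targets, pliancies):
        prec = 1.0 / fv + 1.0 / bv + 1.0 / p
        var = 1.0 / prec
        out.append((var * (fm / fv + bm / bv + t / p), var))
    return out
```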

All the gruesome mathematical details can be found in Appendix F.

To return to our springs and beads view of Appendix D, the forward
pass calculates the mean position for bead *i* if we cut the spring connecting
it to bead *i*+1, and the `variance' propagated forward in the same
recursion represents the `springiness' of bead *i* given the network on the
left (i.e. how strongly the bead would resist if you tried to displace it). This is
a kind of `equivalent circuit' for the spring network, as we could replace
the whole network with one spring attached to one position. If we followed
the resistor network analogy given in Appendix D,
then this would be a Thévenin equivalent circuit.

The same can be done for the spring network to the right of the bead (by working backwards), and so the whole network can be reduced to a network of just one bead attached to three springs (which are themselves attached to three positions):

- one spring and position equivalent to the spring network to the left of the bead,
- one spring and position equivalent to the spring network to the right of the bead, and
- one spring attached to the target position at time t.

Figure E.5 shows how this has been done. Means and variances are propagated from the left and the right to the position in question. This defines two `prior' distributions (shown dotted on the right of Figure E.5), and these, when combined with the observation distribution (shown solid), give the posterior distribution (shown dashed) at this point.