Variable Time-Constant Low-Pass Filters using Kalman Filter Algorithms

What is a Kalman Filter?

A Kalman Filter (KF) computes the parameters of posterior probability distributions for certain kinds of stochastic process. Such processes are characterized by linear transformations and additive Gaussian `noise'. KFs generalize the kinds of linear filters familiar in signal processing.

We can think of such processes as hidden Markov models in which the hidden state and the observations are continuous random variables. This changes the nature of the computations: Because of the linearity and the Gaussian nature of the randomness, all the posteriors are Gaussian too, so instead of dealing with explicit probability distributions over a finite state space we deal with means and covariances.

In our acoustic-phonetic model we use a simple KF as a fancy kind of smoother with a variable time-constant. The full KF is a multi-dimensional system, but we only need one dimension at a time. The stochastic process that generates our smoothing filter has a scalar state xi, which evolves as a simple Gaussian random walk with step variance 1. The observations ti are produced by adding another zero-mean Gaussian sample to the state; the variance pi of this observation noise varies with time in a known way.

$\displaystyle x_{i+1} = x_i + v_i \qquad t_i = x_i + w_i$     (10.1)

where vi is zero-mean with variance ri=1, and wi is zero-mean with variance pi.
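As an illustration, the generating process of equation (10.1) can be simulated in a few lines. This is our own sketch; the function name and defaults are not taken from the text.

```python
import random

def simulate(n, p, r=1.0, seed=0):
    """Simulate the scalar generating process of equation (10.1).

    The hidden state x follows a Gaussian random walk with step
    variance r; each observation t adds zero-mean Gaussian noise
    whose variance p[i] may vary from frame to frame.
    """
    rng = random.Random(seed)
    x = 0.0
    states, obs = [], []
    for i in range(n):
        states.append(x)
        obs.append(x + rng.gauss(0.0, p[i] ** 0.5))  # t_i = x_i + w_i
        x = x + rng.gauss(0.0, r ** 0.5)             # x_{i+1} = x_i + v_i
    return states, obs
```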

The KF equations for estimating the state of this hypothetical generating process are as follows. The estimated mean and variance of the state at frame i+1, conditioned on observations up to and including frame i, are

$\displaystyle m_{i+1} = \frac{m_i p_i + t_i q_i}{p_i + q_i} \qquad q_{i+1} = \frac{p_i q_i}{p_i + q_i} + r_i$     (10.2)
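A direct transcription of recursion (10.2) might look like the following sketch. The initial values m0 and q0 are our own choice of a near-flat starting prior; the text does not specify them.

```python
def kalman_forward(t, p, r=1.0, m0=0.0, q0=1e6):
    """One-dimensional forward Kalman pass, equation (10.2).

    t : observations, p : observation-noise variances (pliancies),
    r : random-walk step variance (1 in the text).
    Returns the prior mean and variance at each frame, conditioned
    on all observations before that frame.  The large q0 encodes an
    almost flat initial prior (a choice made here for illustration).
    """
    m, q = m0, q0
    means, variances = [], []
    for ti, pi in zip(t, p):
        means.append(m)
        variances.append(q)
        m = (m * pi + ti * q) / (pi + q)  # blend prior and observation
        q = pi * q / (pi + q) + r         # posterior variance plus walk noise
    return means, variances
```

With r = 1 and constant p = 1 the prior variance settles at the fixed point of q = q/(q+1) + 1, which is (1+√5)/2, the golden ratio.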

Taken together with the reverse-time versions, these equations solve the problem of the equilibrium state of the spring model of acoustic-phonetic dynamics.

In our dynamic phonetic state generator, the sequence of target values is treated as the observations ti, and we also have a pliancy pi associated with each target value.

Figure E.1(a) shows a sequence of phonetic targets and associated `standard deviations' (square roots of the pliancies).

A forward KF pass computes a mean and variance at each frame, conditioned on the observations so far (Figure E.1(b)) and a backward KF pass considers only the future (Figure E.1(c)). To obtain a symmetrical smoother we combine the two estimates (Figure E.1(d)).

Figure E.1: (a) A sequence of phonetic targets and associated standard deviations, filtered (b) forward through time, (c) backward through time, and (d) after symmetrical filtering.

(It is important to understand that we are not claiming that the target sequence is generated by the `model process' that the KF corresponds to. We do not even claim that the dynamic phonetic state construction process is a model of the actual speech pattern generation process.)

Figure E.3 shows how the Kalman filter propagates the posterior distribution for the current frame forward to form a prior distribution for the state of the system at the next frame. This prior distribution is then combined with the observation distribution to form the posterior at the next frame, given all of the observations seen so far.

Figure E.2 shows an example of using a prior and an observation distribution to obtain a posterior. In this example the prior specifies the estimated state value much more precisely than the observation distribution (shown by the wide bell-shaped curve).
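The prior-times-observation combination behind Figure E.2 is just a product of two Gaussians: precisions (inverse variances) add, and the mean is the precision-weighted average. A minimal sketch (function name ours):

```python
def posterior(m_prior, v_prior, t_obs, v_obs):
    """Combine a Gaussian prior (mean m_prior, variance v_prior)
    with a Gaussian observation (value t_obs, noise variance v_obs).

    The product of two Gaussians is Gaussian: precisions add, and
    the mean is the precision-weighted average of the two means.
    """
    v = 1.0 / (1.0 / v_prior + 1.0 / v_obs)
    m = v * (m_prior / v_prior + t_obs / v_obs)
    return m, v
```

With a tight prior (variance 0.1) and a wide observation (variance 1.0), the posterior mean stays close to the prior, as in the figure.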

Figure E.3 shows how the posterior distribution evolves over time given a sequence of four observations, all with the same error distribution (solid line). The mean of the posterior gradually moves towards the mean of the observation distributions. Figure E.1(b) also shows how these distributions change with incremental observations; each distribution is represented there by its mean and its plus-and-minus-one-standard-deviation points.

Figure E.2: Posterior state distribution after conditioning an observation by a prior.

Figure E.3: Consecutive prior and posterior distributions for a sequence of identical observations.

Figure E.5 shows how forward and backward Kalman filter passes through the data can be combined to give the best estimate at time t using the evidence provided by all of the data points (not only those before, or only those after, time t). First we use the Kalman filter maths to obtain the best estimate at each point using all of the data prior to time t (the forward pass), and then do the same in the reverse direction, obtaining the best estimate given all the data following time t.

Then, thanks to these forward and backward recursions, at each time t we have three estimates of the state: a mean and variance propagated by the forward pass (summarizing the data before t), a mean and variance propagated by the backward pass (summarizing the data after t), and the observation distribution at time t itself.

All three of these estimates can be combined easily since (under the generation assumptions) they are all Gaussian. This gives the best estimate given all of the data points.
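Because precisions of independent Gaussian estimates add, the fusion step can be sketched as follows (function name ours; the estimates would be the forward prior, the backward prior, and the observation at time t):

```python
def combine_gaussians(estimates):
    """Fuse a list of independent Gaussian (mean, variance) estimates.

    Precisions (inverse variances) add; the fused mean is the
    precision-weighted average of the individual means.
    """
    precision = sum(1.0 / v for _, v in estimates)
    mean = sum(m / v for m, v in estimates) / precision
    return mean, 1.0 / precision
```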

All the gruesome mathematical details can be found in Appendix F.

To return to our springs-and-beads view of Appendix D: the forward pass calculates the mean position of bead i if we cut the spring connecting it to bead i+1, and the `variance' propagated forward in the same recursion represents the `springiness' of bead i given the network on the left (i.e. how strongly it would oppose a displacement). This is a kind of `equivalent circuit' for the spring network, since we could replace the whole network to the left with a single spring attached to a single position. In the resistor-network analogy of Appendix D, this would be a Thévenin equivalent circuit.

The same can be done for the spring network to the right of the bead (by working backwards), and so the whole network can be reduced to a network of just one bead attached to three springs (which are themselves attached to three positions):

Figure E.4: Spring `equivalent circuit' at time i.

Figure E.5 shows how this has been done. Means and variances are propagated from the left and from the right to the position in question. These define two `prior' distributions (shown dotted on the right of Figure E.5), which, when combined with the observation distribution (shown solid), give the posterior distribution (shown dashed) at this point.

Figure E.5: Combining forward and backward priors in the two-pass Kalman smoother.

Hywel Richards