FormantDynamicTimeWarping
Introduction
FormantDynamicTimeWarping allows a graphical inspection of the
effects of a dynamic time alignment of a pair of feature sequences. The
feature sequences involved are the frequencies of the first and second formants.
The user selects the sequences to be examined by loading them from a .fmt file
previously made with the
FormantArchiver
application. For example, the feature sequences for the word "right" spoken quickly and
slowly are displayed as:
Here the yellow curves represent the feature sequence of the short utterance of the word "right" and
the blue curves represent the feature sequence of the long utterance of "right".
The user then selects one of four choices: Type I, Type III, Type IV or Itakura.
Type I
When applying Type I we get
What is happening is that the sequence is being "stretched out" to give the best possible match between the two sounds.
At the beginning we see a light blue and a deep yellow.
The light blue indicates that the blue values are held fixed while time passes for the deep yellow.
Then the roles reverse: time passes for the blue, indicated by a deep blue, while the yellow remains
fixed and is displayed as a pale yellow. These roles keep interchanging until we reach the end.
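The colour coding described above can be sketched in code. This is only an illustration, not the application's rendering logic: the function name classify_steps and the representation of the alignment as a list of (yellow_index, blue_index) pairs are assumptions made for the example.

```python
def classify_steps(path):
    """Label each step of an alignment path with (yellow shade, blue shade).

    path is a hypothetical list of (yellow_index, blue_index) pairs.
    "deep" means time passes for that curve on this step,
    "light" means its value is held fixed.
    """
    labels = []
    for (y0, b0), (y1, b1) in zip(path, path[1:]):
        if y1 > y0 and b1 == b0:
            labels.append(("deep", "light"))   # yellow advances, blue is held
        elif b1 > b0 and y1 == y0:
            labels.append(("light", "deep"))   # blue advances, yellow is held
        else:
            labels.append(("deep", "deep"))    # both curves advance
    return labels


example_path = [(0, 0), (1, 0), (1, 1), (2, 2)]
print(classify_steps(example_path))
```

The third case only arises for alignment types that allow both curves to advance in the same step.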
In terms of dynamic programming, Type I is represented by:
In terms of the grid upon which we do our dynamic programming,
we can crudely interpret this as saying that the time it takes to go from the
starting position to the ending position is
the number of rows plus the number of columns of the grid.
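A minimal sketch of such a recurrence follows. The recurrence used by the application is not reproduced in this text, so the details here are assumptions: the allowed steps are taken to be "advance one sequence", "advance the other" and "advance both", with weights 1, 1 and 2, and the starting cell is counted with weight 2. Under those assumptions the weights along any path add up to the number of rows plus the number of columns, which is used as the normaliser. The name dtw_type1 and the scalar distance are illustrative; real formant frames would be (F1, F2) pairs with a suitable distance.

```python
def dtw_type1(a, b, dist=lambda x, y: abs(x - y)):
    """Normalised alignment cost under the assumed Type I steps and weights."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    D[0][0] = 2 * dist(a[0], b[0])          # assumed start weight of 2
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = dist(a[i], b[j])
            best = INF
            if i > 0:
                best = min(best, D[i - 1][j] + d)          # a advances, b held (weight 1)
            if j > 0:
                best = min(best, D[i][j - 1] + d)          # b advances, a held (weight 1)
            if i > 0 and j > 0:
                best = min(best, D[i - 1][j - 1] + 2 * d)  # both advance (weight 2)
            D[i][j] = best
    # Along any path the weights sum to n + m under the assumptions above.
    return D[n - 1][m - 1] / (n + m)


print(dtw_type1([1, 2, 3], [1, 1, 2, 2, 3, 3]))
```

Because either sequence can be held fixed for as long as needed, this form places no limit on how different the two lengths may be.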
Type III
Type III is governed by
In contrast to Type I, neither curve is held fixed. At each step, we either move one to the right and one or two
up, or vice versa. The weighting gives us the city-block measure of distance.
Since each curve must move forward by at least one, and by at most two, at each step,
Type III cannot align the previous two curves: the length of one
is more than twice the length of the other.
So we must consider a different pair of curves whose lengths are not too different.
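The length restriction can be seen in a small sketch. As an assumption for illustration, the steps are taken to be (1, 1), (1, 2) and (2, 1) with city-block weights 2, 3 and 3, the starting cell is counted with weight 2, and the function name dtw_type3 and the scalar distance are hypothetical. The function returns None when no allowed path reaches the final cell.

```python
def dtw_type3(a, b, dist=lambda x, y: abs(x - y)):
    """Normalised cost under the assumed Type III steps, or None if no path exists."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    D[0][0] = 2 * dist(a[0], b[0])           # assumed start weight of 2
    steps = ((1, 1, 2), (1, 2, 3), (2, 1, 3))  # (advance in a, advance in b, weight)
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = dist(a[i], b[j])
            for di, dj, w in steps:
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0 and D[pi][pj] < INF:
                    D[i][j] = min(D[i][j], D[pi][pj] + w * d)
    total = D[n - 1][m - 1]
    # City-block weights sum to n + m along any complete path.
    return None if total == INF else total / (n + m)


print(dtw_type3([1, 2, 3], [1, 1, 2, 2, 3]))   # lengths 3 and 5: a path exists
print(dtw_type3([1, 2, 3], [1] * 8))           # lengths 3 and 8: no path
```

In the second call the longer sequence would have to advance more than two frames per step, which none of the assumed steps allows.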
We take as our example two occurrences of
the word "right" which are said at about the same rate.
Then applying Type III to this we get
Itakura Type
The Itakura type always moves 1 to the right and 0, 1 or 2 up, with the additional constraint that two
consecutive moves to the right (with no upward movement) are disallowed. The Itakura weighting can be interpreted as measuring distance by
horizontal distance, i.e. the number of steps to the right. As a rule of thumb, this makes it desirable to put the shorter sequence as
the horizontal sequence.
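The rule against two consecutive flat moves means the dynamic programme has to remember how each cell was reached. The sketch below keeps two tables for that purpose. It is an illustration under assumptions, not the application's code: the name dtw_itakura, the scalar distance, and the choice to count the starting cell with weight 1 (so that the weights sum to the horizontal length) are all hypothetical.

```python
def dtw_itakura(horiz, vert, dist=lambda x, y: abs(x - y)):
    """Normalised cost under the Itakura constraint, or None if no path exists."""
    n, m = len(horiz), len(vert)
    INF = float("inf")
    # flat[i][j]: best cost of reaching (i, j) by a move with no upward movement
    # rise[i][j]: best cost of reaching (i, j) by a move that went 1 or 2 up
    flat = [[INF] * m for _ in range(n)]
    rise = [[INF] * m for _ in range(n)]
    rise[0][0] = dist(horiz[0], vert[0])     # the start is treated as a non-flat arrival
    for i in range(1, n):
        for j in range(m):
            d = dist(horiz[i], vert[j])
            # A flat move is allowed only if the previous move was not flat.
            if rise[i - 1][j] < INF:
                flat[i][j] = rise[i - 1][j] + d
            # Moves 1 or 2 up may follow any kind of move.
            best = INF
            for up in (1, 2):
                if j - up >= 0:
                    best = min(best, flat[i - 1][j - up], rise[i - 1][j - up])
            if best < INF:
                rise[i][j] = best + d
    total = min(flat[n - 1][m - 1], rise[n - 1][m - 1])
    # Every step moves 1 to the right with weight 1, so the weights sum to n.
    return None if total == INF else total / n


print(dtw_itakura([1, 2, 3], [1, 1, 2, 2, 3]))
print(dtw_itakura([1, 1, 1, 1, 1], [1, 1]))   # would need consecutive flat moves
```

The second call returns None: five horizontal frames against two vertical frames require three flat moves among four steps, so two of them must be adjacent.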
Applying Itakura dynamic time alignment we get
Type IV
Type IV is similar to Itakura, except that we allow consecutive moves to the right.
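Without the restriction on consecutive flat moves, a single table suffices. The following is a sketch under assumptions: the name dtw_type4 is hypothetical, the distance is a scalar stand-in for a formant distance, and the weighting is assumed to be the same horizontal-distance weighting described for Itakura, which this text does not state for Type IV.

```python
def dtw_type4(horiz, vert, dist=lambda x, y: abs(x - y)):
    """Normalised cost with moves 1 right and 0, 1 or 2 up, or None if no path exists."""
    n, m = len(horiz), len(vert)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    D[0][0] = dist(horiz[0], vert[0])
    for i in range(1, n):
        for j in range(m):
            # Best predecessor among 0, 1 or 2 rows below in the previous column.
            best = min(D[i - 1][j - up] for up in (0, 1, 2) if j - up >= 0)
            if best < INF:
                D[i][j] = best + dist(horiz[i], vert[j])
    total = D[n - 1][m - 1]
    # Assumed weight of 1 per horizontal step, so the weights sum to n.
    return None if total == INF else total / n


print(dtw_type4([1, 1, 1, 1, 1], [1, 1]))   # consecutive flat moves are allowed here
print(dtw_type4([1, 2], [1, 2, 3, 4, 5]))   # still cannot rise more than 2 per step
```

The first call succeeds because several flat moves in a row are permitted; the second still fails because the vertical sequence is too long to cover at two rows per step.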
Applying Type IV dynamic time alignment we get