FormantDynamicTimeWarping


    Introduction

    FormantDynamicTimeWarping allows a graphical inspection of the effects of a dynamic time alignment of a pair of feature sequences. The feature sequences involved are the frequencies of the first and second formants.

    The user selects the sequences to be examined by loading them from a .fmt file previously made with the FormantArchiver application. For example, the feature sequences for the word "right" spoken quickly and slowly are displayed as:

    dtw1a.jpg

    Here the yellow curves represent the feature sequence of the short utterance of the word "right" and the blue curves represent the feature sequence of the long utterance of "right".

    The user then selects one of four choices: type1, type3, type4 or Itakura.


    Type I

    Applying Type I we get

    dtw1b.jpg

    What is happening is that each sequence is being "stretched out" to give the best possible match between the two sounds. At the beginning we see a light blue and a deep yellow. The light blue indicates that the blue values are held fixed while time passes for the deep yellow. Then the roles reverse: time passes for the blue, indicated by a deep blue, while the yellow remains fixed and is displayed as a pale yellow. These roles keep interchanging until we reach the end. In terms of dynamic programming, Type I is represented by:

    type1.jpg
    In terms of the grid upon which we do our dynamic programming, we can crudely interpret this as saying that the time it takes to go from the starting position to the ending position is the number of rows plus the number of columns of the grid.
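
    The Type I behaviour described above can be sketched in Python. This is only an illustration under stated assumptions: the moves are taken to be one step right, one step up, or one diagonal step, with weights 1, 1 and 2, and the local distance is a plain absolute difference between scalar values. The function names dtw_type1, held_fixed and path_weight are hypothetical and are not part of the application.

```python
def dtw_type1(x, y):
    """Sketch of a Type I alignment between two scalar sequences.

    Assumptions (not taken from the application): moves (1,0), (0,1),
    (1,1) with weights 1, 1, 2, and absolute difference as local distance.
    Returns (accumulated_cost, path), where path is a list of (i, j) pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    moves = [(1, 0, 1), (0, 1, 1), (1, 1, 2)]  # (di, dj, weight)
    cost = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    cost[0][0] = abs(x[0] - y[0])
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = abs(x[i] - y[j])
            for di, dj, w in moves:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                c = cost[pi][pj] + w * d
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (pi, pj)
    # Walk the back-pointers from the end point to the start point.
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        path.append(back[path[-1][0]][path[-1][1]])
    path.reverse()
    return cost[n - 1][m - 1], path


def held_fixed(path):
    """For each step of the path, report which sequence did not advance."""
    labels = []
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        if i1 == i0:
            labels.append("x held")
        elif j1 == j0:
            labels.append("y held")
        else:
            labels.append("both advance")
    return labels


def path_weight(path):
    """Sum of the assumed move weights (1 for a held step, 2 for a diagonal)."""
    return sum((i1 - i0) + (j1 - j0)
               for (i0, j0), (i1, j1) in zip(path, path[1:]))
```

    With these weights every admissible path has the same total weight, (len(x) - 1) + (len(y) - 1), which is the "rows plus columns" interpretation given above. The labels returned by held_fixed correspond to the steps where one curve is drawn in its pale colour.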


    Type III

    Type III is governed by
    type3.jpg
    In contrast to Type I, neither curve is held fixed. At each step, we either move one to the right and one or two up, or vice versa. The weighting gives us the city block measure of distance. Since each curve must move at least one and at most two frames forward at each step, Type III cannot align the previous two curves, because the length of one is more than twice the length of the other.

    So we must consider a different pair of curves which are not too different in length. We take as our example two occurrences of the word "right" which are said at about the same rate.

    dtw3a.jpg

    Then applying Type III to this we get

    dtw3b.jpg
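
    The length restriction on Type III can be checked before attempting an alignment. The sketch below assumes the three moves are (1,1), (1,2) and (2,1) with city block weights 2, 3 and 3, and uses an absolute difference as the local distance. The names type3_can_align and dtw_type3 are hypothetical.

```python
def type3_can_align(n, m):
    """True if sequences of n and m frames admit a Type III path.

    Assumes moves (1,1), (1,2), (2,1) from grid point (0, 0) to
    (n - 1, m - 1): neither axis may need more than twice the steps
    of the other.
    """
    steps_x, steps_y = n - 1, m - 1
    return max(steps_x, steps_y) <= 2 * min(steps_x, steps_y)


def dtw_type3(x, y):
    """Accumulated Type III cost, or infinity when no path exists.

    Assumed local distance: absolute difference of scalar values.
    Each move is weighted by its city block length di + dj.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    moves = [(1, 1), (1, 2), (2, 1)]
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = abs(x[0] - y[0])
    for i in range(1, n):
        for j in range(1, m):
            d = abs(x[i] - y[j])
            for di, dj in moves:
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0:
                    cost[i][j] = min(cost[i][j], cost[pi][pj] + (di + dj) * d)
    return cost[n - 1][m - 1]
```

    For example, type3_can_align(3, 7) is False, because six steps are needed along one axis but only two along the other, and dtw_type3 returns infinity for sequences of those lengths.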


    Itakura Type

    The Itakura type always moves 1 to the right and 0, 1 or 2 up, with the additional constraint that two consecutive moves to the right (with no upward movement) are disallowed. The weighting on Itakura can be interpreted as saying that the distance is the horizontal distance, i.e. the number of steps to the right. As a rule of thumb, this makes it desirable to put the shorter sequence as the horizontal sequence.

    typeItakura.jpg
    Applying Itakura dynamic time alignment we get

    dtwItakurab.jpg
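
    The ban on two consecutive flat moves means the dynamic programming has to remember how each grid point was reached. A minimal sketch follows, assuming a weight of 1 per step to the right and an absolute difference as the local distance. The name dtw_itakura is hypothetical.

```python
def dtw_itakura(x, y):
    """Accumulated cost under the Itakura moves; x is the horizontal sequence.

    Assumptions: each move goes 1 right and 0, 1 or 2 up, two flat moves
    in a row are forbidden, every move has weight 1, and the local
    distance is an absolute difference. Returns infinity if no path exists.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j][f]: best cost of reaching (i, j);
    # f = 1 if the last move was flat (0 up), f = 0 otherwise.
    cost = [[[inf, inf] for _ in range(m)] for _ in range(n)]
    cost[0][0][0] = abs(x[0] - y[0])
    for i in range(1, n):
        for j in range(m):
            d = abs(x[i] - y[j])
            # A flat move is only allowed if the previous move was not flat.
            cost[i][j][1] = cost[i - 1][j][0] + d
            # Rising moves (1 or 2 up) may follow any previous move.
            rising = [min(cost[i - 1][j - dj]) for dj in (1, 2) if j - dj >= 0]
            if rising:
                cost[i][j][0] = min(rising) + d
    return min(cost[n - 1][m - 1])
```

    Since every move has weight 1, dividing the result by the number of horizontal steps, len(x) - 1, would give a per-step average. Under these assumptions dtw_itakura([1, 1, 1], [1]) is infinite, because the only candidate path consists of two flat moves in a row.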


    Type IV

    Type IV is similar to Itakura, except that we allow consecutive moves to the right.

    type4.jpg
    Applying Type IV dynamic time alignment we get

    dtw4b.jpg
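
    Because Type IV drops the restriction on consecutive flat moves, the recursion no longer needs to track the previous move. The sketch below assumes moves of 1 right and 0, 1 or 2 up, each with weight 1, and an absolute difference as the local distance. The name dtw_type4 is hypothetical.

```python
def dtw_type4(x, y):
    """Accumulated cost under the Type IV moves; x is the horizontal sequence.

    Assumptions: each move goes 1 right and 0, 1 or 2 up with weight 1,
    consecutive flat moves are allowed, and the local distance is an
    absolute difference. Returns infinity if no path exists.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = abs(x[0] - y[0])
    for i in range(1, n):
        for j in range(m):
            # Best predecessor in the previous column, 0, 1 or 2 rows below.
            best = min(cost[i - 1][j - dj] for dj in (0, 1, 2) if j - dj >= 0)
            cost[i][j] = best + abs(x[i] - y[j])
    return cost[n - 1][m - 1]
```

    Under these assumptions dtw_type4([1, 1, 1], [1]) is 0, since the path may stay flat for two steps in a row. A path still cannot rise more than two rows per step, so dtw_type4([1, 2], [1, 2, 3, 4]) is infinite.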