Returns an array of time intervals on which the power exceeds the minimum
(background) power for longer than a minimum duration and reaches a minimum
peak value. More explicitly, time intervals are created by considering those
periods where the sound is above a given minimum threshold. However, since
we are dealing with voice, intervals that are too short in duration are
discarded. Likewise, we assume the speaker is not whispering, so the portion
of the sound made by the speaker should be reasonably loud; thus intervals
that do not contain a point reaching the minimum peak power are discarded.
The remaining intervals are returned. This algorithm was originally intended
to isolate phrases where there was a constant background hum (due to my
server). However, by varying minPower, phrases can be further split into
words and syllables, although not perfectly.
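
The three filters described above (threshold, duration, peak) can be
sketched as follows. This is only an illustration in Python, not the
original implementation: the name single_seg, its parameter names, the use
of a plain list of per-frame power values, and the choice of ">=" for the
duration test are all assumptions. Intervals are returned as (start, end)
index pairs with end exclusive.

```python
def single_seg(power, min_power, min_duration, min_peak):
    """Return [(start, end), ...] index intervals, end exclusive.

    Hypothetical sketch: `power` is assumed to be a list of per-frame
    power values; durations are counted in frames, not seconds.
    """
    # Step 1: collect every run of frames whose power is above min_power.
    runs = []
    start = None
    for i, p in enumerate(power):
        if p > min_power:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(power)))

    # Step 2: drop runs that are too short.
    # Step 3: drop runs that never reach the minimum peak power.
    return [(s, e) for s, e in runs
            if e - s >= min_duration and max(power[s:e]) >= min_peak]
```

Raising min_power in such a sketch shortens and splits the runs, which is
the effect the text describes when phrases break into words and syllables.
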
If a word is found but its end is not, returns start, nil.
If a word is found and its end is found, returns start, end.
If nothing is found, returns nil, nil.
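
A scanning helper with these three return cases might look like the sketch
below. The function name find_word, its parameters, and the
threshold-crossing rule used to decide where a word starts and ends are
assumptions made for illustration; the source only specifies the return
values. Python's None stands in for nil.

```python
def find_word(power, min_power, begin=0):
    """Scan `power` from index `begin` and return (start, end).

    Hypothetical sketch. Returns:
      (start, end)  if a word and its end were found (end exclusive),
      (start, None) if a word started but the data ran out first,
      (None, None)  if no frame exceeded min_power.
    """
    start = None
    for i in range(begin, len(power)):
        if start is None:
            if power[i] > min_power:
                start = i          # first frame above the threshold
        elif power[i] <= min_power:
            return start, i        # first frame back below the threshold
    return start, None             # start is None when nothing was found
```
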
This is used to further refine the segmentation technique given by
singleSeg. It makes two passes. The first pass invokes singleSeg to isolate
the intervals of speech from the background (some constant minor background
noise is assumed). The second pass invokes singleSeg upon each interval of
speech obtained from the first pass, with a higher power requirement. Drops
in power between words and syllables are thus detected, and we form
intervals where the power has dropped. Within each of those we search for
the point where the power is at a minimum and use that as a segmentation
boundary point.
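
The two passes can be sketched as below, again in Python and under stated
assumptions: the names single_seg and two_pass_seg, the parameter names, and
the lo/hi range arguments are hypothetical, and the second pass here applies
no duration or peak filter, which the source does not specify. A compact
single_seg is included so the block is self-contained.

```python
def single_seg(power, min_power, min_duration, min_peak, lo=0, hi=None):
    """Runs in power[lo:hi] above min_power that pass both filters."""
    hi = len(power) if hi is None else hi
    out, start = [], None
    for i in range(lo, hi + 1):
        # Treat index hi as "quiet" so a run touching the end is closed.
        loud = i < hi and power[i] > min_power
        if loud and start is None:
            start = i
        elif not loud and start is not None:
            if i - start >= min_duration and max(power[start:i]) >= min_peak:
                out.append((start, i))
            start = None
    return out


def two_pass_seg(power, bg_power, word_power, min_duration, min_peak):
    """Split each speech interval at the power minimum of each drop."""
    segments = []
    # Pass 1: separate speech from the background.
    for lo, hi in single_seg(power, bg_power, min_duration, min_peak):
        # Pass 2: same search inside the interval, higher threshold.
        loud = single_seg(power, word_power, 1, 0, lo, hi)
        cuts = []
        # The gap between consecutive loud runs is where power dropped;
        # the boundary is the index of minimum power inside that gap.
        for (_, gap_lo), (gap_hi, _) in zip(loud, loud[1:]):
            cuts.append(min(range(gap_lo, gap_hi), key=lambda i: power[i]))
        edges = [lo] + cuts + [hi]
        segments.extend(zip(edges, edges[1:]))
    return segments
```

In this sketch a speech interval with no internal drop is returned
unchanged, and one with k drops is returned as k + 1 adjacent pieces.
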