Class
Sound
In: rbSnack/tkSnack.rb
Parent: Object

A Sound object holds audio sample data and provides methods to play, record, analyse, and edit it.

Methods

+, [], []=, cget, changed, concatenate, configure, copy, crop, cut, dBPowerSpectrum, destroy, filter, flush, formant, get_sample, info, insert, length, length=, load, max, min, new, pause, pitch, play, power, read, record, reverse, section_to_canvas, set_sample, spectrogram_to_canvas, stop, waveform_to_canvas, write
Attributes

 [R]  :name

Options is a hash of constructor options

Included modules

Snack
Public Class methods
new(options=nil, &op_block)

Creates a new Sound object. Valid options are

load fileName
Reads the file fileName into self.
file fileName
Links an on-disk file to the sound.
channel channelName
Links the sound to a channel. Used for audio streaming or playing large files; the audio data is not loaded into memory.
rate sampleRate
Specifies the sample rate, a positive integer that should be supported by the hardware. For speech applications, 16000 is most common.
channels channelSpec
Valid values are mono, stereo, and integers greater than 0. The default is 'mono'.
encoding sampleFormat
The sample encoding format: legal values for sampleFormat are _Lin16_, _Lin8offset_, _Lin8_, _Lin24_, _Lin32_, Float, Alaw, or Mulaw. Default encoding is _Lin16_.
skiphead noOfBytes
Skips an unknown file header of noOfBytes bytes.
byteorder endianness
Valid values are littleEndian and bigEndian.
guessproperties boolean
Valid values are 1, 0, or the Ruby true and false. Specifies that Snack should try to infer properties, such as byte order, sample encoding format, and sample rate, for a raw file by analyzing the contents of the file.
precision precision
Specifies the precision of the buffer data. Valid values are single and double.
buffersize size
Specifies the internal buffer size used for samples obtained through a channel.
fileformat RAW
Sets the file format. Legitimate formats currently supported are WAV, MP3, AU, SND, AIFF, SD, SMP, CSL, and RAW binary. The command returns the file format detected. It is possible to force a file to be read as RAW using fileformat RAW. In this case the properties of the sound data should be specified by hand.
changecommand cmd
Specifies a procedure to be called whenever a sound property is changed.
debug level
Controls the level of debugging. Valid values are 0 through 5.
Public Instance methods
cget(option=nil, &op_block)

Retrieves the current value of a given sound option. Permissible values for the option argument are

load
Returns the name of the file that was loaded into self.
file
Returns the file linked to the sound.
channel
Returns the channel to which the sound is linked (used for audio streaming or playing large files; the audio data is not loaded into memory).
rate sampleRate
Specifies the sample rate, a positive integer that should be supported by the hardware. For speech applications, 16000 is most common.
channels channelSpec
Valid values are mono, stereo, and integers greater than 0. The default is 'mono'.
encoding sampleFormat
The sample encoding format: legal values for sampleFormat are _Lin16_, _Lin8offset_, _Lin8_, _Lin24_, _Lin32_, Float, Alaw, or Mulaw. Default encoding is _Lin16_.
skiphead noOfBytes
Skips an unknown file header of noOfBytes bytes.
byteorder endianness
Valid values are littleEndian and bigEndian.
guessproperties boolean
Valid values are 1, 0, or the Ruby true and false. Specifies that Snack should try to infer properties, such as byte order, sample encoding format, and sample rate, for a raw file by analyzing the contents of the file.
precision
Returns the precision of the buffer data. Possible values are single and double.
buffersize
Returns the internal buffer size used for samples obtained through a channel.

Note: At most one option can be specified at a time. The option may be given either as a single string or as a block.

*Example 1*
cget("precision")
*Example 2*
cget(){ channels; }
changed(notificationMessage='new')

In the rare case that Snack needs to be forcefully notified that a sound has changed, invoke the changed method on the sound object that has changed. Permissible values for notificationMessage are

concatenate(trailingsound)

Concatenates the sample data from trailingsound to the end of self. Both sounds must be of the same format (type). Afterwards, self holds the sample data of both sounds.

configure(options=nil, &op_block)

Configures (or reconfigures) a sound. Valid options are

load fileName
Reads the file fileName into self.
file fileName
Links an on-disk file to the sound.
channel channelName
Links the sound to a channel. Used for audio streaming or playing large files; the audio data is not loaded into memory.
rate sampleRate
Specifies the sample rate, a positive integer that should be supported by the hardware. For speech applications, 16000 is most common.
channels channelSpec
Valid values are mono, stereo, and integers greater than 0. The default is 'mono'.
encoding sampleFormat
The sample encoding format: legal values for sampleFormat are _Lin16_, _Lin8offset_, _Lin8_, _Lin24_, _Lin32_, Float, Alaw, or Mulaw. Default encoding is _Lin16_.
skiphead noOfBytes
Skips an unknown file header of noOfBytes bytes.
byteorder endianness
Valid values are littleEndian and bigEndian.
guessproperties boolean
Valid values are 1, 0, or the Ruby true and false. Specifies that Snack should try to infer properties, such as byte order, sample encoding format, and sample rate, for a raw file by analyzing the contents of the file.
precision precision
Specifies the precision of the buffer data. Valid values are single and double.
buffersize size
Specifies the internal buffer size used for samples obtained through a channel.
debug level
Controls the level of debugging. Valid values are 0 through 5.
copy(source_sound, options=nil, &op_block)

Copies sample data from source_sound into self, destroying the original data in self. Valid options are

start startPos
Specifies the starting position of the section to be copied.
stop endPos
Specifies the ending (stopping) position of the section to be copied.
crop(startPos=1, endPos=nil, options=nil, &op_block)

Removes all sample data outside the range [startPos..endPos], so that self consists only of the sample data inside that range. If endPos is nil, the end of the sample data is used in place of endPos.

cut(startPos=1, endPos=nil, options=nil, &op_block)

Removes all sample data inside the range [startPos..endPos]. This is the opposite of crop.
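
The difference between crop and cut can be pictured with a plain Ruby array standing in for the sample buffer. This is only an illustrative sketch: crop_samples and cut_samples are hypothetical helpers, not part of rbSnack, and positions are treated here as zero-based, inclusive array indices, which may differ from the library's own indexing.

```ruby
# Hypothetical helpers that mimic crop and cut on an ordinary Array.
# Positions are zero-based and inclusive here (an assumption).

# Keep only the samples inside start_pos..end_pos.
def crop_samples(samples, start_pos, end_pos = nil)
  end_pos ||= samples.length - 1   # nil means "up to the end of the data"
  samples[start_pos..end_pos]
end

# Remove the samples inside start_pos..end_pos and keep everything else.
def cut_samples(samples, start_pos, end_pos = nil)
  end_pos ||= samples.length - 1
  samples[0...start_pos] + samples[(end_pos + 1)..-1]
end

samples = [10, 20, 30, 40, 50, 60]
cropped = crop_samples(samples, 1, 3)   # => [20, 30, 40]
removed = cut_samples(samples, 1, 3)    # => [10, 50, 60]
```

Taken together, the two results partition the original data: every sample ends up in exactly one of them.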

destroy()

Removes this sound and frees the storage associated with it.

dBPowerSpectrum(options=nil, &op_block)

Computes the log FFT power spectrum of the sound (at the time given by the start option) and returns a list of dB values. Permissible options include

start position
Sets the starting position of the sample
stop position
Sets the ending (stopping) position of the sample
fftlength length
Sets the length of the Fast Fourier transform. It is the N in

$$S(k) = \sum_{t=0}^{N-1} s(t) \exp\left(-\frac{2\pi i k t}{N}\right)$$

Permissible values are 8, 16, 32, 64, 128, 256, 512, 1024, 2048, or 4096. The default is 512.
windowlength length
Sets the length of the window. See Windowing.
windowtype type
Sets the type of window. Permissible types are Hamming, Hanning, Bartlett, Blackman, or Rectangle.
skip pointsPerStep
Specifies how many points to move the FFT window forward at each step.
channel channelNumber
Selects which channel to use for multichannel sounds.
preemphasisfactor factor
Sets the amount of preemphasis applied to the signal prior to the FFT calculation (default 0.0).
analysistype type
Sets the analysis type. Valid values are either FFT or LPC. FFT is the default.
lpcorder order
Sets the order, N, (number of poles) when using Linear Predictive Coding analysis. In particular, this assumes that within a given frame the speech signal s(t) is determined by constants b0...bN by

$$s(t) = \sum_{i=0}^{N} b_i \times s(t - i)$$
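
The FFT-based analysis described under fftlength can be sketched in plain Ruby as a naive discrete Fourier transform followed by a conversion to dB. This is an illustration of the transform only, not Snack's implementation: db_power_spectrum is a hypothetical helper, windowing and preemphasis are omitted, no normalisation is applied, and the small floor added before the logarithm is an assumption made to avoid taking the log of zero.

```ruby
# Naive O(N^2) DFT log power spectrum; illustrative only.
# Returns one dB value for each of the first fft_length / 2 bins.
def db_power_spectrum(samples, fft_length)
  frame = samples.first(fft_length)
  frame += [0.0] * (fft_length - frame.length)   # zero-pad a short frame
  (0...fft_length / 2).map do |k|
    re = 0.0
    im = 0.0
    frame.each_with_index do |s, t|
      angle = -2.0 * Math::PI * k * t / fft_length
      re += s * Math.cos(angle)
      im += s * Math.sin(angle)
    end
    power = re * re + im * im            # |S(k)|^2
    10.0 * Math.log10(power + 1e-12)     # floor avoids log10(0)
  end
end

n = 64
# A sine with exactly 8 cycles in the frame should peak in bin 8.
tone = Array.new(n) { |t| Math.sin(2.0 * Math::PI * 8 * t / n) }
spectrum = db_power_spectrum(tone, n)
peak_bin = spectrum.index(spectrum.max)
```

A real implementation would use a fast Fourier transform, which is why the permissible lengths are powers of two.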

filter(givenFilter, options=nil, &op_block)

Applies givenFilter to self, i.e. filters self by givenFilter. Note: givenFilter must be of type Filter. Valid options are

start position
Sets the sample position where the filter starts to be applied
stop position
Sets the sample position where the filter stops being applied
continuedrain bool
Specifies whether to use.. boolean
progress boolean
Progress callback. Acceptable values are true or false (Also _"show"_ or _"hide"_)
flush()

Removes all audio data from self, i.e. empties self.

formant(options=nil, &op_block)

Estimates speech formant trajectories. Dynamic programming is used to optimize trajectory estimates by imposing frequency continuity constraints. The formant frequencies are selected from candidates proposed by solving for the roots of the linear predictor polynomial computed periodically. The local costs of all possible mappings of the complex roots to formant frequencies are computed at each frame based on the frequencies and bandwidths of the component formants for each mapping. The cost of connecting each of these mappings with each of the mappings in the previous frame is then minimized using a modified Viterbi algorithm.

Returns an array of frames. Each frame consists of a pair of arrays: the first member of the pair is an array of formant values, lowest to highest; the second member is an array of formant bandwidths. Thus the return value looks like

[Frame1,Frame2,...]

where each Framei looks like

[[F1,F2,...],[B1,B2,...]]

where the Fi's are formant means and the Bi's are the bandwidths. The first frame corresponds to a start time of half the window length. The most recent frame is the last frame.
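
The nested return structure can be unpacked with ordinary array destructuring. The sketch below uses made-up numbers in place of a real formant result, so the values themselves carry no meaning; only the shape follows the description above.

```ruby
# Hypothetical return value: two frames, two formants per frame.
# The numbers are invented purely to show the shape of the data.
frames = [
  [[510.0, 1490.0], [60.0, 90.0]],
  [[520.0, 1510.0], [65.0, 95.0]]
]

# Track of the lowest formant across all frames.
f1_track = frames.map { |formants, _bandwidths| formants.first }

# Pair each formant value with its bandwidth, frame by frame.
pairs = frames.map { |formants, bandwidths| formants.zip(bandwidths) }
```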

Valid options are

start position
Sets the sample position where the analysis starts
stop position
Sets the sample position where the analysis stops
framelength t
Specifies the intervals between the values (default 0.01).
numformants n
Controls how many formants to calculate (default 4, maximum 7).
windowlength length
Specifies the size of the window in seconds (default 0.049).
preemphasisfactor factor
Specify the amount of preemphasis applied to the signal prior to windowing (default 0.7).
windowtype type
Selects the windowing function. Valid values are Cos^4 (default), Hamming, Hanning, and Rectangle.
lpctype t
Valid values are either 0 (autocorrelation) or 1 (stabilized covariance).
lpcorder n
Specifies the order of the LPC Analysis (default is 12).
progress boolean
Progress callback. Acceptable values are true or false (Also _"show"_ or _"hide"_)
ds_freq f
Specifies the sampling rate of the data to be used in the formant frequency analysis (default 10000).
nom_ff_freq f
Specifies the nominal value of the first formant frequency. This value is used to adjust the nominal values of all other formants and of the ranges over which the formants are permitted to exist. The default value of 500Hz assumes that the vocal tract length is 17 cm and that the speed of sound is 34000 cm/sec. Nominal F1 values scale directly with sound velocity and inversely with vocal-tract length.
get_sample(index)

Gets the sample value at index.

info()

Returns a list with information about the sound. The entries are [length, rate, max, min, encoding, channels, fileFormat, headerSize]

insert(sound, position, startPos, stopPos)

insert applies only to in-memory sounds. It inserts another sound (sound) into the original sound (self) at the given position. A range of samples from sound to be inserted can be selected by specifying startPos and stopPos.

length=(n, units=nil)

Sets the length of the sound in seconds or in samples (default). To set the length in seconds, use sound.length=(n, 'seconds')

length(units=nil)

Gets the length of this sound in seconds or in samples (default). To get the length in seconds, use len=sound.length('seconds')

load(filename, options=nil, &op_block)

Loads new sound data from a file. Synonym for "read". Valid options are

rate sampleRate
Specifies the sample rate, a positive integer that should be supported by the hardware. For speech applications, 16000 is most common.
channels channelSpec
Valid values are mono, stereo, and integers greater than 0. The default is 'mono'.
encoding sampleFormat
The sample encoding format: legal values for sampleFormat are _Lin16_, _Lin8offset_, _Lin8_, _Lin24_, _Lin32_, Float, Alaw, or Mulaw. Default encoding is _Lin16_.
skiphead noOfBytes
Skips an unknown file header of noOfBytes bytes.
byteorder endianness
Valid values are littleEndian and bigEndian.
guessproperties boolean
Valid values are 1, 0, or the Ruby true and false. Specifies that Snack should try to infer properties, such as byte order, sample encoding format, and sample rate, for a raw file by analyzing the contents of the file.
start startPos
Starting position of data to be read
stop endPos
Stopping position of data to be read
fileformat RAW
Sets the file format. Currently supported file formats are WAV, MP3, AU, SND, AIFF, SD, SMP, CSL, and RAW binary. The command returns the file format detected.
progress boolean
Progress callback. Acceptable values are true or false (Also _"show"_ or _"hide"_)

It is possible to force a file to be read as RAW using fileformat RAW. In this case the properties of the sound data should be specified by hand.

max(options=nil, &op_block)

Returns the largest positive sample value of the sound.

min(options=nil, &op_block)

Returns the largest negative sample value of the sound.

pause()

Pauses the current record/play operation. The next pause invocation resumes play/record.

pitch(options=nil, &op_block)

Returns a list of pitch values, spaced 10 ms apart, computed using the AMDF (average magnitude difference function) method. Valid options are

start startPos
Starting position of sample range
stop endPos
Stopping position of sample range
maxpitch maxpitch
Maximum pitch (default is 400)
minpitch minpitch
Minimum pitch (default is 60)
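
The idea behind an AMDF pitch estimate can be sketched in plain Ruby for a single frame. This is a textbook version under stated assumptions, not Snack's algorithm: amdf_pitch is a hypothetical helper, the frame is assumed to be voiced, and the lag search range is simply derived from the minpitch and maxpitch defaults listed above.

```ruby
# Estimate the pitch (in Hz) of one frame with a basic AMDF.
# For each candidate lag, average |x[n] - x[n + lag]|; the lag with the
# smallest average difference is taken as the pitch period.
def amdf_pitch(frame, rate, min_pitch = 60, max_pitch = 400)
  min_lag = (rate / max_pitch.to_f).floor   # shortest period considered
  max_lag = (rate / min_pitch.to_f).ceil    # longest period considered
  best_lag = (min_lag..max_lag).min_by do |lag|
    count = frame.length - lag
    next Float::INFINITY if count <= 0      # lag does not fit in the frame
    sum = 0.0
    count.times { |n| sum += (frame[n] - frame[n + lag]).abs }
    sum / count
  end
  rate.to_f / best_lag
end

rate  = 8000
# A 100 Hz sine sampled at 8000 Hz has a period of 80 samples.
frame = Array.new(400) { |n| Math.sin(2.0 * Math::PI * 100 * n / rate) }
pitch = amdf_pitch(frame, rate)
```

A practical pitch tracker would also decide whether a frame is voiced at all and smooth the estimates across frames; both steps are left out here.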
play(options=nil, &op_block)

Plays the sound. Valid options are

start startPos
Determines the starting position of the segment to be played
stop endPos
Determines the stopping position of the segment to be played
output outputJack
Determines the output jack
blocking booleanValue
False means asynchronous playback; true means the function does not return until playback is completed.
command callback
The callback to be executed at the conclusion of the playback.
device outputDevice
Specifies audio output device.
filter filter
Used to filter sound during playback.
devicerate freq
Used to specify alternate device rate. Rarely used.
devicechannels n
Used to specify alternate device channel. Rarely used.
power(options=nil, &op_block)

Returns a list of windowed log-power values. Valid options are

start startPos
Specifies the starting position of the sample to be examined.
stop endPos
Specifies the stopping position of the sample to be examined.
framelength length
Specifies interval length between values
windowtype type
Specifies the type of windows to be used. Permissible types are Hamming, Hanning, Bartlett, Blackman, or Rectangle.
windowlength length
Specifies the number of points in each window.
preemphasisfactor factor
Specifies the preemphasisfactor applied to signal prior to windowing. This
read(filename, options=nil, &op_block)

Reads new sound data from a file. Synonym for "load". Valid options are

rate sampleRate
Specifies the sample rate, a positive integer that should be supported by the hardware. For speech applications, 16000 is most common.
channels channelSpec
Valid values are mono, stereo, and integers greater than 0. The default is 'mono'.
encoding sampleFormat
The sample encoding format: legal values for sampleFormat are _Lin16_, _Lin8offset_, _Lin8_, _Lin24_, _Lin32_, Float, Alaw, or Mulaw. Default encoding is _Lin16_.
skiphead noOfBytes
Skips an unknown file header of noOfBytes bytes.
byteorder endianness
Valid values are littleEndian and bigEndian.
guessproperties boolean
Valid values are 1, 0, or the Ruby true and false. Specifies that Snack should try to infer properties, such as byte order, sample encoding format, and sample rate, for a raw file by analyzing the contents of the file.
fileformat RAW
Sets the file format. Currently supported file formats are WAV, MP3, AU, SND, AIFF, SD, SMP, CSL, and RAW binary. The command returns the file format detected.
start startPos
Starting position of data to be read
stop endPos
Stopping position of data to be read
progress boolean
Progress callback. Acceptable values are true or false (Also _"show"_ or _"hide"_)

It is possible to force a file to be read as RAW using fileformat RAW. In this case the properties of the sound data should be specified by hand.

[Example] file = Snack::getOpenFile

sound.read(file){progress true}

record(options=nil, &op_block)

Starts recording data from the audio device into the sound object. Valid options are

input jack
Specifies the input jack
device device
Specifies the input device
append boolean
true means append to the sound. Applies only to in-memory recordings.
fileformat format
Use when writing data to a channel or possibly a file.
reverse(options=nil, &op_block)

Reverses the sample data of self. Valid options are

start startPos
Specifies starting position of data sample to be reversed.
stop endPos
Specifies stopping position of data sample.
section_to_canvas(canvas, x, y, options=nil, &op_block)

Displays an FFT log power spectrum section of this sound on the canvas. The first argument, canvas, specifies the canvas to which the section of this sound will be attached. The integers x and y are the upper left-hand coordinates of the display image on the canvas. Options can be any of the following

analysistype analysistype
Permissible values are either FFT (default) or LPC
channel channel
Selects the channels. For a 2-channel system: -1 means both, 0 is left, 1 is right. Default is -1.
stop endPos
Specifies the ending position of the sample to be displayed
fftlength fftlength
Specifies the number of FFT points; must be one of the values 8, 16, 32, 64, 128, 256, 512, 1024, 2048, or 4096. Default is 512.
fill fillColor
Specifies the fill color
frame boolean
Is a boolean value, where true means a frame is drawn about section, false means no frame and is the default.
height height
Specifies the height of section
lpcorder order
Specifies the LPC order when the analysis type is LPC. The default value is 20
maxvalue max_dB
Specifies the max dB displayed. Default is 0.0
minvalue min_dB
Specifies the min dB displayed. Default is -80.0
preemphasisfactor factor
Specifies the amount of preemphasis applied to the signal prior to the FFT calculation, the default 0.0.
skip no_of_skip_points
specifies how many points to move the window forward at each step
start startPos
Specifies the starting position of the sample to be displayed
stipple bitmap
Specifies the bitmap for stipple
tags tagList
Specifies the tagList for canvas
topfrequency topFreq
Specifies the frequency at the right end of the section
width width
Specifies the width of section
windowtype type
Specifies the type of windowing function: must be one of the choices: Hamming, Hanning, Bartlett, Blackman, or Rectangle. The default is Hamming.
winlength size
Specifies the size of the (Hamming) window; it is required to be no greater than fftlength. The default is 128.

See also TkCanvas#attachSection

set_sample(index, left=nil, right=nil)

Sets sample value at index for left and/or right channel

spectrogram_to_canvas(canvas, x, y, options=nil, &op_block)

Displays the spectrogram of this sound on the canvas. The first argument, canvas, specifies the canvas to which the spectrogram of this sound will be attached. The integers x and y are the upper left-hand coordinates of the display image on the canvas. Options can be any of the following

brightness brightness
Determines the brightness, must be chosen from the range of [-100, 100]. The default is 0.
channel channel
Selects the channels. For a 2-channel system: -1 means both, 0 is left, 1 is right. Default is -1.
colormap colormap
Specifies a list of colors to determine the intensity. If the list is non-empty, it must have at least 2 colors, which will be interpreted as lowest to highest. The default is an empty list, which produces a 32-level grey-scale display.
contrast contrast
Determines the contrast, must be chosen from the range of [-100, 100]. The default is 0.
stop endPos
Specifies the ending position of the sample to be displayed
fftlength no_fft_pts
Specifies the number of FFT points; must be one of the values 8, 16, 32, 64, 128, 256, 512, 1024, 2048, or 4096. Default is 512.
gridcolor gridcolor
Specifies the color of the grid
gridfspacing gridfreqspacing
Specifies the grid spacing along the frequency axis (in Hz). The default is 0, i.e. no grid.
gridtspacing gridtimespacing
Specifies the grid spacing along the time axis (in milliseconds). The default is 0, i.e. no grid.
height height
Specifies the height of the spectrogram
pixelspersecond pps
Specifies the number of horizontal pixels representing one second of elapsed time.
preemphasisfactor factor
Specifies the amount of preemphasis applied to the signal prior to the FFT calculation (default 0.97).
start startPos
Specifies the starting position of the sample to be displayed
tags tagList
Specifies a tagList for the canvas
topfrequency maxFreq
Specifies the maximum frequency of the spectrogram. Default is the Nyquist frequency.
width width
Specifies the width of the spectrogram. Maximum of 32767 pixels.
windowtype type
Specifies the type of windowing function: must be one of the choices: Hamming, Hanning, Bartlett, Blackman, or Rectangle. The default is Hamming.
winlength size
Specifies the size of the (Hamming) window; it is required to be no greater than fftlength. The default is 128.

See also TkCanvas#attachSpectrogram

stop()

Stops current play or record operation.

waveform_to_canvas(canvas, x, y, options=nil, &op_block)

Displays the waveform of this sound on the canvas. The first argument, canvas, specifies the canvas to which the waveform of this sound will be attached. The integers x and y are the upper left-hand coordinates of the display image on the canvas. Options can be any of the following

anchor anchorPos
Specifies the anchor position on the canvas
channel channel
Selects the channels. For a 2-channel system: -1 means both, 0 is left, 1 is right. Default is -1.
stop endPos
Specifies the ending position of the sample to be displayed
fill fillColor
Specifies the fill color
frame boolean
Determines whether the display is to be framed: true means a frame is drawn about the section, false means no frame and is the default.
height height
Specifies the height of the waveform
limit maxAmplitude
Specifies the maximum amplitude to be displayed
pixelspersecond pps
Is the number of horizontal pixels representing one second of elapsed time.
progress cmd
Specifies a callback procedure: NOT IMPLEMENTED YET!
shapefile fileName
Specifies the filename of a file for storing/retrieving precomputed waveform shape information. Used for speeding up display.
start startPos
Specifies the starting position of the sample to be displayed
stipple bitmap
Specifies bitmap for stipple.
subsample timeStepsPerPoint
Specifies the number of time steps between the points used for the generation of the waveform envelope. The default is 1, which gives the most faithful rendering, but is the slowest. Higher numbers will give quicker results, but risk degradation of the resulting waveform.
tags tagList
Specifies tag list
width width
Specifies the width of the waveform
zerolevel boolean
Determines whether to draw horizontal axis. The default value is true, which produces a horizontal line indicating 0 amplitude.

See also TkCanvas#attachWaveform
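
The trade-off described for the subsample option can be illustrated with a plain Ruby envelope calculation. The sketch is an assumption about the general technique, not a description of how Snack draws waveforms: waveform_envelope is a hypothetical helper that produces one (min, max) pair per pixel column and inspects only every subsample-th sample within a column.

```ruby
# Build a [min, max] envelope with one entry per pixel column, looking
# only at every `subsample`-th sample inside each column.
def waveform_envelope(samples, samples_per_pixel, subsample = 1)
  samples.each_slice(samples_per_pixel).map do |column|
    picked = column.each_slice(subsample).map(&:first)
    [picked.min, picked.max]
  end
end

samples = [0, 5, -3, 9, 1, -8, 2, 4]
full    = waveform_envelope(samples, 4, 1)   # => [[-3, 9], [-8, 4]]
coarse  = waveform_envelope(samples, 4, 2)   # => [[-3, 0], [1, 2]]
```

With a step of 2 the peaks at 9 and -8 are skipped, which is the kind of degradation a larger subsample value risks in exchange for less work.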

write(filename, options=nil, &op_block)

Writes sound data to a file. Valid options are

start startPos
Specifies the starting position of the data sample to be written.
stop endPos
Specifies the stopping position of the data sample to be written.
fileformat RAW
Specifies the format of the file to be written. Legitimate formats currently supported are WAV, MP3, AU, SND, AIFF, SD, SMP, CSL, and RAW binary.
byteorder endianness
Valid values are littleEndian and bigEndian.
progress boolean
Progress callback. Acceptable values are true or false (Also _"show"_ or _"hide"_)
[](index)

Retrieves sample at a given index (for mono only)

Example
v=sound[0]
[]=(index, value)

Sets a sample value at a given index (for mono only)

Example
sound[0]=v
+(trailingsound)

Concatenates the sample data from trailingsound to the end of self. Both sounds must be of the same format (type). Self is unchanged. Returns the join of both data samples.
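
The contrast between + and concatenate (one returns a new joined sound, the other modifies self) can be modelled with a small stand-in class. ToySound below is hypothetical and only mirrors the mutation behaviour described in this page; it does not touch audio data, and having concatenate return self is an assumption made for the sketch.

```ruby
# Hypothetical stand-in class; only the mutation behaviour is modelled.
class ToySound
  attr_reader :samples

  def initialize(samples)
    @samples = samples.dup
  end

  # Like concatenate: appends the trailing samples to self.
  def concatenate(trailing)
    @samples.concat(trailing.samples)
    self
  end

  # Like +: leaves self untouched and returns a new object.
  def +(trailing)
    ToySound.new(@samples + trailing.samples)
  end
end

a = ToySound.new([1, 2])
b = ToySound.new([3])
joined = a + b                 # a.samples is still [1, 2] at this point
unchanged = a.samples.dup      # snapshot taken before mutating a
a.concatenate(b)               # a.samples is now [1, 2, 3]
```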