8 The R package wrassp
This chapter gives an overview of and introduction to the wrassp package. The wrassp package is a wrapper for R around Michel Scheffers’ libassp library. The libassp library, and therefore the wrassp package, provides functionality for handling speech signal files in most common audio formats and for performing signal analyses common in the phonetic and speech sciences. As such, wrassp fills a gap in the R package landscape as, to our knowledge, no previous packages provided this specialized functionality. The currently available signal processing functions provided by wrassp are:
acfana(): Analysis of short-term autocorrelation function
afdiff(): Computes the first difference of the signal
affilter(): Filters the audio signal (e.g., low-pass and high-pass)
cepstrum(): Short-term cepstral analysis
cssSpectrum(): Cepstral smoothed version of dftSpectrum()
dftSpectrum(): Short-term DFT spectral analysis
forest(): Formant estimation
ksvF0(): F0 analysis of the signal
lpsSpectrum(): Linear predictive smoothed version of dftSpectrum()
mhsF0(): Pitch analysis of the speech signal using Michel Scheffers’ Modified Harmonic Sieve algorithm
rfcana(): Linear prediction analysis
rmsana(): Analysis of short-term Root Mean Square amplitude
zcrana(): Analysis of the averages of the short-term positive and negative zero-crossing rates
The available file handling functions are:
read.AsspDataObj(): read an SSFF or audio file into an AsspDataObj, which is the in-memory equivalent of the SSFF or audio file.
write.AsspDataObj(): write an AsspDataObj to file (usually SSFF or audio file formats).
See R’s help() function if a comprehensive list of every function and object provided by the wrassp package is required (see R code snippet below).
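A minimal sketch of how the package contents can be listed. The call to ls() is an addition that is not taken from the original text; it returns the exported names as a character vector, which is convenient for searching them programmatically.

```r
# load the wrassp package
library(wrassp)

# open the package index listing all documented functions and objects
help(package = "wrassp")

# additionally (an extra not shown in the chapter): get the exported
# names as a character vector and show the first few
exportedNames = ls("package:wrassp")
head(exportedNames)
```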
As the wrassp package can be used independently of the EMU-SDMS, this chapter largely focuses on using it as an independent component. However, Section 8.7 provides an overview of how the package is integrated into the EMU-SDMS. Further, although the wrassp package has its own set of example audio files (which can be accessed in the directory provided by system.file('extdata', package='wrassp')), this chapter will use the audio and SSFF files that are part of the ae emuDB of the demo data provided by the emuR package. This is done primarily to provide an overview of what it is like using wrassp to work on files in an emuDB. The R code snippet below shows how to generate this demo data, followed by a listing of the files contained in the directory of a single bundle called msajc003 (see Chapter @ref(chap:emuDB) for information about the emuDB format). The output of the call to list.files() shows four files, where the .dft and .fms files are in the SSFF file format (see Appendix 15.1.3 for further details). The _annot.json file contains the annotation information, and the .wav file is one of the audio files that will be used in various signal processing examples in this chapter.
# load the emuR package
library(emuR)

# create demo data in directory
# provided by tempdir()
create_emuRdemoData(dir = tempdir())

# create path to demo database
path2ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")

# create path to bundle in database
path2bndl = file.path(path2ae, "0000_ses", "msajc003_bndl")

# list files in bundle directory
list.files(path2bndl)
## [1] "msajc003_annot.json" "msajc003.dft"        "msajc003.fms"
## [4] "msajc003.wav"
8.2 File I/O and the AsspDataObj
One of the aims of wrassp is to provide mechanisms for handling speech-related files such as audio files and derived and complementary signal files. To have an in-memory object that can hold these file types in a uniform way, the wrassp package provides the AsspDataObj data type. The R code snippet below shows how the read.AsspDataObj() function can be used to import a .wav audio file.
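A minimal sketch of such an import, using the variable names path2wav and au that appear elsewhere in this chapter. The guard around create_emuRdemoData() is an addition so that the sketch can be run on its own without recreating demo data that already exists.

```r
library(emuR)    # only used to create the demo data
library(wrassp)

# create the demo data unless it already exists (guard added for this sketch)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}

# path to the .wav file of the msajc003 bundle
path2wav = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                     "0000_ses", "msajc003_bndl", "msajc003.wav")

# read the audio file into an AsspDataObj
au = read.AsspDataObj(path2wav)

# show class
class(au)

# show print() output of object
print(au)
```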
## [1] "AsspDataObj"
## Assp Data Object of file /var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000gn/T//RtmpQdmrjS/emuR_demoData/ae_emuDB/0000_ses/msajc003_bndl/msajc003.wav.
## Format: WAVE (binary)
## 58089 records at 20000 Hz
## Duration: 2.904450 s
## Number of tracks: 1
##   audio (1 fields)
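As can be seen in the above R code snippet, the resulting au object is of the class AsspDataObj. The output of print() provides additional information about the object, such as its file format, number of records, sampling rate, duration and tracks. Individual attributes, such as the duration, the sampling rate and the number of records, can also be extracted from the object, as the output below shows.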
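A sketch of calls that extract these attributes with wrassp's accessor functions and base R's attributes(). The demo data guard is an addition that keeps the sketch self-contained.

```r
library(emuR)    # only used to create the demo data
library(wrassp)

# create the demo data unless it already exists (guard added for this sketch)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
path2wav = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                     "0000_ses", "msajc003_bndl", "msajc003.wav")
au = read.AsspDataObj(path2wav)

# extract duration (in seconds)
dur.AsspDataObj(au)

# extract sampling rate (in Hz)
rate.AsspDataObj(au)

# extract number of records/samples
numRecs.AsspDataObj(au)

# extract all attributes of the object
attributes(au)
```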
## [1] 2.90445
## [1] 20000
## [1] 58089
## $names
## [1] "audio"
##
## $trackFormats
## [1] "INT16"
##
## $sampleRate
## [1] 20000
##
## $filePath
## [1] "/var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000g..."
##
## $origFreq
## [1] 0
##
## $startTime
## [1] 0
##
## $startRecord
## [1] 1
##
## $endRecord
## [1] 58089
##
## $class
## [1] "AsspDataObj"
##
## $fileInfo
## [1] 21  2
The sample values belonging to an object’s tracks are also stored within the AsspDataObj object. As mentioned above, the currently loaded object contains a single mono audio track. Accessing the data belonging to this track, in the form of a matrix, can be achieved using the track’s name in combination with the $ notation known from R’s common named list object. Each matrix has the same number of rows as the track has records and as many columns as the track has fields. The R code snippet below shows how the audio track can be accessed.
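A sketch of this kind of track access. It assumes that the track is called audio, as stated above; the demo data guard is an addition that keeps the sketch self-contained.

```r
library(emuR)    # only used to create the demo data
library(wrassp)

# create the demo data unless it already exists (guard added for this sketch)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
path2wav = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                     "0000_ses", "msajc003_bndl", "msajc003.wav")
au = read.AsspDataObj(path2wav)

# track names via the generic list interface
names(au)

# track names via wrassp's own accessor
tracks.AsspDataObj(au)

# dimensions of the audio track matrix (records x fields)
dim(au$audio)

# first sample value of the audio track
head(au$audio, n = 1)
```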
## [1] "audio"
## [1] "audio"
## [1] 58089     1
##      [,1]
## [1,]   64
This data can, for example, be used to generate an oscillogram of the audio file as shown in the R code snippet below, which produces Figure 8.1.
# calculate sample time of every 10th sample
samplesIdx = seq(0, numRecs.AsspDataObj(au) - 1, 10)
samplesTime = samplesIdx / rate.AsspDataObj(au)

# extract every 10th sample using window() function
samples = window(au$audio, deltat = 10)

# plot samples stored in audio attribute
# (only plot every 10th sample to accelerate plotting)
plot(samplesTime,
     samples,
     type = "l",
     xlab = "time (s)",
     ylab = "Audio samples (INT16)")
The export counterpart to the read.AsspDataObj() function is write.AsspDataObj(). It is used to store in-memory AsspDataObj objects to disk and is particularly useful for converting other formats to or storing data in the SSFF file format as described in Section ??. To show how this function can be used to write a slightly altered version of the au object to a file, the R code snippet below initially multiplies all the sample values of au$audio by a factor of 0.5. The resulting AsspDataObj is then written to an audio file in a temporary directory provided by R’s tempdir() function.
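A sketch of the scaling and export steps described above. The output file name newau.wav and the variable outputPath are hypothetical choices for illustration, and the demo data guard is an addition that keeps the sketch self-contained.

```r
library(emuR)    # only used to create the demo data
library(wrassp)

# create the demo data unless it already exists (guard added for this sketch)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
path2wav = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                     "0000_ses", "msajc003_bndl", "msajc003.wav")
au = read.AsspDataObj(path2wav)

# scale all sample values of the audio track by 0.5
au$audio = au$audio * 0.5

# write the altered object to a new audio file in tempdir()
# (file name is a hypothetical choice)
outputPath = file.path(tempdir(), "newau.wav")
write.AsspDataObj(au, file = outputPath)

# check that the file was created
file.exists(outputPath)
```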
8.3 Signal processing
As mentioned in the introduction to this chapter, the wrassp package is capable of more than just the mere importing and exporting of specific signal file formats. This section will focus on demonstrating three of wrassp’s signal processing functions that calculate formant values, their corresponding bandwidths, the fundamental frequency contour and the RMS energy contour. Sections 8.5 and ?? demonstrate signal processing applied to the audio file saved under path2wav, while Section 8.5.2 addresses processing all the audio files belonging to the ae emuDB.
The wrassp package comes with the wrasspOutputInfos object, which stores meta information associated with the different signal processing functions wrassp provides. The R code snippet below shows the names of the wrasspOutputInfos object, which correspond to the function names listed in the introduction of this chapter.
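A minimal sketch of the call that lists these names, assuming only that the wrassp package is installed:

```r
# load the wrassp package
library(wrassp)

# show the names of the wrasspOutputInfos object
# (one entry per signal processing function)
names(wrasspOutputInfos)
```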
##  [1] "acfana"      "afdiff"      "affilter"    "cepstrum"    "cssSpectrum"
##  [6] "dftSpectrum" "ksvF0"       "mhsF0"       "forest"      "lpsSpectrum"
## [11] "rfcana"      "rmsana"      "zcrana"
This object can be useful to get additional information about a specific wrassp function. It contains information about the default file extension ($ext), the tracks produced ($tracks) and the output file type ($outputType). The R code snippet below shows this information for the forest() function.
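A minimal sketch of how this information can be retrieved for the forest() function, either as a whole or one element at a time:

```r
# load the wrassp package
library(wrassp)

# show all output infos of the forest() function
wrasspOutputInfos$forest

# access single elements, e.g. the default file extension
wrasspOutputInfos$forest$ext
```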
## $ext
## [1] "fms"
##
## $tracks
## [1] "fm" "bw"
##
## $outputType
## [1] "SSFF"
The examples that follow will make use of this wrasspOutputInfos object mainly to acquire the default file extensions given by a specific wrassp signal processing function.
8.5 Formants and their bandwidths
The already mentioned forest() function is wrassp’s formant estimation function. The default behavior of this formant tracker is to calculate the first four formants and their bandwidths. The R code snippet below shows the usage of this function. As the default behavior of every signal processing function provided by wrassp is to store its result to a file, the toFile parameter of forest() is set to FALSE to prevent this behavior. This results in the same AsspDataObj object as when exporting the result to file and then importing the file into R using read.AsspDataObj(), but circumvents the disk reading/writing overhead.
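A sketch of such a call, using the variable name fmBwVals that appears in the plotting code of this section. The demo data guard is an addition that keeps the sketch self-contained.

```r
library(emuR)    # only used to create the demo data
library(wrassp)

# create the demo data unless it already exists (guard added for this sketch)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
path2wav = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                     "0000_ses", "msajc003_bndl", "msajc003.wav")

# calculate formants and corresponding bandwidth values
# without writing the result to a file
fmBwVals = forest(path2wav, toFile = FALSE)

# show class
class(fmBwVals)

# show track names
tracks.AsspDataObj(fmBwVals)

# show dimensions of the "fm" track
dim(fmBwVals$fm)

# check that the dimensions of both tracks are the same
all(dim(fmBwVals$fm) == dim(fmBwVals$bw))
```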
## [1] "AsspDataObj"
## [1] "fm" "bw"
## [1] 581   4
## [1] TRUE
As can be seen in the above R code snippet, the object resulting from the forest() function is an object of class AsspDataObj with the tracks "fm" (formants) and "bw" (formant bandwidths), where both track matrices have four columns (corresponding to F1, F2, F3 and F4 in the "fm" track and F1bandwidth, F2bandwidth, F3bandwidth and F4bandwidth in the "bw" track) and 581 rows. To visualize the calculated formant values, the R code snippet below shows how R’s matplot() function can be used to produce Figure 8.2.
# plot the formant values
matplot(seq(0, numRecs.AsspDataObj(fmBwVals) - 1) /
          rate.AsspDataObj(fmBwVals) +
          attr(fmBwVals, "startTime"),
        fmBwVals$fm,
        type = "l",
        xlab = "time (s)",
        ylab = "Formant frequency (Hz)")

# add legend
startFormant = 1
endFormant = 4
legend("topright",
       legend = paste0("F", startFormant:endFormant),
       col = startFormant:endFormant,
       lty = startFormant:endFormant,
       bg = "white")
8.5.1 Fundamental frequency contour
The wrassp package includes two fundamental frequency estimation functions called ksvF0() and mhsF0(). The R code snippet below shows the usage of the ksvF0() function, this time not utilizing the toFile parameter but rather, to show an alternative procedure, reading the resulting SSFF file produced by it. It is worth noting that every signal processing function provided by wrassp creates a result file in the same directory as the audio file it was processing (unless the outputDirectory parameter is set otherwise). The default extension given by ksvF0() is stored in wrasspOutputInfos$ksvF0$ext, which is used in the R code snippet below to create the newly generated file’s path.
By analogy with the formant estimation example, the R code snippet below shows how the plot() function can be used to visualize this data as in Figure 8.3.
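A sketch covering both steps of this section: calculating the fundamental frequency contour to a file, reading that file back in, and plotting it. The variable names path2f0file, f0vals and timeAxis are assumptions made for illustration. The track name is looked up in wrasspOutputInfos rather than hard-coded, and the demo data guard is an addition that keeps the sketch self-contained.

```r
library(emuR)    # only used to create the demo data
library(wrassp)

# create the demo data unless it already exists (guard added for this sketch)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
path2bndl = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                      "0000_ses", "msajc003_bndl")
path2wav = file.path(path2bndl, "msajc003.wav")

# calculate the fundamental frequency contour
# (the result file is placed next to the audio file)
ksvF0(path2wav)

# create path to the newly generated file using the default extension
path2f0file = file.path(path2bndl,
                        paste0("msajc003.", wrasspOutputInfos$ksvF0$ext))

# read the result file into an AsspDataObj
f0vals = read.AsspDataObj(path2f0file)

# time of every record in seconds
timeAxis = seq(0, numRecs.AsspDataObj(f0vals) - 1) /
  rate.AsspDataObj(f0vals) +
  attr(f0vals, "startTime")

# plot the fundamental frequency contour
plot(timeAxis,
     f0vals[[wrasspOutputInfos$ksvF0$tracks]],
     type = "l",
     xlab = "time (s)",
     ylab = "F0 frequency (Hz)")
```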
8.5.2 RMS energy contour
The wrassp function for calculating the short-term root mean square (RMS) amplitude of the signal is called rmsana(). As its usage is analogous to the above examples, here we will focus on using it to calculate the RMS values for all the audio files of the ae emuDB. The R code snippet below initially uses the list.files() function to acquire the file paths for every .wav file in the ae emuDB. As every signal processing function accepts one or multiple file paths, these file paths can simply be passed in as the main argument to the rmsana() function. As all of wrassp’s signal processing functions place their generated files in the same directory as the audio file they process, the rmsana() function will automatically place every .rms file into the correct bundle directory.
# list all .wav files in the ae emuDB
paths2wavFiles = list.files(path2ae,
                            pattern = ".*wav$",
                            recursive = TRUE,
                            full.names = TRUE)

# calculate the RMS energy values for all .wav files
rmsana(paths2wavFiles)

# list new .rms files using
# wrasspOutputInfos->rmsana->ext
rmsFPs = list.files(path2ae,
                    pattern = paste0(".*", wrasspOutputInfos$rmsana$ext),
                    recursive = TRUE,
                    full.names = TRUE)

# read first RMS file
rmsvals = read.AsspDataObj(rmsFPs[1])
The R code snippet below shows how the plot() function can be used to visualize this data as in Figure 8.4.
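A sketch of such a plot. To stay self-contained it calculates the RMS values of a single file in memory (toFile = FALSE) instead of reading the first .rms file from disk, so the plotted contour belongs to the msajc003 bundle and may therefore not correspond to Figure 8.4. The variable name timeAxis is an assumption, and the demo data guard is an addition.

```r
library(emuR)    # only used to create the demo data
library(wrassp)

# create the demo data unless it already exists (guard added for this sketch)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
path2wav = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                     "0000_ses", "msajc003_bndl", "msajc003.wav")

# calculate RMS values in memory
rmsvals = rmsana(path2wav, toFile = FALSE)

# time of every record in seconds
timeAxis = seq(0, numRecs.AsspDataObj(rmsvals) - 1) /
  rate.AsspDataObj(rmsvals) +
  attr(rmsvals, "startTime")

# plot the RMS energy contour (rmsana() reports dB by default)
plot(timeAxis,
     rmsvals[[wrasspOutputInfos$rmsana$tracks]],
     type = "l",
     xlab = "time (s)",
     ylab = "RMS energy (dB)")
```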
8.6 Logging wrassp’s function calls
As it can be extremely important to keep track of information about how certain files are created and calculated, every signal processing function provided by the wrassp package comes with the ability to log its function calls to a specified log file. The R code snippet below shows a call to the ksvF0() function where a single parameter was changed from its default value (windowShift = 10). The content of the created log file (shown by the call to readLines()) contains the function name, time stamp, parameters that were altered and processed file path information. It is worth noting that a log file can be reused for multiple function calls as the log function does not overwrite an existing file but merely appends new log information to it.
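A sketch of a logged function call. The log file name wrassp.log and the variable path2logFile are hypothetical choices for illustration; forceToLog and optLogFilePath are the logging parameters shared by wrassp's signal processing functions. The demo data guard is an addition that keeps the sketch self-contained.

```r
library(emuR)    # only used to create the demo data
library(wrassp)

# create the demo data unless it already exists (guard added for this sketch)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
path2wav = file.path(tempdir(), "emuR_demoData", "ae_emuDB",
                     "0000_ses", "msajc003_bndl", "msajc003.wav")

# create path to log file in tempdir() (hypothetical file name)
path2logFile = file.path(tempdir(), "wrassp.log")

# calculate the fundamental frequency contour with a changed
# window shift and log the function call
ksvF0(path2wav,
      windowShift = 10,
      forceToLog = TRUE,
      optLogFilePath = path2logFile)

# display content of log file
logLines = readLines(path2logFile)
logLines
```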
## [1] 1
## [1] ""
## [2] "##################################"
## [3] "##################################"
## [4] "######## ksvF0 performed ########"
## [5] "Timestamp: 2018-10-19 16:39:54 "
## [6] "windowShift : 10 "
## [7] "forceToLog : T "
## [8] " => on files:"
8.7 Using wrassp in the EMU-SDMS
As shown in Section 8.5.2, the wrassp signal processing functions can be used to calculate SSFF files and place them into the appropriate bundle directories. The only thing that has to be done to make an emuDB aware of these files is to add an SSFF track definition to the emuDB as shown in the R code snippet below. Once added, this SSFF track can be referenced via the ssffTrackName parameter of the get_trackdata() function as shown in various examples throughout this documentation. It is worth noting that this strategy is not necessarily relevant for applying the same signal processing to an entire emuDB, as this can be achieved using the on-the-fly add_ssffTrackDefinition() method described in the corresponding R code snippet below. However, it becomes necessary if certain bundles are to be processed using deviating function parameters. This can, for example, be relevant when setting the minimum and maximum frequencies that are to be considered while estimating the fundamental frequencies (e.g., using the minF and maxF parameters of ksvF0()) for female versus male speakers.
# load emuDB
ae = load_emuDB(path2ae)

# add SSFF track definition
# that references the .rms files
# calculated above
# (i.e. no new files are calculated and added to the emuDB)
ext = wrasspOutputInfos$rmsana$ext
colName = wrasspOutputInfos$rmsana$tracks
add_ssffTrackDefinition(ae,
                        name = "rms",
                        fileExtension = ext,
                        columnName = colName)
A further way to utilize wrassp’s signal processing functions as part of the EMU-SDMS is via the onTheFlyFunctionName and onTheFlyParams parameters of the add_ssffTrackDefinition() and get_trackdata() functions. Using the onTheFlyFunctionName parameter in the add_ssffTrackDefinition() function automatically calculates the SSFF files while also adding the SSFF track definition. Using this parameter with the get_trackdata() function calls the given wrassp function with the toFile parameter set to FALSE, extracts the matching segments and places them in the resulting emuRtrackdata object. In many cases, this avoids the necessity of having SSFF track definitions in the emuDB. In both functions, the optional onTheFlyParams parameter can be used to specify the parameters that are passed into the signal processing function. The R code snippet below shows how R’s formals() function can be used to get all the parameters of wrassp’s short-term positive and negative zero-crossing rate (ZCR) analysis function zcrana(). It then changes the default window size parameter to a new value and passes the parameters object into the get_trackdata() and add_ssffTrackDefinition() functions.
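A minimal sketch of the first step, retrieving the formal arguments of zcrana(). The variable name zcranaParams matches the one used in the code that follows.

```r
# load the wrassp package
library(wrassp)

# get the formal arguments (parameters and their defaults) of zcrana()
zcranaParams = formals(zcrana)

# show the parameter names
names(zcranaParams)
```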
##  [1] "listOfFiles"     "optLogFilePath"  "beginTime"
##  [4] "centerTime"      "endTime"         "windowShift"
##  [7] "windowSize"      "toFile"          "explicitExt"
## [10] "outputDirectory" "forceToLog"      "verbose"
# change window size from the default
# value of 25 ms to 50 ms
zcranaParams$windowSize = 50

# to have a segment list to work with
# query all Phonetic 'n' segments
sl = query(ae, "Phonetic == n")

# get trackdata calculating ZCR values on-the-fly
# using the above parameters. Note that no files
# are generated.
td = get_trackdata(ae, sl,
                   onTheFlyFunctionName = "zcrana",
                   onTheFlyParams = zcranaParams,
                   verbose = FALSE)

# add SSFF track definition. Note that
# this time files are generated.
add_ssffTrackDefinition(ae,
                        name = "zcr",
                        onTheFlyFunctionName = "zcrana",
                        onTheFlyParams = zcranaParams,
                        verbose = FALSE)
8.8 Storing data in the SSFF file format
One of the benefits gained by having the AsspDataObj in-memory object is that these objects can be constructed from scratch in R, as they are basically simple list objects. This means, for example, that any set of n-dimensional samples over time can be placed in an AsspDataObj and then stored as an SSFF file using the write.AsspDataObj() function. To show how this can be done, the R code snippet below creates an arbitrary data sample in the form of a single-cycle sine wave between \(0\) and \(2\pi\) that is made up of 16000 samples and displays it in Figure 8.5.
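A sketch of how such a data sample can be generated and displayed with base R. The variable name sineWave is the one used in the remainder of this section; x and the axis label are assumptions made for illustration.

```r
# 16000 equally spaced points between 0 and 2*pi
x = seq(0, 2 * pi, length.out = 16000)

# single cycle of a sine wave
sineWave = sin(x)

# plot the sine wave
plot(x, sineWave,
     type = "l",
     xlab = "x from 0 to 2*pi",
     ylab = "")
```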
Assuming a sample rate of 16 kHz, sineWave would result in a sine wave with a frequency of 1 Hz and a duration of one second. The R code snippet below shows how an AsspDataObj can be created from scratch and the data in sineWave placed into one of its tracks. It then goes on to write the AsspDataObj object to an SSFF file.
# create empty list object
ado = list()

# add sample rate attribute
attr(ado, "sampleRate") = 16000

# add start time attribute
attr(ado, "startTime") = 0

# add start record attribute
attr(ado, "startRecord") = as.integer(1)

# add end record attribute
attr(ado, "endRecord") = as.integer(length(sineWave))

# set class of ado
class(ado) = "AsspDataObj"

# show available file formats
AsspFileFormats
##     RAW   ASP_A   ASP_B   XASSP  IPDS_M  IPDS_S    AIFF    AIFC     CSL
##       1       2       3       4       5       6       7       8       9
##    CSRE    ESPS     ILS     KTH   SWELL   SNACK     SFS     SND      AU
##      10      11      12      13      13      13      14      15      15
##    NIST  SPHERE PRAAT_S PRAAT_L PRAAT_B    SSFF    WAVE  WAVE_X  XLABEL
##      16      16      17      18      19      20      21      22      24
##    YORK     UWM
##      25      26
# set file format to SSFF
# NOTE: assignment of "SSFF" also possible
AsspFileFormat(ado) = as.integer(20)

# set data format (1 == 'ascii' and 2 == 'binary')
AsspDataFormat(ado) = as.integer(2)

# set track format specifiers
# (available track formats for numbers
# that match their C equivalent are:
# "UINT8"; "INT8"; "UINT16"; "INT16";
# "UINT24"; "INT24"; "UINT32"; "INT32";
# "UINT64"; "INT64"; "REAL32"; "REAL64");
attr(ado, "trackFormats") = c("REAL32")

# add track
ado = addTrack(ado, "sine", sineWave, "REAL32")

# write AsspDataObj object to file
write.AsspDataObj(dobj = ado,
                  file = file.path(tempdir(), "example.sine"))
Although somewhat of a generic example, the above R code snippet shows how to generate an AsspDataObj from scratch. This approach can, for example, be used to read in signal data produced by other software or signal data acquisition devices. Hence, this approach can be used to import many forms of data into the EMU-SDMS. Appendix @ref(sec:app-chap-wrassp-praatsSigProc) shows an example of how this approach can be used to take advantage of Praat’s signal processing capabilities and integrate its output into the EMU-SDMS.
The wrassp package enriches the R package landscape by providing functionality for handling speech signal files in most common audio formats and for performing signal analyses common in the phonetic and speech sciences. The EMU-SDMS utilizes the functionality that the wrassp package provides by allowing the user to calculate signals that match the segments of a segment list. This can either be done in real time or by extracting the signals from files. Hence, the wrassp package is an integral part of the EMU-SDMS but can also be used as a standalone package if so desired.
Some examples in this chapter are adapted versions of examples given in the wrassp_intro vignette of the wrassp package.