This chapter describes the
emuDB format, which is the new database format of the EMU-SDMS, and shows how to create and interact with this format. The
emuDB format is meant as a simple, general purpose way of storing speech databases that may contain complex, rich, hierarchical annotations as well as derived and complementary speech data. These different components will be described throughout this chapter, and examples will show how to generate and manipulate them. When designing the new EMU system, considerable effort went into devising an appropriate database format. We needed a format that was standardized, well structured, easy to maintain, easy to produce, easy to manipulate and portable.
We chose to use the widely adopted Waveform Audio File Format (
WAVE, more commonly known as
WAV due to its filename extension) as our primary media/audio format. Although some components of the EMU-SDMS, notably the
wrassp package, can handle various other media/audio formats (see
?wrassp::AsspFileFormats for details), WAV is currently the only audio file format supported by every component of the EMU-SDMS. Nevertheless, the
wrassp package can be utilized to convert files from one of its other supported file formats to the
WAV format.11 Future releases of the EMU-SDMS might include support for other media/audio formats.
In contrast to other systems, including the legacy EMU system, we chose to fully standardize the on-disk structure of the speech databases with which the system works. This provides a standardized and structured way of storing speech databases while leaving the necessary amount of freedom and separability to accommodate multiple types of data. Further, this standardization enables fast parsing, simplifies file-based error tracking, and eases database subset and merge operations as well as database portability. An overview of all database interaction functions is given in Section 10.2.
5.1 Database design
An emuDB consists of a set of files and directories that adhere to a certain structure and naming convention (see Figure 5.1). The database root directory must include a single
_DBconfig.json file that contains the configuration options of the database such as its level definitions, how these levels are linked in the database hierarchy and how the data is to be displayed by the graphical user interface. A detailed description of the
_DBconfig.json file is given in Appendix 15.1.1. The database root directory also contains session directories whose names are arbitrary except for the obligatory
_ses suffix. These session directories can be used to group the recordings of a database in a logical manner. Sessions can be used, for example, to group all recordings of speaker
AAA into a session directory called AAA_ses.
Each session directory can contain any number of
_bndl directories (e.g.,
rec9_bndl). All files belonging to a recording (i.e., all files describing the same timeline) are stored in the same bundle directory. This includes the actual recording (
.wav) and can contain optional derived or supplementary signal files in the simple signal file format (SSFF) (Cassidy 2013) such as formants (
.fms) or the fundamental frequency (
.f0), both of which can be calculated using the
wrassp package (see Chapter 8). Each bundle directory contains the annotation file (
_annot.json) of that bundle (i.e., the annotations and the hierarchical linking information; see Appendix 15.1.2 for a detailed description of the file format). JSON schema files for all the JSON file types used have been developed to ensure the syntactic integrity of the database (see https://github.com/IPS-LMU/EMU-webApp/tree/master/dist/schemaFiles). All files associated with a bundle must have the same basename as the
_bndl directory prefix. For example, the signal file in bundle
rec1_bndl must have the name
rec1.wav to be recognized as belonging to the bundle. The optional
_emuDBcache.sqlite file in the root directory (see Figure 5.1) contains the relational cache representation of the annotations of the
emuDB (see Chapter 11 for further details). All files in a
_bndl directory that do not follow the above naming conventions will simply be ignored by the database interaction functions of the emuR package.
5.2 Creating an emuDB
The two main strategies for creating
emuDBs are either to convert existing databases or file collections to the new format or to create new databases from scratch where only
.wav audio files are present. Chapter 3 gave an example of how to create an
emuDB from an existing TextGrid file collection and other conversion routines are covered in Section 10.1. In this chapter we will focus on creating an
emuDB from scratch with nothing more than a set of
.wav audio files present.
5.2.1 Creating an emuDB from scratch
The R code snippet below shows how an empty
emuDB is created in the directory provided by R’s
tempdir() function. As can be seen from the output of the list.files() call below,
create_emuDB() creates a directory containing only a
_DBconfig.json file.
# load package
library(emuR, warn.conflicts = F)

# create demo data in directory
# provided by tempdir()
create_emuRdemoData(dir = tempdir())

# create emuDB called "fromScratch"
create_emuDB(name = "fromScratch",
             targetDir = tempdir(),
             verbose = F)

# generate path to the empty fromScratch created above
dbPath = file.path(tempdir(), "fromScratch_emuDB")

# show content of empty fromScratch emuDB
list.files(dbPath)
##  "fromScratch_DBconfig.json"
5.2.2 Loading and editing an empty database
The initial step in manipulating and generally interacting with a database is to load it into the current R session. The R code below shows how to load the fromScratch database and displays its empty configuration via the output of the summary() function.
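A minimal sketch of the corresponding calls, assuming dbPath points at the fromScratch_emuDB directory created above:

```r
# load the fromScratch emuDB into the current R session
dbHandle = load_emuDB(dbPath, verbose = F)
# display the database configuration summary
summary(dbHandle)
# show the class of the loaded database object
class(dbHandle)
```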
## Name: fromScratch
## UUID: d4ddcc88-6985-4a23-aedb-cc7d349dd0f1
## Directory: /private/var/folders/yk/8z9tn7kx6hbcg_9n4c1sld980000gn/T/Rtmp0WaDps/fromScratch_emuDB
## Session count: 0
## Bundle count: 0
## Annotation item count: 0
## Label count: 0
## Link count: 0
##
## Database configuration:
##
## SSFF track definitions:
## NULL
##
## Level definitions:
## NULL
##
## Link definitions:
## NULL
##  "emuDBhandle"
As can be seen in the above R code example, the class of a loaded emuDB is emuDBhandle. This emuDBhandle object is used to reference a loaded emuDB in the database interaction functions of the emuR package. In this chapter we will show how to use this emuDBhandle object to perform database manipulation operations. Most of the emuDB manipulation functions follow this function prefix naming convention:
- add_XXX: add a new instance of XXX
- set_XXX: set the current instance of XXX
- list_XXX: list the current instances of XXX
- get_XXX: get the current instance of XXX
- remove_XXX: remove existing instances of XXX
5.2.3 Level definitions
Unlike other systems, the EMU-SDMS requires the user to formally define the annotation structure for the entire database. An essential structural element of any
emuDB are its levels. A level is a more general term for what is often referred to as a tier. It is more general in the sense that people usually expect tiers to contain time information. Levels can either contain time information if they are of the type
EVENT or of the type
SEGMENT but are timeless if they are of the type
ITEM (see Chapter 4 for further details). It is also worth noting that an
emuDB distinguishes between the definition of an annotation structure element and the actual annotations. The definition of an annotation structure element such as a level definition is merely an entry in the
_DBconfig.json file which specifies that this level is allowed to be present in the
_annot.json files. The levels present in an
_annot.json file, on the other hand, have to adhere to the definitions in the _DBconfig.json file.
As the fromScratch database (already loaded) does not contain any annotation structure element definitions, the R code snippet below shows how a new level definition called Phonetic of type
SEGMENT is added to the emuDB.
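A sketch of the corresponding call (assuming the database has been loaded into dbHandle, as above):

```r
# add level definition named "Phonetic" of type SEGMENT
add_levelDefinition(dbHandle, name = "Phonetic", type = "SEGMENT")
# list level definitions of the emuDB
list_levelDefinitions(dbHandle)
```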
## name type nrOfAttrDefs attrDefNames
## 1 Phonetic SEGMENT 1 Phonetic;
The example below shows how a further level definition is added that will contain the orthographic word transcriptions for the words uttered in our recordings. This level will be of the type
ITEM, meaning that elements contained within the level are sequentially ordered but do not contain any time information.
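A sketch of the corresponding call (same dbHandle as above):

```r
# add level definition named "Word" of type ITEM
add_levelDefinition(dbHandle, name = "Word", type = "ITEM")
# list level definitions of the emuDB
list_levelDefinitions(dbHandle)
```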
## name type nrOfAttrDefs attrDefNames
## 1 Phonetic SEGMENT 1 Phonetic;
## 2 Word ITEM 1 Word;
The remove_levelDefinition() function can be used to remove unwanted level definitions. However, as we wish to further use the Phonetic and Word levels, we will not make use of this function here.
5.2.4 Attribute definitions
Each level definition can contain multiple attributes, the most common, and currently the only supported, type being a label (of type
STRING). It is thus possible to have multiple parallel labels (i.e., attribute definitions) in a single level. This means that a single annotation item can carry multiple labels while sharing other properties such as start time and duration. This can be useful when modeling certain types of data, an example being the Phonetic level created above. Databases often contain both a phonetic transcript using IPA UTF-8 symbols and a transcript using Speech Assessment Methods Phonetic Alphabet (SAMPA) symbols. To avoid redundant time information, both of these annotations can be stored on the same
Phonetic level using multiple attribute definitions (i.e., parallel labels). The next R code snippet shows the current attribute definitions of the Phonetic level.
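Assuming the database is loaded into dbHandle as above, such a listing can be produced with:

```r
# list attribute definitions of the "Phonetic" level
list_attributeDefinitions(dbHandle, levelName = "Phonetic")
```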
## name level type hasLabelGroups hasLegalLabels
## 1 Phonetic Phonetic STRING FALSE FALSE
Even though no attribute definition has been added to the
Phonetic level, it already contains an attribute definition that has the same name as its level. This attribute definition represents the obligatory primary attribute of that level. As every level must contain an attribute definition that has the same name as its level, it is automatically added by the
add_levelDefinition() function. To follow the above example, the next R code snippet adds a further attribute definition to the
Phonetic level that contains the SAMPA versions of our annotations.
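A sketch of the corresponding call (attribute definitions default to type STRING):

```r
# add attribute definition "SAMPA" to the "Phonetic" level
add_attributeDefinition(dbHandle, levelName = "Phonetic", name = "SAMPA")
# list attribute definitions of the "Phonetic" level
list_attributeDefinitions(dbHandle, levelName = "Phonetic")
```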
## name level type hasLabelGroups hasLegalLabels
## 1 Phonetic Phonetic STRING FALSE FALSE
## 2 SAMPA Phonetic STRING FALSE FALSE
5.2.4.1 Legal labels
As can be inferred from the hasLabelGroups and hasLegalLabels columns in the output of the above
list_attributeDefinitions() call, attribute definitions can also contain two further optional fields. The
legalLabels field contains an array of strings that specifies the labels that are legal (i.e., allowed or valid) for the given attribute definition. As the
EMU-webApp does not allow the annotator to enter any labels that are not specified in this array, this is a simple way of ensuring that a level has a consistent label set. The following R code snippet shows how the set_legalLabels() and get_legalLabels() functions can be used to specify and display a legal label set for the primary Word attribute definition of the fromScratch emuDB.
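Before any legal labels have been set, get_legalLabels() returns NA, as the output below shows. A sketch of that initial call:

```r
# show legal labels of the "Word" attribute definition
# (none have been set yet)
get_legalLabels(dbHandle, levelName = "Word", attributeDefinitionName = "Word")
```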
##  NA
# vector of legal word labels
# (values as shown by the get_legalLabels() output below)
wordLabels = c("amongst", "any", "are", "always", "and", "attracts")

# set legal labels values
# for "Word" attribute definition
set_legalLabels(dbHandle,
                levelName = "Word",
                attributeDefinitionName = "Word",
                legalLabels = wordLabels)

# show recently added legal labels
# for "Word" attribute definition
get_legalLabels(dbHandle,
                levelName = "Word",
                attributeDefinitionName = "Word")
##  "amongst" "any" "are" "always" "and" "attracts"
5.2.4.2 Label groups
A further optional field is the
labelGroups field. It contains specifications of groups of labels that can be referenced by a name given to the group while querying the
emuDB. The R code below shows how the
add_attrDefLabelGroup() function is used to add two label groups to the
Phonetic attribute definition: one group referencing a subset of long vowels (longVowels) and the other a subset of short vowels (shortVowels) on the Phonetic level.
# add long vowels label group
add_attrDefLabelGroup(dbHandle,
                      levelName = "Phonetic",
                      attributeDefinitionName = "Phonetic",
                      labelGroupName = "longVowels",
                      labelGroupValues = c("i:", "u:"))

# add short vowels label group
add_attrDefLabelGroup(dbHandle,
                      levelName = "Phonetic",
                      attributeDefinitionName = "Phonetic",
                      labelGroupName = "shortVowels",
                      labelGroupValues = c("i", "u", "@"))

# list current label groups
list_attrDefLabelGroups(dbHandle,
                        levelName = "Phonetic",
                        attributeDefinitionName = "Phonetic")
## name values
## 1 longVowels i:; u:
## 2 shortVowels i; u; @
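Such a label group can then be referenced by its name in a query; a sketch (as the database contains no annotations yet, the query returns an empty segment list, as shown below):

```r
# query the shortVowels label group on the Phonetic level
query(dbHandle, "Phonetic == shortVowels")
```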
## segment list from database: fromScratch
## query was: Phonetic == shortVowels
##  labels start end session bundle level type
## <0 rows> (or 0-length row.names)
For users who are familiar with or transitioning from the legacy EMU system, it is worth noting that the label groups correspond to the unfavorably named
Legal Labels entries of the GTemplate Editor (i.e., legal entries in the
.tpl file) of the legacy system. In the new system the
legalLabels entries specify the legal or allowed label values of attribute definitions while the
labelGroups specify groups of labels that can be referenced by the names given to the groups while performing queries.
A new feature of the EMU-SDMS is the possibility of defining label groups for the entire
emuDB as opposed to a single attribute definition (see
?add_labelGroups for further details). This avoids the redundant definition of label groups that should span multiple attribute definitions (e.g., a longVowels subset that is to be queried on a level called Phonetic_1 as well as a level called Phonetic_2).
5.2.5 File handling
The previous sections of this chapter defined the simple structure of the fromScratch
emuDB. An essential element that is still missing from the
emuDB is the actual audio speech data12. The following R code example shows how the
import_mediaFiles() function can be used to import audio files, referred to as media files in the context of an
emuDB, into the fromScratch database.
# get the path to directory containing .wav files
wavDir = file.path(tempdir(), "emuR_demoData", "txt_collection")

# Import media files into emuDB session called fromWavFiles.
# Note that the txt_collection directory also contains .txt files.
# These are simply ignored by the import_mediaFiles() function.
import_mediaFiles(dbHandle,
                  dir = wavDir,
                  targetSessionName = "fromWavFiles",
                  verbose = F)

# list session
list_sessions(dbHandle)
## name
## 1 fromWavFiles
## session name
## 1 fromWavFiles msajc003
## 2 fromWavFiles msajc010
## 3 fromWavFiles msajc012
## 4 fromWavFiles msajc015
## 5 fromWavFiles msajc022
## 6 fromWavFiles msajc023
## 7 fromWavFiles msajc057
## # A tibble: 2 x 4
## session bundle file absolute_file_path
## * <chr> <chr> <chr> <chr>
## 1 fromWavFi… msajc0… msajc003_a… /private/var/folders/yk/8z9tn7kx6hbcg_9n…
## 2 fromWavFi… msajc0… msajc003.w… /private/var/folders/yk/8z9tn7kx6hbcg_9n…
The import_mediaFiles() call above added a new session called
fromWavFiles to the fromScratch
emuDB, containing a new bundle for each of the imported media files. The annotations of every bundle, although their levels are empty, adhere to the structure specified above. This means that every
_annot.json file created contains an empty
Phonetic level array and an empty links array.
The emuR package also provides a mechanism for adding files to preexisting bundle directories, as doing so manually can be quite tedious due to the nested directory structure of an
emuDB. The following R code shows how preexisting
.zcr files, as produced by the wrassp function
zcrana(), can be added to the preexisting session and bundle structure. As the directory referenced by
wavDir does not contain any
.zcr files, the next R code example first creates them and then adds them to the
emuDB (see Chapter 8 for further details).
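A sketch of the two steps (calculate .zcr files with wrassp's zcrana(), then register them with add_files()), assuming dbHandle and wavDir from above:

```r
library(wrassp)
# calculate zero-crossing-rate files (.zcr)
# for all .wav files in wavDir
zcrana(listOfFiles = list.files(wavDir, pattern = "wav$", full.names = TRUE),
       outputDirectory = wavDir)
# add the resulting .zcr files to the matching bundles
# of the fromWavFiles session
add_files(dbHandle,
          dir = wavDir,
          fileExtension = "zcr",
          targetSessionName = "fromWavFiles")
```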
##  7
## # A tibble: 3 x 4
## session bundle file absolute_file_path
## * <chr> <chr> <chr> <chr>
## 1 fromWavFi… msajc0… msajc003_a… /private/var/folders/yk/8z9tn7kx6hbcg_9n…
## 2 fromWavFi… msajc0… msajc003.w… /private/var/folders/yk/8z9tn7kx6hbcg_9n…
## 3 fromWavFi… msajc0… msajc003.z… /private/var/folders/yk/8z9tn7kx6hbcg_9n…
5.2.6 SSFF track definitions
A further important structural element of any
emuDB is the use of so-called SSFF tracks, often simply referred to as tracks. These SSFF tracks reference data stored in the SSFF format (see Appendix 15.1.3 for a detailed description of the file format) within the
_bndl directories. The two main types of data are:
- complementary data that was acquired during the recording such as by EMA or EPG; or
- derived data, i.e. data calculated from the original audio signal, such as formant values and their bandwidths or the short-term Root Mean Square amplitude of the signal.
As Section 8.7 covers how the SSFF file output of a
wrassp function can be added to an
emuDB, an explanation will be omitted here. The following R code snippet shows how the
.zcr files added in the R example above can be registered as an SSFF track definition (see Chapter 8 for further details).
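A sketch of the corresponding call (track and column names taken from the listing below):

```r
# add SSFF track definition referencing the .zcr files
add_ssffTrackDefinition(dbHandle,
                        name = "zeroCrossing",
                        columnName = "zcr",
                        fileExtension = "zcr")
# list SSFF track definitions of the emuDB
list_ssffTrackDefinitions(dbHandle)
```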
## name columnName fileExtension
## 1 zeroCrossing zcr zcr
5.2.7 Configuring the EMU-webApp and annotating the emuDB
As previously mentioned, the current fromScratch
emuDB contains only empty levels. In order to start annotating the database, the
EMU-webApp has to be configured to display the desired information. Although the configuration of the
EMU-webApp is stored in the
_DBconfig.json file and is therefore a part of the
emuDB format, here we will omit an explanation of the extensive possibilities of configuring the web application (see Chapter 9 for an in-depth explanation). The R code snippet below shows how the
Phonetic level is added to the level canvases order array of the default perspective.
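A sketch of the corresponding calls (assuming the default perspective name "default"):

```r
# add the Phonetic level to the level canvases order
# of the default perspective
set_levelCanvasesOrder(dbHandle,
                       perspectiveName = "default",
                       order = c("Phonetic"))
# show the current level canvases order
get_levelCanvasesOrder(dbHandle, perspectiveName = "default")
```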
##  "Phonetic"
As a final step before beginning the annotation process, the fromScratch
emuDB has to be served to the
EMU-webApp for annotation and visualization purposes. The code below shows how this can be achieved using the serve() function.
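A minimal sketch of such a call:

```r
# serve the fromScratch emuDB to the EMU-webApp
serve(dbHandle)
```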
This chapter introduced the elements that comprise the new
emuDB format and provided a practical overview of the essential database interaction functions provided by the
emuR package. We feel the
emuDB format provides a general purpose, flexible approach to storing speech databases with the added benefit of being able to directly manipulate and analyse these databases using the tools provided by the EMU-SDMS.
Cassidy, Steve. 2013. “The Emu Speech Database System Manual: Chapter 9. Simple Signal File Format.” http://emu.sourceforge.net/manual/chap.ssff.html.
Draxler, Chr., and K. Jänsch. 2004. “SpeechRecorder - a Universal Platform Independent Multi-Channel Audio Recording Software.” In Proc. of the IV. International Conference on Language Resources and Evaluation, 559–62. Lisbon, Portugal.
Kisler, Thomas, Florian Schiel, and Han Sloetjes. 2012. “Signal Processing via Web Services: The Use Case WebMAUS.” In Proceedings Digital Humanities 2012, 30–34. Hamburg, Germany.
The JSON schema files are available at https://github.com/IPS-LMU/EMU-webApp/tree/master/dist/schemaFiles↩
According to the JSON specification (see https://json.org/) the only characters that have to be escaped within a JSON string are: " (as this marks the start/end of a string), \ (as this is the escape character) and control characters (\b = backspace, \f = form feed, \n = new line, \r = carriage return, \t = tab). Unicode characters in their hexadecimal form, using \u followed by four hex digits, may also be used.↩
However, if things like resampling are required, we suggest using other tools such as the freely available Sound eXchange (SoX) command line tool (see http://sox.sourceforge.net/) to perform these operations.↩
As the EMU-webApp currently only supports mono 16 bit
.wav audio files, we currently recommend using this format only.↩