Guidelines for ToBI Labelling
(version 3, March 1997)
copyright (1993) The Ohio State University Research Foundation
by Mary E. Beckman & Gayle M. Ayers
0. Preface
0.1. What are the "Guidelines for ToBI Labelling"?
ToBI (for Tones and Break Indices) is a system for transcribing the
intonation patterns and other aspects of the prosody of English
utterances. It was devised by a group of speech scientists from
various disciplines (electrical engineering, psychology,
linguistics, etc.) who wanted a common standard for transcribing an
agreed-upon set of prosodic elements, in order to be able to share
prosodically transcribed databases across research sites in the
pursuit of diverse research purposes and varied technological goals.
Silverman et al. (1992) describe the motivation for and development of
the ToBI system. If you ask for this handbook in hard copy, that
paper will be appended as Appendix B. Appendix A (which is included
both in the hard copy and in the ASCII file version of this labelling
guide) is "The ToBI Annotation Conventions", the definitive summary
statement of the symbols and marks used in ToBI transcriptions, and of
the conventions that we have agreed upon for their use. The rest of
this labelling guide is a more detailed description of the system,
with reference to accompanying utterances of two types: example
utterances to illustrate points made in the text and exercise
utterances to give labellers practice on the points made in the text.
These utterances are set off in the text of the labelling guide using
the following typographic conventions.
EXAMPLE <<basename>>: orthographic transcription
                      tonal transcription and/or break index values
EXERCISE <<basename>>: orthographic transcription
Each example utterance is also referred to in the text by its basename
within pairs of angle brackets -- e.g., the first example utterance is
<<jam1>>. We have chosen the examples and arranged the
exercises with the aim of leading new users through the system in a
self-taught training course, trying to choose utterances in each of
the six practice sets that show only phenomena that have been
introduced up to that point.
The utterances that accompany this labelling guide can be obtained in
two formats: as digitized computer files with electronic record of the
f0 contour from the Ohio State University web and ftp distribution
site (see section 0.2) or as an audio tape with paper record of the f0
contour (see section 0.3).
0.1.1 Notice of copyright and restrictions on use
The "Guidelines for ToBI Labelling" document and associated material
are copyrighted. The text cannot be copied or distributed in any
format unless this paragraph is included. The utterances accompanying
the guide are available to any interested user, but only for
non-commercial use. The National Science Foundation and the Ohio
State University make no warranty and accept no liability associated
with the use of these materials. These materials may be obtained only
as described in Sections 0.2 and 0.3, and are not to be redistributed
by other user sites. Users may not redistribute these materials from
their own sites, but should instead tell interested people how to
obtain their own copy from the distribution sites.
0.1.2 Acknowledgements
The "Guidelines for ToBI Labelling" and the accompanying utterances
were developed in the Ohio State University Linguistics Laboratory
with partial support from the National Science Foundation, and the
Ohio State University continues to support the labelling guide by
providing a distribution site for the electronic records (described in
Section 0.2). Colin Wightman generously provided the distribution
site for the electronic records of version 2.0 of the labelling guide
in his lab at the New Mexico Institute of Mining and Technology.
Jennifer Venditti provided LaTeXing and various other editing
expertise for this earlier version, which we have relied on in
producing this new one. Kim Silverman and John Pitrelli developed the
original transcriber script, on which we based the primary shell
scripts for viewing the examples and doing the exercises. David
Talkin helped in innumerable ways, such as by developing the scripts
for the cardinal examples. Harald Singer developed an alternate
electronic format for version 2.0, and Stephanie Jannedy set up the
web page for it and for the ftp site.
0.2. Getting and using the digitized utterances and f0 tracks
If you have waves(tm) (an Entropic Research Laboratory product) or a
similar computer display system, obtain the speech files, electronic
record of the f0 contour, and label files by ftp from The Ohio State
University distribution site.
0.2.1. Getting the digitized utterances and f0 tracks
There are two options for obtaining the ToBI materials depending upon
how much disk space users have available. For those with sufficient
disk space there is a single large tarfile for convenience. This
option requires about 40 MB available during the installation process;
the full materials occupy about 20 MB once the installation is
completed and the tarfile is removed. If you do not have enough space
to have both the complete tarfile and all the installed files at the
same time, use the second option. There are three smaller tarfiles
which together contain all the materials contained in the single large
tarfile. That is, they contain the speech files, f0 records, and
label files divided into three parts by order of occurrence in the
Guidelines. In addition to the single large or three smaller
tarfiles, you will need to get the "essentials" tarfile, which is
about 2.5 MB, and contains an ASCII version of "The Guidelines for
ToBI Labelling" and the scripts and tools for looking at the f0 tracks
and labels.
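The choice between the two options can be made by checking free disk
space before you connect. The following is a minimal sketch and is not
part of the ToBI distribution: the function name "choose_download" is
hypothetical, and the threshold simply restates the "about 40 MB"
figure given above (it ignores the additional 2.5 MB or so needed for
the "essentials" tarfile).

```shell
#!/bin/sh
# Sketch only: choose_download is a hypothetical helper, and the
# threshold is the guide's "about 40 MB" figure expressed in KB.

# Print "complete" if the given free space (in KB) can hold the single
# large tarfile during installation, otherwise print "parts".
choose_download() {
    available_kb=$1
    needed_kb=$((40 * 1024))
    if [ "$available_kb" -ge "$needed_kb" ]; then
        echo complete
    else
        echo parts
    fi
}

# Free space (in KB) on the filesystem holding the current directory.
# "df -kP" prints a header line, then one line whose 4th field is the
# available space.
free_kb=$(df -kP . | awk 'NR == 2 { print $4 }')
choose_download "$free_kb"
```

Run this from the directory where you intend to install the materials;
it prints which of the two options the available space allows.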
If you are reading this page over the WWW, click here to access the
tarfiles. Download the README-file first for descriptions of the tar
files and of the directory structure that they will set up on your
home system. Otherwise use ftp to get these tarfiles. On your home
system, enter the command "ftp ling.ohio-state.edu" (or use the
internet address for ling, which is 128.146.172.200) from the
directory where you would like to have the materials (the installation
process will create a directory called TOBI-TRAINING where all the
files will be put). When prompted, type the login name "ftp" and your
user name on your home system as a password. Change directory to the
TOBI directory ("cd pub/TOBI"). If you now list the files available
("ls"), you will see several directories as listed below. Users
should feel free to explore these directories.
DOCS - contains documentation files such as this labelling guide.
TARFILES - contains compressed sets of files for easy retrieval.
TOOLS - contains shellfiles and tools for transcription.
Further descriptions of the ToBI ftp site at ling.ohio-state.edu and
some guidelines can be found in the README file. If you run into
serious problems send email to the ToBI site managers at
tobi@ling.ohio-state.edu.
The "Guidelines for ToBI Labelling" and the scripts and other tools
that you will need to look at the examples are stored in a compressed
tarfile called "essentials_tobi_release_3.tar.Z". To get this file,
change to the TARFILES directory ("cd TARFILES"). Since tarfiles are
binary files, enter the command "binary". Now type "get
essentials_tobi_release_3.tar.Z". All the utterance files for the
training materials have been placed in one large tarfile and in three
smaller files as described above. These also are stored in the
TARFILES directory. You can transfer the complete set of speech
materials by typing "get complete_tobi_release_3.tar.Z".
Alternatively, to get the smaller files with part of the training
materials only, transfer one or more of the "part" files. You can
take as much as you have room for and then delete things after you are
through with them to make room for the next set of examples and
exercises to work through. Part 1 has the files for Section 1 to
Practice 2, Part 2 has the files for Section 2.6 to Practice 4, and
Part 3 has the files for Section 3.2 to Practice 6. Transfer these by
typing "get part1_tobi_release_3.tar.Z", "get
part2_tobi_release_3.tar.Z", and "get part3_tobi_release_3.tar.Z"
respectively. When you have all the files you want, return to your
home system with the command "quit".
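The ftp session described above can also be written down as a list of
commands and fed to the ftp client in one go. This is only a sketch
under stated assumptions: "tobi_ftp_commands" is a hypothetical helper,
the directory path "pub/TOBI/TARFILES" is our reading of the directory
names given above, and the invocation shown in the final comment
assumes a classic command-line ftp client that accepts "-n" to
suppress automatic login.

```shell
#!/bin/sh
# Sketch only: tobi_ftp_commands is a hypothetical helper. It prints
# ftp commands; it does not open any network connection itself.

# Usage: tobi_ftp_commands LOGIN_NAME TARFILE...
# LOGIN_NAME is your user name on your home system, which the guide
# asks you to give as the password for the login name "ftp".
tobi_ftp_commands() {
    login_name=$1
    shift
    echo "user ftp $login_name"
    echo "binary"
    echo "cd pub/TOBI/TARFILES"   # assumed path, see lead-in
    for tarfile in "$@"; do
        echo "get $tarfile"
    done
    echo "quit"
}

# Print the session for the essentials and Part 1 tarfiles.
tobi_ftp_commands "$(id -un)" \
    essentials_tobi_release_3.tar.Z part1_tobi_release_3.tar.Z

# To run the session rather than print it (assumes "ftp -n"):
#   tobi_ftp_commands "$(id -un)" essentials_tobi_release_3.tar.Z \
#       | ftp -n ling.ohio-state.edu
```

Printing the commands first lets you check the file names before
anything is transferred.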
Back on your home machine, make sure the tarfiles are in whatever
directory you would like the ToBI files to reside under. The
installation process will create a directory underneath the one in
which you put the tarfiles and all the ToBI material will go in the
new directory. To install the files, first enter the commands
"uncompress essentials_tobi_release_3.tar.Z" and "uncompress
complete_tobi_release_3.tar.Z" (or uncompress the relevant
"part" file). Once the files are uncompressed (it will take a few
minutes), enter the commands "tar -xvf complete_tobi_release_3.tar"
and "tar -xvf essentials_tobi_release_3.tar" (or tar -xvf the relevant
"part" .tar file) to extract all of the subdirectories and files.
Don't forget to delete the tarfile once you're happy that everything
got installed correctly.
Summary instructions are listed below.
1) ftp ling.ohio-state.edu (or ftp 128.146.172.200)
   Name: ftp
   Password: username on home system
2) binary
3) cd pub/TOBI/TARFILES
4) get essentials_tobi_release_3.tar.Z (Guidelines for ToBI Labelling,
   scripts, tools)
4a) get complete_tobi_release_3.tar.Z
    or, if disk space is a concern, choose a relevant subset of
4b) get part1_tobi_release_3.tar.Z (Section 1 to Practice 2)
    get part2_tobi_release_3.tar.Z (Section 2.6 to Practice 4)
    get part3_tobi_release_3.tar.Z (Section 3.2 to Practice 6)
5) quit
6) uncompress essentials_tobi_release_3.tar.Z
6a) uncompress complete_tobi_release_3.tar.Z
    or relevant subset of
6b) uncompress part1_tobi_release_3.tar.Z
    uncompress part2_tobi_release_3.tar.Z
    uncompress part3_tobi_release_3.tar.Z
7) tar -xvf essentials_tobi_release_3.tar
7a) tar -xvf complete_tobi_release_3.tar
    or
7b) tar -xvf (relevant part)_tobi_release_3.tar
8) rm essentials_tobi_release_3.tar
8a) rm complete_tobi_release_3.tar
    or
8b) rm (relevant part)_tobi_release_3.tar
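If you have fetched several tarfiles, the uncompress, tar, and rm
steps are the same for each one. The sketch below prints those
commands for whatever tarfiles you name, so that you can review them
before running anything. The helper name "install_plan" is
hypothetical and is not one of the scripts in the distribution.

```shell
#!/bin/sh
# Sketch only: install_plan is a hypothetical helper. It prints the
# installation commands for each compressed tarfile; it runs nothing.

# Usage: install_plan FILE.tar.Z...
install_plan() {
    for compressed in "$@"; do
        tarfile=${compressed%.Z}   # name left behind by uncompress
        echo "uncompress $compressed"
        echo "tar -xvf $tarfile"
        echo "rm $tarfile"
    done
}

install_plan essentials_tobi_release_3.tar.Z complete_tobi_release_3.tar.Z
```

Because the rm step is meant to happen only once you are satisfied
with the installation, you may prefer to copy the printed commands and
run them one at a time rather than piping them to a shell.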
The directory structure that you should recover by untarring all
the files is described below. The top level directory is
TOBI-TRAINING. It includes an ASCII version of this labelling guide
(called "labelling_guide_v3.ASCII"), two subdirectories called
EXAMPLES and PRACTICE (which hold the speech, f0, and label files),
and two script files called "examples" and "exercises". These are
waves(tm) scripts for displaying the examples and exercises. The
latter also is used for labelling the exercises. These two scripts
assume this directory structure. Non-waves(tm) transcriptions of the
examples are given in the ASCII file Nonwaves-transcriptions, and more
detailed instructions about how to use the scripts and useful
shortcuts for the mechanics of labelling using the waves(tm) scripts
are given in README-transcriber.
TOBI-TRAINING
    EXAMPLES (where speech, f0, and "answers" are kept)
        AND1.breaks (break index label file)
        AND1.d (speech)
        AND1.f0 (f0)
        AND1.misc (misc label file)
        AND1.tones (tones label file)
        AND1.words (words label file)
        . . .
    PRACTICE (where user transcriptions are kept)
        I-mean.breaks
        I-mean.words
        I-mean.tones
        . . .
    examples (script for displaying examples and "answers")
    exercises (script for practice labelling)
    labelling_guide_v3.ASCII (ASCII version of Guidelines and Conventions)
    Nonwaves-transcriptions (ASCII version of non-waves(tm) labelling)
    README-transcriber (instructions and shortcuts for scripts)
After you have untarred the files, you will have to type a few
commands to use the utterances as intended. To make the scripts
executable and to protect the speech files and "answer" label files
which are kept in the directory EXAMPLES from being overwritten by
mistake, type the following three commands at the unix command line
from within the TOBI-TRAINING directory.
chmod +x examples
chmod +x exercises
chmod -w EXAMPLES
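To confirm that the three commands took effect, you can test the
permissions afterwards. The following is a minimal sketch; the
function name "check_setup" is hypothetical, and the check on EXAMPLES
looks only at the owner's write permission bit, on the assumption that
you are the owner of the installed files.

```shell
#!/bin/sh
# Sketch only: check_setup is a hypothetical helper that reports
# whether the chmod steps from the guide appear to have been applied.

# Usage: check_setup TOP_LEVEL_DIRECTORY
# Returns 0 if both scripts are executable and EXAMPLES is not
# owner-writable; otherwise prints what is wrong and returns 1.
check_setup() {
    top=$1
    status=0
    for script in examples exercises; do
        if [ ! -x "$top/$script" ]; then
            echo "not executable: $script"
            status=1
        fi
    done
    # find prints the directory only if its owner write bit (0200) is
    # set; -prune keeps find from descending into the directory.
    if [ -n "$(find "$top/EXAMPLES" -prune -perm -200 2>/dev/null)" ]; then
        echo "still writable: EXAMPLES"
        status=1
    fi
    return $status
}
```

From within the TOBI-TRAINING directory, "check_setup ." prints nothing
and returns success when all three commands have been applied.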
There are also a few other tools included in the complete tobi release
and the "essentials" file. These are compressed tarfiles which must
be uncompressed and untarred (as above) to be installed. They are not
strictly necessary for working through the Guidelines for ToBI
Labelling, but are helpful tools to have. "cardinals.tar.Z" contains
the files necessary for displaying cardinal examples of ToBI label
categories, "transcriber.tar.Z" contains the files necessary for
transcribers to transcribe their own data, and "checker.tar.Z"
contains the files necessary to invoke John Pitrelli's checking
program, which checks transcriptions and reports errors.
"cardinals.tar.Z" allows the user to display and play cardinal
examples of ToBI label categories by pushing buttons in a menu
display. If this tool is installed, the button menu with these
examples is called up automatically each time the "examples",
"exercises", or "transcriber" (see next tool description) scripts are
invoked. Read the README-transcriber file for more information.
Install by uncompressing and untarring as described above.
cardinals.tar.Z (for displaying cardinal examples of ToBI)
When uncompressed and untarred yields:
README-transcriber
aux_examples/ (additional examples of ToBI labelling)
cardinals
docard
Labellers who wish to label their own data should use the script
"transcriber" which is included in "transcriber.tar.Z". This uses the
same format as the training materials. Cardinal examples are
available if they are installed. Read the README-transcriber file for
more information. Install by uncompressing and untarring as described
above.
transcriber.tar.Z (for transcribing your own examples)
When uncompressed and untarred yields:
breakindexmenu
miscmenu
tonemenu
transcriber (script for doing transcriptions)
wordsmenu
Labellers should check their transcriptions of their own data to see
that they have a "legal" ToBI transcription. The script
"check-transcription" checks the label files for consistency and
adherence to the ToBI conventions for labelling. Read the
README-checker file for more information. Install by uncompressing
and untarring as described above.
checker.tar.Z (for checking transcriptions)
When uncompressed and untarred yields:
README-checker
check-and-behead-breaks.awk
check-and-behead-misc.awk
check-and-behead-tones.awk
check-and-behead-words.awk
check-transcription (script for checking transcriptions)
check-transcription.awk
0.2.2. Using the digitized utterances and f0 tracks
Now you are ready to use the two waves(tm) scripts. Both of them take
as their argument(s) the basename(s) of the utterance(s) you want to
display. For example, as you are reading about example utterance
<<jam1>> in the labelling guide, you can listen to the
speech and look at the associated transcription by typing:
examples jam1
This will call up xwaves with two data windows to display the speech
waveform and f0 trace, and a third window for the ToBI labels of our
"answer" transcriptions. This script is set up to just display the
information and play the speech; it does not allow the user to change
the labels.
In order to practice transcribing one of the exercise utterances (say
the first exercise in PRACTICE ONE), type:
exercises amelia-p2
This will display the speech waveform and f0 with only the word labels
and placeholders for the break index labels in the labelling window.
You can then use the labelling menus to add the tones and substitute
break index values for the place holders in the break index section of
the label window.
Both of these scripts can be used to display several utterances in a
series. For example, to queue up the first four examples in the
Guidelines, type:
examples jam1 cough made1 made2
After you have finished looking at <<jam1>>, push the
CONTINUE button in the waves(tm) control panel. The example
<<cough>> will then be displayed.
A note on where the label files are stored: The "answer" label files
(basename.tones, basename.breaks, basename.misc) are kept in the
directory EXAMPLES along with the speech, f0, and word labels files
(basename.d, basename.f0, basename.words). The label files that the
user creates when labelling with the script "exercises" are stored in
the directory PRACTICE. (Since the "answers" are stored in the
EXAMPLES directory, you can check your own labels against those of the
developers of this Guide by calling up the practice basename using the
examples script.)
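Because your labels and the "answer" labels share a basename and
differ only in directory, a rough textual comparison is also possible
from the command line. This is a sketch only: "compare_labels" is a
hypothetical helper, it compares whole files with diff, and it assumes
that any header or time information in the label files is something
you are willing to have counted as a difference. Viewing both
transcriptions with the scripts remains the more informative check.

```shell
#!/bin/sh
# Sketch only: compare_labels is a hypothetical helper. Run it from
# within TOBI-TRAINING, where EXAMPLES and PRACTICE are found.

# Usage: compare_labels BASENAME
# Compares the tones and breaks label files in PRACTICE with the
# "answer" files in EXAMPLES. Returns 0 only if both pairs are
# identical.
compare_labels() {
    base=$1
    status=0
    for tier in tones breaks; do
        answer="EXAMPLES/$base.$tier"
        mine="PRACTICE/$base.$tier"
        if [ ! -f "$answer" ] || [ ! -f "$mine" ]; then
            echo "missing $tier file for $base"
            status=1
        elif ! diff "$answer" "$mine" >/dev/null; then
            echo "$tier labels differ for $base"
            status=1
        fi
    done
    return $status
}
```

For example, "compare_labels amelia-p2" reports which of the two label
files, if either, differs from the developers' version.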
A note on multi-transcriber sites: Setting up a site to be a
multi-transcriber site is fairly straightforward. The main idea is
that instead of all labellers working with the "exercises" script and
having their labels saved in the directory PRACTICE, each labeller
will have a personal "exercises" script and directory where the labels
will be stored. Take the name USER as our demonstration (where any
name can substitute for USER).
For each user, make a separate directory within the top level
directory TOBI-TRAINING, copy the "exercises" script to a personal
copy for the user, edit the "user" script, and then invoke the script
"user" exactly as one would invoke the "exercises" script (make sure
the "user" script is executable -- "chmod +x user" if necessary.)
Additionally, one may want to copy the break index placeholders into
the USER directory.
1) mkdir USER
2) cp exercises user
3) Change line in script "user" which says:
PRACTICEDIR=PRACTICE
to say:
PRACTICEDIR=USER
4) cp PRACTICE/*.breaks USER/
5) chmod +x user (if necessary)
6) E.g., start Practice 1 by typing:
user amelia-p2
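Steps 1 to 5 can be collected into a small shell function for sites
with many labellers. This is a minimal sketch rather than part of the
distribution: "setup_labeller" is a hypothetical name, the sed
substitution assumes that the line in "exercises" reads exactly
PRACTICEDIR=PRACTICE with nothing else on it, and the directory and
script names are assumed to be plain alphanumeric names.

```shell
#!/bin/sh
# Sketch only: setup_labeller is a hypothetical helper. Run it from
# within TOBI-TRAINING, next to the "exercises" script and PRACTICE.

# Usage: setup_labeller DIRECTORY SCRIPT
# e.g.   setup_labeller USER user
setup_labeller() {
    dir=$1      # personal label directory (step 1)
    script=$2   # personal copy of the exercises script (step 2)
    mkdir -p "$dir" || return 1
    # Steps 2 and 3: copy the script, pointing PRACTICEDIR at the new
    # directory. Assumes the line is exactly "PRACTICEDIR=PRACTICE".
    sed "s/^PRACTICEDIR=PRACTICE\$/PRACTICEDIR=$dir/" exercises \
        > "$script" || return 1
    # Step 4: copy the break index placeholders.
    cp PRACTICE/*.breaks "$dir"/ || return 1
    # Step 5: make the personal script executable.
    chmod +x "$script"
}
```

After "setup_labeller USER user", the labeller starts Practice 1 with
"user amelia-p2" as in step 6. The original "exercises" script is left
unchanged.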
0.2.3 A less interactive electronic version that can be used on a Mac
Version 2.0 of the labelling guide has been converted to another
electronic format that can be fetched to a Mac for perusal and
playback. The conversion to this format was done by Harald Singer,
and it is available on the Ohio State University Linguistics
Laboratory web site. Go to:
http://www.ling.ohio-state.edu/Phonetics/E_ToBI/singer_tobi.html
0.3. Getting and using the audio recording and paper f0 records
To get an audio tape of the utterances and a printed paper copy of the
labelling guide and f0 tracks, send your request along with a check
for $25.00 made out to The Ohio State University to:
ToBI Labelling Guide, c/o Mary Beckman
Ohio State University, Linguistics Dept.
222 Oxley Hall, 1712 Neil Ave.
Columbus, OH 43210-1298 USA
(The $25.00 just barely covers the cost of making copies of the tape
and booklet and of the mailing to a North American location.)
On the audio tape each utterance is played twice in a row. The
utterances occur in the order that they are listed in the Labelling
Guide text, with two more repetitions if an utterance is mentioned
again later in another section. However, it is far easier to use the
utterances if you can play each one as many times as you want and if
you can zero in on some section of an utterance at will, and we
recommend that you find some way of doing so. For example, you might
use a tape recorder with a recordable tape loop device (and loops of
several lengths). Or if you have a Kay DSP 5500 or some other kind of
computer system with fast A/D capabilities, digitize each utterance
from the tape into a buffer where you can play repeatedly while
looking at the paper record of the f0 contour and the accompanying
labels.
To transcribe the example utterances, you will need to use the
non-waves(tm) conventions described in Section 9 of The ToBI
Annotation Conventions. The last section (Appendix C) of the booklet
is a listing of an ASCII file containing the non-waves(tm) format
labelling of each example and exercise utterance corresponding to the
waves(tm)-format labelling displayed on the sheet with the f0 contour.
They are given in alphabetical order by basename. This ASCII file and
another ASCII file containing the orthographic labels and field
placeholders of all of the exercise utterances can be obtained by
anonymous ftp from the Ohio State University Linguistics Laboratory by
doing the following:
On your home system, enter the command: "ftp ling.ohio-state.edu" (or
use the internet address for ling, which is 128.146.172.200). When
prompted, type the login name "ftp" and your user name on your home
system as a password. Change directory to the TOBI directory ("cd
pub/TOBI"). If you now list the files available ("ls"), you will see
several directories:
DOCS - contains documentation files such as this labelling guide.
TARFILES - contains compressed sets of files for easy retrieval.
TOOLS - contains shellfiles and tools for transcription.
To get the two ASCII files, change to the DOCS directory ("cd DOCS") and transfer the
files by entering the commands "get Nonwaves-transcriptions" and "get
Nonwaves-exercises-templates". When the transfers are complete,
return to your home system with the command "quit".
0.4. Future editions and a disclaimer
If you have comments on this Labelling Guide -- particularly, if you
have suggestions for improvements or better example utterances you
would like to give to us -- we would be very grateful if you would
direct the comments to us at:
e-mail: tobi@ling.ohio-state.edu
other-mail: ToBI Labelling Guide, c/o Mary Beckman
Ohio State University, Linguistics Dept.
222 Oxley Hall, 1712 Neil Ave.
Columbus, OH 43210-1298 USA
This e-mail address is also the place to send us your e-mail address
if you want to be added to our list of "subscribers" to be notified of
any future editions of the Labelling Guide.
The ToBI labelling system was originally developed to cover the three
most widely used varieties of spoken English -- namely, general
American, standard Australian, and southern British English. We do
not claim to cover other varieties. Indeed, we have already
determined that ToBI proper does not adequately cover many other
British varieties such as the Glasgow dialect, and modified variants
need to be developed by users who want to use it in transcribing
utterances in these other dialects. By the same token, we must stress
that ToBI was not intended to cover any language other than English,
although we endorse the adoption of the basic principles in developing
transcription systems for other languages, particularly languages
that are typologically similar to English. More general comments
about using the ToBI system for other dialects of English or about
adapting ToBI labelling principles to develop comparable systems for
the transcription of other languages may also be addressed to the tobi
e-mail address listed above for forwarding to appropriate interested
members of the larger ToBI group.
labelling_guide_v3.ASCII (augmented by some HTML)
This page is maintained by M. Beckman (mbeckman@ling.osu.edu)