4. The miscellaneous tier (and other aspects of the marking of disfluencies)

4.1. The miscellaneous tier defined

The miscellaneous tier is in essence a `comment' tier for the optional
marking of events of any kind other than the standard words, tones,
and disjunctures marked on the orthographic tier, the tone tier, and
the break index tier.  Many of the events labelled on the miscellaneous
tiers are things that span longish intervals.  In this, miscellaneous
events are like the word events labelled on the orthographic tier.
However, the two types of events are very different, in that a strict
succession of miscellaneous events is not essential to speech, whereas
speech must be a succession of produced words (or pieces of words).
Therefore, whereas the ToBI convention is to mark each event on the
orthographic tier only at the end of the interval that the event
spans, it prescribes that an event on the miscellaneous tier should in
general be marked for both its end and its beginning, using the
diacritics `>' and `<', respectively.  Thus labels on the
miscellaneous tier usually come in pairs, such as:

breath< breath> laugh< laugh> cough< cough>
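
The pairing convention lends itself to a simple mechanical check.  The
following is a minimal Python sketch, not part of the ToBI conventions
or tools: it assumes the miscellaneous tier has already been read into
a list of (time, label) tuples in time order, and the function name
and the time values are invented for illustration.

```python
def pair_misc_labels(labels):
    """Pair `name<` / `name>` marks into (name, start, end) intervals.

    `labels` is a time-ordered list of (time_in_seconds, text) tuples.
    This layout is an assumption made for illustration only.
    """
    open_marks = {}            # event name -> time of the still-open `<` mark
    intervals, unpaired = [], []
    for time, text in labels:
        if text.endswith("<"):
            name = text[:-1]
            if name in open_marks:
                # a second `<` before any `>`: report the earlier one
                unpaired.append((open_marks[name], text))
            open_marks[name] = time
        elif text.endswith(">") and text[:-1] in open_marks:
            name = text[:-1]
            intervals.append((name, open_marks.pop(name), time))
        else:
            # a bare label, or a `>` with no preceding `<`
            unpaired.append((time, text))
    # `<` marks that were never closed
    unpaired.extend((start, name + "<") for name, start in open_marks.items())
    return intervals, unpaired


# Made-up times, using the label pairs shown above.
labels = [(0.40, "breath<"), (0.95, "breath>"), (2.10, "cough<"), (2.60, "cough>")]
intervals, unpaired = pair_misc_labels(labels)
print(intervals)   # [('breath', 0.4, 0.95), ('cough', 2.1, 2.6)]
print(unpaired)    # []
```

Anything returned in the second list is a mark without a partner, which
a transcriber may want to look at again.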

Example <<cough>> in Section 1.1 illustrated the use of the
miscellaneous tier to mark the cough that interrupts the utterance.

EXAMPLE <<cough>>: Will you have marmalade ...
                        L*        L*
                       1   1    1         1p
                                 cough<            cough>

Another similar example is the laughter that interrupts the pitch contour in
utterance <<laugh>>.

EXAMPLE <<laugh>>: To me  it this seems very obvious;   to make it on
                     1  3   1    1     1    1       4     1    1  1  3
                      H* L-                L+H*    L-L%     H*     !H* L-
                                     laugh<     >laugh laugh<
                   to make it by hand is much more fun than to make it on
                     1    1  1  1    3  1    1    1   1    1  1    1  0  1
                       H*       L+H* L-   H*
                             >laugh
                   a computer.
                    1        4
                             L-L%

Since such events disrupt otherwise tonally well-formed intonation
contours, and marking them helps in parsing the disrupted contours, we
can think of them as a kind of `disfluency'.  Indeed, the ToBI
Annotation Conventions encourage the marking of disfluencies, and
suggest the use of `disfl<...disfl>' (or `disfl') as a general flag
for them:

  In general, it is the assumption of the participants in the common
  transcription group that silences should be automatically
  detectable, at least to a first approximation, and that transcriber
  time should not be spent marking these by hand.  Disfluencies, by
  contrast, are not automatically detectable, and the absence of
  markings for them makes it difficult to parse the tone and break
  index tiers.  For these reasons, transcribers are urged to mark
  disfluencies on the miscellaneous tier using `disfl<' and `disfl>'
  (or `disfl' if the disfluency is extremely localized), and to
  provide these marks in the miscellaneous tier menu when using
  waves(tm).
 
However, it is often easier to determine that something is disfluent
in some region than it is to determine exactly where the disfluency
begins and ends.  For this reason, the ToBI Annotation Conventions
specify that the marks can be used more as a disfluency flag than as
the demarcation of a precise region:

  ...the marks `disfl<' and `disfl>' (or simple `disfl') should be
  interpreted as rough pointers to the disfluent region and
  transcribers should not agonize over placing them precisely.

Note that here the ToBI Annotation Conventions explicitly mention the
use of a single mark, rather than a pair of marks for the beginning
and end of a region.  However, they specifically recommend this usage
only for disfluencies, to encourage the marking of something that is
typically very difficult to locate precisely in time.  Transcribers
should be careful about using a single (unpaired) label on the misc
tier for anything other than marking the general location of a
perceived disfluency, since in any other circumstance, the usual
interpretation must be that the event is so localized that its
beginning is virtually the same point as its end.

Utterance <<fare>> is an example of a disfluency marked in this way.

EXAMPLE <<fare>>: show me the cheapest fare    from Da- from
                      1  1   1        1    4       1  1p    1
                    H*        L+H*     !H* L-L%        %r
                                                 disfl< disfl>
                  Philadelphia to Dallas     excluding    restriction
                              3  1      4             4           4
                 L+H* !H*     L-  H*    L-L%  L+H*   L-L%      H* L-L%
                  v   u    slash one
                    1  3        1   4
                  H* !H* L- H*   H* L-L%
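
The caution about single (unpaired) labels can be turned into a rough
consistency check.  The Python sketch below is only an illustration
under stated assumptions: the misc tier is taken to be a plain list of
label strings, the function name is hypothetical, and the check counts
`<' and `>' marks without verifying their order.

```python
from collections import Counter


def check_misc_labels(labels):
    """Return warnings for misc-tier labels that may be misread.

    A bare `disfl` is accepted as a rough pointer; any other bare label
    is reported, since it would be read as a point event.
    """
    balance = Counter()        # name -> (number of `<`) - (number of `>`)
    warnings = []
    for text in labels:
        if text.endswith("<"):
            balance[text[:-1]] += 1
        elif text.endswith(">"):
            balance[text[:-1]] -= 1
        elif text != "disfl":
            warnings.append(f"bare label '{text}' will be read as a point event")
    for name, n in balance.items():
        if n != 0:
            warnings.append(f"'{name}' has unmatched '<' or '>' marks")
    return warnings


for message in check_misc_labels(["disfl<", "disfl>", "disfl", "cough", "laugh<"]):
    print(message)
```

Here the paired and single `disfl' marks pass, while the bare `cough'
and the unclosed `laugh<' are each reported.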

Although the miscellaneous tier is a general-purpose `comment' tier,
we recommend that transcribers at a particular site who find
themselves often adding comments that fit some particular pattern
other than those described above consider defining an extra tier for
that purpose.

Christine Nakatani and Elizabeth Shriberg, both of whom have worked
extensively on disfluencies in naturally spoken utterances,
differentiate disfluencies more finely, and suggest guidelines for
other transcribers who wish to differentiate types of disfluencies in
the same way.  The following section is adapted from their suggested
guidelines, and uses many of their examples.


4.2. Suggested guidelines for marking disfluencies

Nakatani and Shriberg have identified several different types of
events that they feel should count as disfluencies in ToBI.  Not all
of these need be marked on the misc tier in order to be recovered.  In
particular, mere hesitation pauses can be recovered from the use of
the 2p or 3p marks on the break index tier and (in the case of many
filled pauses) from the transcription of the filler material on the
orthographic tier.  Phenomena that might be flagged as disfluencies on
the misc tier include stumbling over a word, or abruptly cutting off a
word or phrase in midstream to make a fragment, as in <<fare>> cited
above, or <<transport>> below.  These are examples of the first of the
three major classes of disfluency which Nakatani and Shriberg
identify, including what they call `phonetic error'.

EXAMPLE <<transport>>: show ground transpor- ground transportation
                           1      1        1p      1              4
                               disfl:repair<
                                 disfl:repair>
                       at  atlanta
                         2p       4
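
Because hesitation pauses are recoverable from the break index tier,
they can be found without any misc-tier label.  The following Python
sketch illustrates this with the words and break indices of
<<transport>>; it assumes the two tiers have already been aligned into
parallel lists with one break index per word, and the function name is
hypothetical.

```python
# Words and break indices copied from EXAMPLE <<transport>>.
words = ["show", "ground", "transpor-", "ground", "transportation", "at", "atlanta"]
breaks = ["1", "1", "1p", "1", "4", "2p", "4"]


def hesitation_pauses(words, breaks):
    """Return (index, word, break_index) for each word followed by 2p or 3p."""
    return [
        (i, word, brk)
        for i, (word, brk) in enumerate(zip(words, breaks))
        if brk in ("2p", "3p")
    ]


print(hesitation_pauses(words, breaks))   # [(5, 'at', '2p')]
```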

The second of the three major classes is the hesitation pause.  This
includes both silent pauses, as in the examples transcribed with 2p
above, and filled pauses, that is, hesitation intervals during which
the speaker holds the floor by producing hesitation noises or other
material, as in <<weight>>.

EXAMPLE <<weight>>: The weight on a six on a seven sixty seven is
                       1      0  1 1   2p 1 1     2p    1     3- 2p
                    three thousand uh three hundred and twelve
                         1        2p 4     1       1   1      1
                    thousand pounds uh is that including passengers
                            2p     2p 4  1    1         3          4

Nakatani and Shriberg recommend that the spelling of hesitation noises
be standardized so that later users of a ToBI transcribed database
need search only for a limited set of `words' in recovering the
disfluency.  In particular, for standard American English, they
recommend the use of only "um", "uh", or "mm".  That is, transcribers
should not invent other spellings such as "ah" or "uhhhh" to reflect
differences in the quality of the reduced vowel or the duration of the
syllable.  With this stipulation, filled pauses of this sort would not
need to be flagged on the misc tier, since they would be recoverable
from the orthographic tier.
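
With standardized spellings, recovering filled pauses becomes a simple
search of the orthographic tier.  The Python sketch below shows one way
this might look, using the words of <<weight>>.  The function names are
hypothetical, and the pattern used to spot non-standard spellings is
only a rough heuristic of our own, not something prescribed by Nakatani
and Shriberg.

```python
import re

STANDARD_FILLERS = {"um", "uh", "mm"}

# Heuristic (an assumption): strings such as "ah", "uhhhh" or "mmm" look
# like hesitation noises spelled in a non-standard way.
FILLER_LIKE = re.compile(r"a+h+|u+h+|u+m+|m+")


def find_filled_pauses(words):
    """Return (index, word) for each standard filled pause."""
    return [(i, w) for i, w in enumerate(words) if w.lower() in STANDARD_FILLERS]


def flag_nonstandard(words):
    """Return words that look like fillers but are not spelled um/uh/mm."""
    return [
        w for w in words
        if FILLER_LIKE.fullmatch(w.lower()) and w.lower() not in STANDARD_FILLERS
    ]


words = ("The weight on a six on a seven sixty seven is three thousand uh "
         "three hundred and twelve thousand pounds uh is that including "
         "passengers").split()
print(find_filled_pauses(words))                       # [(13, 'uh'), (20, 'uh')]
print(flag_nonstandard(["ah", "uh", "uhhhh", "hat"]))  # ['ah', 'uhhhh']
```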

A filled pause may be perceived as unaccented, and yet as constituting
its own intermediate or intonational phrase.  Normally each intonation
phrase is required to have at least one pitch accent.  In the case of
filled pauses this criterion is relaxed; an unaccented filled pause in
its own phrase can be labelled with the phrase accent (chosen from the
full inventory) without the requirement that a pitch accent be marked
on the filled pause.
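
The relaxed criterion can be stated as a small well-formedness check.
The following Python sketch is an illustration only: it assumes each
phrase is given as a list of words and a list of tone labels, it treats
any tone label containing `*' as a pitch accent, and the function name
is hypothetical.

```python
FILLERS = {"um", "uh", "mm"}


def phrase_has_required_accent(words, tones):
    """Check the pitch accent requirement for one phrase.

    A phrase passes if it has at least one pitch accent (a tone label
    containing `*`), or if it consists only of filled pauses.
    """
    has_pitch_accent = any("*" in tone for tone in tones)
    only_fillers = bool(words) and all(w.lower() in FILLERS for w in words)
    return has_pitch_accent or only_fillers


print(phrase_has_required_accent(["to", "me"], ["H*", "L-"]))   # True
print(phrase_has_required_accent(["uh"], ["L-"]))               # True
print(phrase_has_required_accent(["to", "me"], ["L-"]))         # False
```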
  
The last major class of disfluency is the class of repairs and fresh
starts, which Nakatani and Shriberg define as "lexical
self-corrections of parts of sentences and whole sentences,
respectively".  They give us utterance <<fare>> as an example of
a repair, and <<connections>> as an example of a fresh start.  (Here
we have used the misc tier to mark these interpretations of the
disfluencies.)  These two examples also illustrate abrupt cutoffs
resulting in word fragments.

EXAMPLE <<fare>>: show me the cheapest fare from Da- from
                      1  1   1        1    4    1   1p   1
                                                repair<
                                                   repair>
                  Philadelphia to Dallas excluding restriction
                              3  1      4         4           4 
                  v u slash one
                   1 3     1   4

EXAMPLE <<connections>>: What are the plane sizes for these flights and
                              1  1   1     1     4   1     1       4   1
                          do they ha(ve)- do are there any other flights
                            2p   1      1p  1p  1     1   1     1      1
                              restart<
                                       restart>
                          that have s-  connections
                              1    1  1p           4
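
In the examples in this section, the word fragments left by abrupt
cutoffs are spelled with a trailing hyphen.  Assuming that spelling is
used consistently, fragments can be listed directly from the
orthographic tier, as in this minimal Python sketch (the function name
is hypothetical):

```python
def find_fragments(words):
    """Return (index, word) for each word spelled with a trailing hyphen."""
    return [
        (i, w) for i, w in enumerate(words)
        if len(w) > 1 and w.endswith("-")
    ]


# Orthographic tier of EXAMPLE <<connections>>.
words = ("What are the plane sizes for these flights and do they ha(ve)- do "
         "are there any other flights that have s- connections").split()
print(find_fragments(words))   # [(11, 'ha(ve)-'), (20, 's-')]
```

Such a list only points to cutoffs; deciding whether each one belongs
to a repair or a fresh start still requires the transcriber's judgment.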

More detailed suggestions about how to flag repairs can be obtained by
writing directly to Christine Nakatani (chn@das.harvard.edu) or
Elizabeth Shriberg (ees@speech.sri.com).



*********************************************************** 
PRACTICE SIX: break index 2, the p diacritic, disfluencies
*********************************************************** 
Transcribe these exercises using the exercises script.

_______________________________________________________________________
EASY:
EXERCISE <<park5>>: 	Uh and then I go under a footbridge and into the park.
EXERCISE <<business>>: 	A lot of people have done this; they sell their
                       	business, and they have... If something goes wrong,
                       	and they have the first rights to buy it back.
     [Repeated exercise from PRACTICE THREE. Transcribe the phrase "and
      they have,..."] 
EXERCISE <<howto>>: 	I know we've gotta do it but I don't know how to do it.
     [Repeated exercise from PRACTICE FOUR.]
_______________________________________________________________________
INTERMEDIATE: 
EXERCISE <<mean>>:	Because I I mean, to make a map on computer
			is not n- nearly as much fun.  
EXERCISE <<semester>>: 	The advisor to f- fill out my schedule for the first
                        semester said "Why don't you take Introduction...
                        Intro...  Introductory Linguistics."  
EXERCISE <<spoon2>>: 	There's a spoon in here.
     [Compare <<spoon1>> in PRACTICE TWO.]  
EXERCISE <<author>>: The author of more than eight hundred state supreme
          court opinions (Hennessy is widely respected for his legal
          scholarship and his administrative abilities.)
     [This is the first part of <<hennessy>> in Section 2.10.]  
EXERCISE <<usually-not>>: Usually not, no.  Nah.  Usually they won't give
          		  you chances.
_______________________________________________________________________
DIFFICULT:
EXERCISE <<tuition>>: My learning experiences are on the job, so when I
                     screw something up instead of s- spending all
                     this money to go to college...  When I screw up a
                     job, that's my tuition for college. That's exactly,
                     exactly how it works, there's no difference at all.
EXERCISE <<fail>>: And what happens is: when you...  when you buy my
                     business, and you try to run my business, it's
                     really hard for you to run my business.  So a
                     lotta times they fail.
     [Repeated exercise from PRACTICE THREE. You've transcribed most of
      the tones already.  Now you're ready to worry about the break
      indices, particularly those around the first "when you..." and
      "when you try to run my business".]
EXERCISE <<figureout>>: Half the job is accomplished by just starting it.
          [Interviewer: Mm-hmm] So just start doing it, and you'll figure
          it out. [Interviewer: Yeah] You know what I mean?


labelling_guide_v2.ASCII (augmented by some HTML)


This page is maintained by M. Beckman (mbeckman@ling.osu.edu)

 

Copyright © 2008 Department of Linguistics, The Ohio State University