3. More on the break index tier

3.1. The break index tier relative to other tiers

The other core part of the prosodic transcription proper is the break
index tier.  If we think of the tone tier as a marking of the speech
signal mediated primarily by our interpretation of the analysis of the
f0 contour, the analogous way to think of the break index tier is as a
marking of the speech signal as mediated primarily by the rhythmic and
segmental analysis implicit in the orthographic tier.  The summary
statement of ToBI conventions describes this relationship as follows:

  Break indices represent a rating for the degree of juncture
  perceived between each pair of words and between the final word
  and the silence at the end of the utterance.  They are to be marked
  after all words that have been transcribed in the orthographic
  tier.  All junctures -- including those after fragments and filled
  pauses -- must be assigned an explicit break index value; there is
  no default juncture type.

Thus, the events on the break index tier are labels of the utterance's
prosodic grouping -- that is, each label denotes a boundary of some
kind of constituent which ends at the word that the transcriber has
marked on the orthographic tier.  The convention for placing break
index marks in a waves(tm) label file is that the number should be
associated with a point in time at the end of the marked word as
indicated by the label in the orthographic tier.  It should be located
exactly at, or slightly to the right of, this word marker, so that
break indices can be unambiguously associated with other tiers.
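
The placement convention lends itself to a mechanical check.  The
following Python sketch is illustrative only.  It assumes that the
orthographic and break index tiers have already been read into lists
of (time, label) pairs, with times in seconds.  It does not parse
waves(tm) label files.  The 0.02 s tolerance for "slightly to the
right" is an arbitrary assumption, not a value fixed by the ToBI
conventions, and the times in the usage example are invented.

```python
def pair_breaks_with_words(words, breaks, tolerance=0.02):
    """Pair each word-end marker with the break index at or just after it.

    words, breaks: lists of (time_in_seconds, label), sorted by time.
    tolerance: assumed maximum lag of a break index behind its word marker.
    Returns (pairs, problems).  pairs is a list of (word, break_label);
    problems lists words without a break index and unmatched break indices.
    """
    pairs, problems = [], []
    unused = list(breaks)
    for w_time, word in words:
        match = None
        for item in unused:
            b_time, _ = item
            # A break index belongs to this word if it sits exactly at the
            # word marker or slightly to its right.
            if w_time <= b_time <= w_time + tolerance:
                match = item
                break
        if match is None:
            # There is no default juncture type, so a gap is reported.
            problems.append(f"no break index after '{word}'")
        else:
            unused.remove(match)
            pairs.append((word, match[1]))
    for b_time, label in unused:
        problems.append(f"break index '{label}' at {b_time} matches no word")
    return pairs, problems


# Break values follow example <<kinds-v>>; the times are made up.
words = [(0.31, "What"), (0.62, "kinds"), (0.70, "of"), (1.15, "planes")]
breaks = [(0.31, "1"), (0.63, "0"), (0.70, "1"), (1.15, "4")]
pairs, problems = pair_breaks_with_words(words, breaks)
```

Reporting a missing break index as a problem, rather than filling in
a value, mirrors the rule that every juncture must be labelled
explicitly.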

There are 5 break indices, numbered 0 through 4, roughly in order of
lesser to greater degree of perceived separation between the marked
word and following material.  The break indices are meant to be a
label of the SUBJECTIVE strength of the boundary.  However, this does
not mean that there are no objective criteria for marking the
boundaries, or that the five labels form a uniform five-point scale.
For example, the lowest-level break index (0) is defined in terms of
connected speech processes, such as the flapping of word-final /t/ and
/d/ before a following vowel-initial word in many American and
Australian dialects, processes that prosodically group words together
into `clitic groups' -- larger compound-word-like constituents above
the level of the word (see Section 3.2).  At the other end of the
scale, the two highest break indices (3 and 4) are defined in
relationship to the prosodic constituents (intermediate phrases and
intonation phrases) that are assumed by the marking of phrase accents
and boundary tones on the tone tier (see Section 3.3).

Mainstream phonological theory might lead us to expect that these
intonational constituents and the lower-level clitic group
constituents will form a strict hierarchy (see Selkirk, 1980; Nespor &
Vogel, 1986).  The numerical scale of break index values reflects a
mild bias in favor of such strictly hierarchical models (see the
discussion in Price et al., 1991).  Rather than building the
expectation rigidly into its transcriptions, however, ToBI provides
two regular mechanisms for denoting mismatches between different cues
to subjective boundary strength.  First, break index 2 denotes a
mismatch between the constituency prescribed by the tonal
transcription and the sense of disjuncture due to pauses and
pause-like phenomena (see Section 3.4).  Second, there is a diacritic
`p' that can be appended to break indices 1, 2, and 3 to convey some
sort of prosodic disfluency -- for example, an abrupt cutoff after a
false start or a perceptible prolongation or pause which sounds as if
the speaker were hesitating while searching for the next word (see
Section 3.5).  These two provisions should allow transcribers to avoid
the circularity of basing a theory about the nature of the prosodic
hierarchy upon the transcription of databases that might be used to
explore such issues as the relationship between intonational
constituents and pause (see, e.g., Woodbury, 1993, who proposed that
pauses can be placed independently of intonational boundaries when the
discourse structure requires the indication of competing segmentation
strategies for topic structure versus rhetorical structure).


3.2. Break indices 0 and 1

Except in more deliberate speech styles, such as the
information-packed style of radio news announcers, the break index
value that will be encountered most frequently is probably 1.  The
ToBI conventions define break index 1 negatively, as the label to be
used for "most phrase-medial word boundaries", as contrasted with the
marked phrase-medial cases transcribed by break index 0.  Break index
0, conversely, is defined with positive criteria as the value "for
cases of clear phonetic marks of clitic groups; e.g. the medial
affricate in contractions of `did you' or a flap as in `got it'."
Since the other break indices are also defined by positive criteria
(markings on the tone tier -- see Sections 3.3 and 3.4), we can think
of break index 1 as the `default' (although, of course, there is no
real default index in the sense of having a value that need not be
marked because it is understood).  We have already seen many cases
of break index 0 in previous example utterances.  In example
<<understand>> in Section 2.10 above, for instance, there are three
cases of break index 0: the flapped /t/ in each of the two instances
of the word "to" (after "trying" and after "you"), and the
palatalization of the /t/ at the juncture between "get" and "you".
All three are connected-speech processes that we take as criteria
for break level 0.

EXAMPLE <<understand>>: I'm simply trying to get you to understand.
                           1      3      0  3   0   0  3          4
[GIF]

Example utterance <<kinds-v>> illustrates yet another such
connected-speech process: the apparent deletion of the vowel in "of"
after "kinds", to make a phonotactically impermissible /zv/
word-final cluster.

EXAMPLE <<kinds-v>>: What kinds of planes...
                         1     0  1      4
[GIF]

Note that in some cases the phenomena denoted by break index 0 are so
frequently encountered in particular types of sequences that
orthographic conventions have developed for marking them.  For
example, the flapping of the /t/ and consequent cliticization of the
word "to" onto a preceding auxiliary verb "got" is sometimes
indicated in writing by "gotta".  Similarly, the deletion of the
initial /h/ and vowel of "have" in sequences such as "would have" can
be indicated by the spelling "would've".  In such cases, the transcriber has the
alternative of marking the prosodic grouping by the choice of label on
the orthographic tier instead.  For example, by labelling the word as
"gotta" rather than "got to" the transcriber has eliminated the word
boundary where a 0 label might be placed on the break index tier.


3.3. Break indices 3 and 4

Break indices 0 and 1 form a natural progression with indices 3 and 4.
The latter two break index strengths are equated with the intonational
categories of intermediate (intonation) phrase and (full) intonation
phrase.  Thus, whenever the tonal analysis indicates a L- or H- phrase
accent, the transcriber should decide where the end of the
intermediate phrase marked by this tone label is and place a 3 on the
break index tier to align with the orthographic label for the last
word in the intermediate phrase.  Similarly, whenever the tonal
analysis indicates a L% or H% boundary tone, the transcriber should
place a 4 on the break index tier at the end of the last word in the
intonation phrase.  In actuality, the ordering of these two analyses
is sometimes reversed.  This is particularly the case with the L%
boundary tone; the transcriber might be convinced of the percept of a
4 versus a 3 level boundary before deciding that there must be a L-L%
or H-L% sequence as opposed to merely a L- or H- to be marked on the
tone tier.  Recall from the discussion in Section 2.3 that there may
be little or no difference in f0 values between the end of a mere L-
and a L-L% sequence, or between a mere H- and a H-L% sequence: a L%
following a L- is at the bottom of the speaker's pitch range, just as
the L- is, whereas a L% following a H- is upstepped to the same level
as the preceding phrase accent.  In such cases, the analysis is necessarily
more subjective; the transcriber must rely on the percept of degree of
disjuncture with less help from the f0 contour.  Some pertinent
examples from earlier sections are repeated here.

EXAMPLE <<names>>:
    Anna may know my name, and yours too.     Anna may know our names?
    H*                L-H%      H*    H* L-L% L*                  H-H%
        1   1    1  1    4    1     1   4         1   1    1   1     4
[GIF]

EXAMPLE <<park2>>: Definitely the shortest and probably the pleasantest
                        H*     L-   H*    L-L%  H*         L-  H*
                             1   3        1   4        1   3           1
                        way to go is through the park.
                                 L-             L+H* L-L%
                           0  1  3  1       1   1    4 
[GIF]

EXAMPLE <<oregano>>: 	1) Let's see   I need oregano 'n marjoram 'n some
                           H*   H* L-L%        L*  H-    L*     H-
                                1   4   1    1       3  0        3  0  1
                           fresh basil okay?
                          L+H*  !H* L-  H* H-H%
                                1     3    4
                        2) Oh I don't know    it's got oregano 'n marjoram
                          H* !H*      !H* L-L%          H*  H-    H*   H-
                             1 1     1    4       1   1       3  0     3
                           'n some fresh basil.
                                         H* H-L%
                             0    1     1     4
[GIF]

EXAMPLE <<nose>>: Oh    don't nuzzle me you marmalade-nose.
                  X*? L- H*   !H*   L-     L*           L-H%
                      3      1      1  3   1         1    4
[GIF]

When using waves(tm) label files, a 3 or 4 break index label and the
corresponding phrase accent or boundary tone are placed together at
the orthographic label, with the break index label coming last if the
labels on the three tiers cannot be absolutely synchronized.
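
The correspondence between tone labels and break indices 3 and 4 can
be sketched as a small lookup.  The Python function below is a
minimal illustration, not part of the ToBI conventions.  The function
name is hypothetical, and the sketch assumes that tone labels are
plain strings such as "L-", "!H-" or "L-H%", with no uncertainty
markers attached.  A disagreement between the expected and the
transcribed value is only a prompt for review, because break indices
2 and 3p can also occur where a phrase accent is marked (see Sections
3.4 and 3.5).

```python
def expected_break_index(tone_label):
    """Return the break index implied by a tone-tier label, or None.

    A label ending in a boundary tone (L% or H%) implies 4.  A label
    ending in a phrase accent (L- or H-, including downstepped !H-)
    implies 3.  Pitch accents such as H* and the restart label %r
    imply nothing about the break index.
    """
    if tone_label.endswith(("L%", "H%")):
        return 4
    if tone_label.endswith(("L-", "H-")):
        return 3
    return None


# Phrase-final tone labels and break indices from example <<nose>>.
nose = [("L-", "3"), ("L-", "3"), ("L-H%", "4")]
mismatches = [(t, b) for t, b in nose if str(expected_break_index(t)) != b]
```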


*********************************************** 
PRACTICE FIVE: break indices 0, 1, 3, and 4 
*********************************************** 
You have already transcribed the tones on the following.  Now
transcribe the break indices. 
_______________________________________________________________________
EASY:
EXERCISE <<manitowoc>>: Does Manitowoc have a bowling alley?
     [See PRACTICE TWO for tones.]  
EXERCISE <<butcher>>:   How'd your operation go?  Don't talk to me about it;
                        I'd like to strangle the butchers.
     [See PRACTICE FOUR for tones.] 
EXERCISE <<stalin>>: 	I was wrong, and Stalin was right.  I was wrong.
     [See PRACTICE TWO for tones.]
EXERCISE <<flour2>>: 	Oh nothing special, you know flour and butter
			and sugar.
     [See PRACTICE TWO for tones. Transcribe just the second part,
      after the "you know".]  
EXERCISE <<thought>>: 	That's what I thought.
     [See PRACTICE ONE for tones.]
_______________________________________________________________________
INTERMEDIATE: 
EXERCISE <<I-mean>>: 	You know what I mean?
     [See PRACTICE ONE for tones.]
EXERCISE <<noodle1>>: 	We have a lean mini-noodle with beans.
                        Well, we have a lean mini-noodle dish.
     [See PRACTICE THREE for tones.]
EXERCISE <<knock-stuff>>: Mostly they just sat around and knocked stuff.
                          You know, the school, other people.
     [See PRACTICE TWO for tones.]
_______________________________________________________________________
DIFFICULT: 
EXERCISE <<argument>>: 	If he can then there's no argument about it.
                        (two productions)
     [See PRACTICE THREE for tones.] 
EXERCISE <<artwork>>: 	State law now requires public construction projects
                        to set aside 1% of their budgets for artwork.
     [See PRACTICE FOUR for tones.]
EXERCISE <<anyway>>: 	But anyway, if you can't see that then I don't
                        know if I can explain it to you.
     [See PRACTICE ONE for tones.]

3.4. Break index 2

As noted in the previous section, each 3 on the break index tier must
correspond to the marking of a phrase accent for the intermediate
phrase on the tone tier, and each 4 must correspond to the marking of
a boundary tone.  The implication is that any other interword juncture
will be something that can be transcribed on the break index tier with
either a 0 or a 1.  However, the subjective impression of boundary
strength does not always allow such a neat correspondence.  In the
course of developing the ToBI transcription system, we encountered
several utterances in which we felt a strong sense of disjuncture at a
boundary between two words where the pitch pattern showed no evidence
of the necessary tonal events for either of these two levels of
intonational constituency.  We also encountered the converse case:
utterances in which the pitch pattern at a boundary between two words
clearly indicated an intermediate or intonation phrase boundary with
none of the preboundary lengthening or other cues that support the
subjective sense of a strong disjuncture.  Break index 2 was devised
to mark cases of these two types of `mismatch' between the subjective
boundary strength and the intonational constituency.  These two types
are described in the ToBI Annotation Conventions as follows:

    a strong disjuncture marked by a pause or virtual pause, but with
    no tonal marks; i.e. a well-formed tune continues across the
    juncture.
        OR
    a disjuncture that is weaker than expected at what is tonally a
    clear intermediate or full intonation phrase boundary.

Example utterance <<iraqi>> illustrates the first type of mismatch,
and example utterance <<quincy>> illustrates the second.  In
<<iraqi>>, the smooth sequence of apparent downstepped peak accents
with no clear intervening phrase accent suggests that the words "six",
"southern", "iraqi", and "cities" all belong to the same intermediate
phrase, yet there is an intonation-phrase-sized pause between each
adjacent pair of these words.  In <<quincy>>, the clear tonal markings
for at least an intermediate phrase boundary are unaccompanied by any
clear preboundary lengthening, making some transcribers uncomfortable
in labelling this juncture with a 3.

EXAMPLE <<iraqi>>: The Pentagon reports fighting in six southern
                       L+H*          L-  L* H-H%    H*   !H*
                      1        1       3        4  2   2        2
                   iraqi cities.
                    !H*  X*? L-L%
	                2      4
[GIF]

EXAMPLE <<quincy>>: uh Quincy.  Could I have the number to uh
                        H*  L-        H*         !H*       L-L%
                      4      2       1 1    1   1      1  1  4
                    Shore Cab.
                     *?     H* H-L%
                         1   4
[GIF]

Break index 2 was devised for cases where the mismatch between the
tonal marking and the disjuncture is not accompanied by any sense of
hesitancy or disfluency.  When 2 is used in the first way (to indicate
a stronger sense of disjuncture than 1 even while producing a coherent
contour for an uninterrupted intermediate phrase), it can have the
rhetorical effect of careful deliberation, as in the <<iraqi>>
example.  In the opposite case (when 2 is used to mark intermediate
phrase boundaries which do not have a very strong sense of
disjuncture) the speaker may be speaking quickly to hold the floor or
to convey a sense of urgency, while using the tonal marks necessary to
convey attentional focus on several closely placed words.  We suspect
that both types of 2 will be explained ultimately by a better
understanding of the complexities of discourse structure, an
understanding that can best be achieved by the transcription and
analysis of many occurrences in natural dialogue.
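
For fluent junctures, the way break index 2 sits between the tonally
defined indices and the default can be summarized as a decision
sketch.  The Python function below is an illustration under stated
assumptions.  The three disjuncture categories ("weak", "ordinary",
"strong") are hypothetical stand-ins for the transcriber's percept,
and the function name is invented.  The sketch does not distinguish 0
from 1, which depends on the phonetic criteria of Section 3.2.

```python
def fluent_break_index(tonal_boundary, disjuncture):
    """Pick a break index for a fluent (non-hesitant) juncture.

    tonal_boundary: None, "intermediate" (phrase accent only), or
                    "intonation" (phrase accent plus boundary tone).
    disjuncture:    "weak", "ordinary", or "strong" (assumed categories).
    """
    if tonal_boundary not in (None, "intermediate", "intonation"):
        raise ValueError(f"unknown tonal boundary: {tonal_boundary!r}")
    if disjuncture not in ("weak", "ordinary", "strong"):
        raise ValueError(f"unknown disjuncture: {disjuncture!r}")
    if tonal_boundary is None:
        # First type of mismatch: a pause or virtual pause while a
        # well-formed tune continues across the juncture.
        return "2" if disjuncture == "strong" else "1"
    if disjuncture == "weak":
        # Second type of mismatch: a tonally clear phrase boundary with
        # a weaker disjuncture than expected.
        return "2"
    return "3" if tonal_boundary == "intermediate" else "4"
```

On this sketch the junctures inside "six southern iraqi cities" in
<<iraqi>> correspond to fluent_break_index(None, "strong"), and the
juncture after "Quincy" in <<quincy>> to
fluent_break_index("intermediate", "weak").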


3.5. The p diacritic (and the %r tone label) [Christine Nakatani and
Elizabeth Shriberg contributed greatly to the preparation of this and
the following sections.]

There are other cases of mismatch between tone tier and segmental
rhythm, however, where break index 2 does not seem to be appropriate.
For example, in utterance <<display>>, the pauses after "Baltimore",
"which", and "leave" do not have the feel of a speaker striving for an
effect of judicious deliberation, as in the "six southern Iraqi
cities" phrase of the <<iraqi>> example, but rather sound disfluent,
as if the speaker were hesitating as he searches for the next word.
Such cases can be distinguished from fluent cases of 2 by the use of
the p diacritic.

EXAMPLE <<display>>: Display all the flights from Baltimore to Dallas
                            1   1   1       1    3-        2p 0      4
                     which leave after 4:00 p.m.
                          2p    2p    3p   2p   4
[GIF]
[GIF]


The p diacritic is used in conjunction with a break index 1, 2, or 3,
to indicate a disfluency in the timing or separation of words across a
break.  The notation `p' was chosen initially to denote the
prolongation of the hesitation pause with break indices 2 and 3, but
we have since extended the diacritic's usage to cover also abrupt
cutoffs before restarts and repairs, which are often but not
necessarily separated from the disfluent stop by a pause.  In this
case, the appropriate break index is 1.  Thus the inventory of
combinations of break index and p diacritic is:

  1p -- an abrupt cutoff before an actual repair, or as if stopping to
        permit a repair or restart of some kind
  2p -- a hesitation pause or prolongation of segmental material where
        there is no phrase accent perceived in the intonation contour
  3p -- a hesitation pause or a pause-like prolongation where there is
        a phrase accent in the tone tier.
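
The inventory above can be restated as a short Python sketch.  The
event names "cutoff" and "hesitation" and the function name are
hypothetical labels chosen for illustration.  The conventions
themselves define only the three resulting labels.

```python
def disfluent_break_label(event, has_phrase_accent=False):
    """Map a disfluency at a juncture to 1p, 2p, or 3p.

    event: "cutoff" for an abrupt cutoff before a repair or restart,
           "hesitation" for a hesitation pause or prolongation.
    has_phrase_accent: whether a phrase accent is marked on the tone
           tier at this juncture (only relevant for hesitations).
    """
    if event == "cutoff":
        return "1p"
    if event == "hesitation":
        # The phrase accent is what separates 3p from 2p.
        return "3p" if has_phrase_accent else "2p"
    raise ValueError(f"unknown disfluency type: {event!r}")
```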

The p diacritic is not used with break index 4, because it is
difficult to reliably identify hesitations between two full
intonational phrases.  Example utterances <<amazing>> and <<cheapest>>
illustrate the use of the diacritic with break indices 1 and 3.
Example <<display>> also had an occurrence of 3p.  Note the presence
of the phrase accent distinguishing this interword juncture from the
surrounding cases of 2p.

EXAMPLE <<amazing>>: 	um But I had I mean the stuff he knows is kind of
                          0   0 1p  3 1    4   1     1  1     4  1    1 1
                       	amazing 'coz he does a lot of um environmental
                               3    1p 1    1 X   0  1  4             1
                       	impact stuff
                              2p    4
[GIF]
[GIF]

EXAMPLE <<cheapest>>: I want to see the cheapest flight from Atlanta
                       1    1  1   1   3p       1      1    1       3
                      to Baltimore
                        1         4
[GIF]

In general the p diacritic should be used conservatively, and should
not become a substitute for 2.  A good test for appropriateness is to
imagine whether the break would have been the same if the speaker were
asked to repeat the utterance with the same intonation, but more
`fluently'.  If the break would be the same upon repetition, it should
probably not get the p.

Note also that the prolongation of segmental material for a 2p label
can physically occur at the beginning of a word rather than at the
end, as in example <<least>>, where the hesitation lengthens the [l]
of "least" rather than the vowel of "the".

EXAMPLE <<least>>: Between Boston and Denver I'd like to a flight that
                          3      1   1      4   1    1  3p 1     1    1
                   takes the least amount of stops to get to Boston
                        3p  2p    1      0  1     4  1   1  0      4
[GIF]
[GIF]

Closely associated with these definitions of 1p, 2p, and 3p in the
break index tier is the tone tier label %r, for restarting with a
brand new intonation contour when the last contour was interrupted
by some disfluency without being finished.  This is most common at a
`repair', where the speaker abruptly stops and begins again with the
intended or `repaired' material, as in example utterance <<amazing>>,
already cited above, and in example <<connections>>, below.

EXAMPLE <<amazing>>: 	um But I had    I mean     the stuff he knows
                          0   0 1p   3   1     4      1     1  1     4
                               H* H* L- H* !H* L-L%      H*      !H* L-L%
                       	is kind of amazing 'coz    he does a lot of um
                          1    1  1       3    1p    1    1 X   0  1  4
                                   L+H*  L-      %r H* !H*            L-L%
                       	environmental impact stuff
                                     1      2p    4
                           H*         H*         H-L%

EXAMPLE <<connections>>:  What are the plane sizes   for these flights and
                              1   1   1     1     4     1     1       4  1
                            H*           L*      H-H%      H*    H*  H-L%
                          do they ha(ve)- do  are there any other flights
                            2p   1      1p  1p   1     1   1     1      1
                                             %r          H*        !H*
                          that have s-    connections
                              1    1  1p             4 
                                        %r    H*    L-L%
[GIF]
[GIF]

As with the use of the p diacritic, one should be conservative in
using %r.  It is needed only if there is good evidence that a new
intonational phrase has begun after a disfluent pause, evidence such as
a notable change in f0 range or amplitude.  It should not be used in
cases such as the "had" after the first 1p in <<amazing>>, which
continues with a fluent H* accent in the same pitch range (unlike the
H* !H* on "he does" after the second 1p in this utterance, which is in
a new pitch range).  Nor should %r be used in example utterance
<<abbreviation>>, where after the speaker stumbles and pauses
momentarily around the end of "what is the", the intonation on
"abbreviation" continues as if there had been no interruption.

EXAMPLE <<abbreviation>>: What is the b-  abbreviation   n       under
                              0  1   1  1p            3-      3      3p
                                              H*      H- L+H* L- H* !H-
                          the category d c  mean
                             1        1 1  1    4
                               H*     H* H*    L-L%
[GIF]

In particular, %r should not be used after a 3p, where the (re)start of a
new intonation contour is already implicit in the break index for the
intermediate phrase.
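
The restriction on %r after 3p is simple enough to check
automatically.  The Python sketch below is a rough illustration.  It
assumes that the labels of the two tiers have been merged into one
time-ordered list of (tier, label) pairs, and both that
representation and the function name are hypothetical.  It flags
every %r whose nearest preceding break index is 3p.  It cannot judge
the other condition for %r, namely whether there is good evidence of
a new pitch range.

```python
def redundant_restarts(events):
    """Return the positions of %r labels that directly follow a 3p.

    events: time-ordered list of (tier, label), where tier is "break"
            or "tone".
    """
    flagged = []
    last_break = None
    for position, (tier, label) in enumerate(events):
        if tier == "break":
            last_break = label
        elif label == "%r" and last_break == "3p":
            # The restart is already implicit in the break index.
            flagged.append(position)
    return flagged


# An invented sequence: the first %r follows 1p, the second follows 3p.
events = [
    ("break", "1p"), ("tone", "%r"), ("tone", "H*"),
    ("break", "3p"), ("tone", "%r"),
]
flagged = redundant_restarts(events)
```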


3.6. Ordinary uncertainty.

In addition to these two well-defined types of `uncertainty' due
either to conflicting evidence about boundary strength (break index 2)
or to the interruption of fluent prosodic production at repairs and
hesitations (the `p' diacritic), there will be cases of ordinary
garden-variety uncertainty for other reasons.  For example (as we
have already discussed in Section 2.3), the f0 contour for an
utterance-medial intonation phrase that ends with a L% boundary tone
is often difficult to distinguish from a mere intermediate phrase.  In
such cases, where the transcriber cannot decide from other cues
whether the tonal analysis should be L- versus L-L% (or H- versus
H-L%), the break index marking is also necessarily ambiguous.  The
ToBI conventions prescribe that in such cases of transcriber
uncertainty, the higher-level boundary should be chosen, and
uncertainty marked by appending the `-' diacritic.  Thus, in <<park2>>
given above in 2.3, if no decision can be made between L- and L-L%,
the correct break index marking is `4-'.

The same convention applies at lower levels of the hierarchy.  For
example, if the transcriber thinks that a word-final /d/ has been
pronounced as a flap, joining the word it ends into a close prosodic
unit with the following word, but is not certain that it is a flap and
not just a rather short [d], then the correct break index marking is
`1-'.  A similar case involving /t/ is given in example utterance
<<democrat>>.  Here it is not clear whether the /t/ at the end of the
word "democrat" has been flapped, or not released.

EXAMPLE <<democrat>>: The chairman, Wendell Ford,  democrat of Kentucky...
                           L+H* L-  L+H*    !H* L- H*          L+H* L-L%
                         1        3        1    3          1- 1        4
[GIF]

Examples <<rewarding>>, <<noodle2>>, and <<noodle3>> illustrate cases
where tonal sequences evident in the pitch contour might seem
compatible with several alternative analyses, some with and some
without a medial intermediate phrase break.  When such utterances are
transcribed outside of their larger discourses, these contours might
be highly ambiguous.

EXAMPLE <<rewarding>>: 	A really rewarding day.
                          L+H* L-          H* L-L% 
[GIF]
EXAMPLE <<noodle3>>: 	We have a lean mini-noodle dish.
                                L+H* L-    L+H*     L-L%
       (compare <<noodle2>> given above in PRACTICE THREE)
[GIF]

The minus symbol associated with uncertainty in break index value
cannot be used in conjunction with the p diacritic.  Uncertainty about
whether or not to use the p should be conveyed by using `p?'.
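
The combinations of break index and diacritic described in this
section can be collected into a simple label check.  The Python
sketch below is illustrative rather than normative, and its pattern
and function names are hypothetical.  It accepts the plain indices 0
through 4, the `-' diacritic on 1 through 4, and `p' or `p?' on 1
through 3.  Excluding `0-' is an assumption, made because there is no
lower value for 0 to be confused with.  Labels that this section does
not describe, such as the X that appears in <<amazing>>, are
deliberately left out of scope and will be rejected.

```python
import re

# Plain index, or index plus '-', or index plus 'p' / 'p?'.
# The '-' and 'p' diacritics never combine.
BREAK_LABEL = re.compile(r"(?:[0-4]|[1-4]-|[1-3]p\??)")


def is_valid_break_label(label):
    """Check a break index label against the combinations above."""
    return BREAK_LABEL.fullmatch(label) is not None


def uncertain_break_label(lower, higher):
    """Label for a juncture that is ambiguous between two index values.

    The higher-level value is chosen and marked with '-'.
    """
    if not 0 <= lower < higher <= 4:
        raise ValueError("expected 0 <= lower < higher <= 4")
    return f"{higher}-"
```

For instance, uncertain_break_label(3, 4) gives the `4-' discussed
for <<park2>>, and uncertain_break_label(0, 1) gives the `1-' of
<<democrat>>.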

labelling_guide_v2.ASCII (augmented by some HTML)


This page is maintained by M. Beckman (mbeckman@ling.osu.edu)

 

Copyright © 2008 Department of Linguistics, The Ohio State University