2. More on the tone tier

2.1. Tones and fundamental frequency

As noted above, one of the basic parts of a ToBI transcription for an
utterance is an electronic or paper record of the fundamental
frequency contour.  The transcription of events on the tone tier is
closely linked to this record.  In the case of pitch accents, the
labeller can make this link explicit by choosing to place the label
for the pitch accent specifically at the f0 maximum or minimum that
realizes the starred tone of the accent, if this f0 event is within
the interval of the accented syllable nucleus.  (If the maximum or
minimum does not actually occur within the syllable nucleus, there are
optional conventions for marking the maximum or minimum as well as the
accented syllable using the symbols `<' and `>', for a late or early
f0 event, respectively -- see Section 4.2 in "The ToBI Annotation
Conventions".)  There is a more practical connection, as well,
inasmuch as most transcribers find the fundamental frequency contour
an invaluable aid in making the analysis of the intonation pattern
that is embodied by the transcription on the tone tier.

In interpreting the f0 contour to make the tonal transcription, it is
important to keep in mind that several non-tonal aspects of an
utterance can also strongly influence the fundamental frequency
pattern.  One of the most ubiquitous of these influences is the way in
which consonant segments in the utterance interrupt the smooth course
of the f0.  Voiceless stops such as [p] and [t] and voiceless
fricatives such as [f] and [s] create `holes' in the f0 contour just
by being voiceless.  Moreover, it is usually not possible to read the
intended pitch during a voiceless consonant by interpolating from the
last f0 value before voice offset to the first f0 after voice onset
because obstruent consonants (stops, fricatives, affricates) all cause
dramatic perturbations in the fundamental frequency contour over and
above any interruption of voicelessness per se.  As an `intrinsic'
characteristic of its voiceless specification, a voiceless obstruent
is usually associated with a dip into the consonant constriction and a
dramatic fall starting from a much higher frequency just after the
consonant release.  Even voiced obstruents disturb the f0 contour; a
voiced stop or fricative can be associated with a fall into and rise
out of an often quite deep valley during the consonant's constriction.
Utterance <<blond-baby1>> illustrates some of these effects.  There is
a dip in the f0 around 1.9 s into the file for the [d] at the
beginning of "difference" and a sharp fall around 5.29 s right after
the [p] in "pink".  (To be sure, the perturbation caused by the [p] 
here is very small compared to many cases of voiceless obstruents that
we have seen.)

EXAMPLE <<blond-baby1>>: what's the difference among my long memory
                           H*      !H*     L-L%    L+H*     !H* H-H%
                          your blond baby    and the pink carpeting
                        L+H*     *? !H* L-H% L*      L*   H*   L-L%
[GIF}
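
One way to keep these segmental perturbations from misleading the eye
is to set aside the f0 values right next to a voiceless stretch before
judging tonal targets.  The Python sketch below illustrates the idea
under assumptions that are not part of the ToBI conventions: the f0
track is a list with one value per analysis frame, unvoiced frames are
`None`, and the hypothetical parameter `guard` says how many voiced
frames on either side of a gap to treat as unreliable.

```python
from typing import List, Optional


def mask_obstruent_edges(f0: List[Optional[float]],
                         guard: int = 2) -> List[Optional[float]]:
    """Return a copy of f0 in which every voiced frame lying within
    `guard` frames of an unvoiced frame is replaced by None."""
    n = len(f0)
    masked = list(f0)
    for i, value in enumerate(f0):
        if value is None:
            continue  # already a hole in the contour
        lo = max(0, i - guard)
        hi = min(n, i + guard + 1)
        # Look at the ORIGINAL track so that masking does not spread.
        if any(f0[j] is None for j in range(lo, hi)):
            masked[i] = None
    return masked


# Illustrative values only: a dip into a voiceless consonant and a
# fall from a much higher value just after its release.
track = [120.0, 118.0, 110.0, None, None, 150.0, 135.0, 128.0, 126.0]
print(mask_obstruent_edges(track, guard=1))
```

The right value for `guard` depends on the frame rate and on the
consonant, so this is an aid to inspection and not a substitute for
listening.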

In interpreting such `intrinsic' segmental effects, it is important to
note the actual voicing of the consonant, and not simply its phonemic
status.  For example, for many speakers, phonemically voiced
obstruents in stressed-syllable-initial position are not always
actually voiced.  Note, for example, the /b/ of "blond" at about 3.95
to 4.0 s in <<blond-baby1>>, which is voiceless unaspirated and has
f0-perturbing characteristics more like those of the /p/ of "pink".  Also,
the consonant /t/ in American and Australian English is usually a
voiced flap (a short [d]-like segment) when it begins an unstressed
syllable, as in the /t/ of "carpeting" in example utterance <<flap2>>.
Similarly, /h/ is often voiced between vowels.  Thus, the perturbation
caused by these two phonologically voiceless consonants is often like
that for a /d/ or a /v/, rather than like a true [t], as shown by the
/h/ in example utterance <<voiced-h>> (at around 3.04s).

EXAMPLE <<flap2>>:	The pink carpeting.
                            H*   H*    L-L% 
[GIF}
EXAMPLE <<voiced-h>>:	Give him a hand with that.
                                   H*         L-L%
[GIF}

Example utterance <<flap>> gives another environment where flapping is
common; see the flapped /t/ across the word boundary at around 1.34s.
The flapping here is also important for transcribing break indices
(see Section 3.2).

EXAMPLE <<flap>>:	Don't hit it to Joey.
                        H*              L*+!H L-L%
[GIF}

Another kind of problem in interpreting the f0 contour comes from
shifts into voice qualities other than normal modal phonation.  For
example, for most speakers, subglottal pressure falls very sharply at
the very end of an utterance.  If the cross-glottal pressure
difference becomes very weak, there may no longer be good glottal
closings -- i.e. the phonation may become quite breathy -- so that
even fairly robust pitch-tracking algorithms can easily fail.  For
some speakers this switch to breathy voice might happen even earlier
if the utterance has a long low-pitched stretch corresponding to a L-
phrase accent.  Or, a speaker might break into creaky voice in such a
region.  In fact, many speakers break into creaky voice in almost any
region with very low fundamental frequency.  Since creaky voice is
typically characterized by very irregular glottal periods (i.e., the
fundamental frequency is physically not well-defined), pitch-tracking
algorithms often do not do well during these portions of the
utterance, creating a messy `spattering' of values, like that seen in
the f0 trace between 4.95 and 5.08 seconds in <<jam2>>.  Here
the creak is due to the L*.

EXAMPLE <<jam2>>:	Will you have marmalade, or jam?
                                      L*     H-     L* H-H%
[GIF}

The pitch tracker can also completely fail, and give no f0 values at
all, as in the region of the L- in the second production of
<<made1>> after about 3.4s.

EXAMPLE <<made1>>:     Marianna made the marmalade.  
second production    2) L+H*                    L-L%
[GIF}

In these two examples (as in many other occurrences of the same tone
types in many of the example utterances in this labelling guide), the
creaky voice is reliably interpreted by native speakers as a very low
pitch value for some low tone.  However, creaky voice does not
automatically mean a very low L tone.  Creaky voice can also occur as
one common manifestation of a glottal stop, a segment which in English
often occurs phonetically as a way to set off a word beginning with a
stressed syllable that has no onset consonant.  For example, the word
"airline" in <<glottal-stop>> begins phonetically with a glottal stop
realized as creaky voice.

EXAMPLE <<glottal-stop>>: And set training and experience standards
			      H*   H*    H-      H*        H*    L-L%
			   for airline inspectors and mechanics.
	                       H*         H*   L-        H* L-L%
[GIF}

Nor are breathy voice and creaky voice the only sources of
pitch-tracking errors.  Even in parts of the utterance with normal
modal voicing, pitch-tracking algorithms can sometimes go wrong
because of fluctuations of amplitude or because of the vowel's
resonance characteristics.  A perfectly ordinary period-to-period
oscillation in amplitude can cause a halving of the estimated
fundamental frequency value, as illustrated in the region between 4.8
and 4.93 seconds and again between 5.3 and 5.45 seconds in example
utterance <<pitch-halving>>.  (Compare this to <<no-pitch-halving>>,
which is exactly the same utterance, pitch-tracked with somewhat
different assumptions about the signal parameters which the
pitch-tracking program uses in its consistency-checking algorithm.)
Or, if the first formant is much higher than the fundamental, the
pitch-tracking program might mistake the harmonics that the formant
amplifies for an intervening glottal pulse, effectively doubling the
pitch, as in the region between 14.07 and 14.18 seconds in example
utterance <<pitch-doubling>>.  Transcribers must therefore learn when
to trust their ears to catch such misparsings in the fundamental
frequency track (or to use an alternate record of the fundamental
frequency contour, such as the narrow-band spectrogram).  When all of
these perturbing effects are taken into account, however, the
fundamental frequency contour becomes a valuable aid in transcribing
the events on the tone tier.

EXAMPLE <<pitch-halving>>:	Jim builds a big daisy-chain.
                                    H*           H*        L-L% 
[GIF}
EXAMPLE <<no-pitch-halving>>:	Jim builds a big daisy-chain.
                                    H*           H*        L-L% 
[GIF}
EXAMPLE <<pitch-doubling>>:	Then I don't know if I can explain
                                       H*                   L+H*
                                it to you.
                                        L-L%
[GIF}
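
Halving and doubling errors of this kind can sometimes be flagged
automatically before the transcriber looks at the track.  The
following Python sketch is one simple heuristic, not the
consistency-checking algorithm of any particular pitch tracker: it
compares each voiced frame with the median of its voiced neighbours
and marks the frame when the ratio is close to 0.5 or 2.  The function
name and the parameters `window` and `tolerance` are assumptions made
for illustration.

```python
from statistics import median
from typing import List, Optional


def flag_octave_errors(f0: List[Optional[float]],
                       window: int = 5,
                       tolerance: float = 0.15) -> List[str]:
    """Label each frame 'ok', 'halved?', 'doubled?' or 'unvoiced'.

    A voiced frame is suspected when its ratio to the median of the
    other voiced frames within `window` frames is within a relative
    `tolerance` of 0.5 (halving) or 2.0 (doubling)."""
    labels = []
    for i, value in enumerate(f0):
        if value is None:
            labels.append("unvoiced")
            continue
        lo = max(0, i - window)
        hi = min(len(f0), i + window + 1)
        neighbours = [f0[j] for j in range(lo, hi)
                      if j != i and f0[j] is not None]
        if not neighbours:
            labels.append("ok")  # nothing to compare against
            continue
        ratio = value / median(neighbours)
        if abs(ratio - 0.5) <= tolerance * 0.5:
            labels.append("halved?")
        elif abs(ratio - 2.0) <= tolerance * 2.0:
            labels.append("doubled?")
        else:
            labels.append("ok")
    return labels


# Illustrative values only: one halved and one doubled frame.
example = [200.0, 202.0, 198.0, 100.0, 201.0, 199.0, 400.0, 200.0, 203.0]
print(flag_octave_errors(example))
```

Because the comparison is local, a mistracked stretch that is longer
than the window will pull the median along with it and go unnoticed
(or make the correct frames at its edges look suspicious), so the
flags are only a prompt to listen more carefully.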


2.2. Some familiar contours, and the contrast between H* and L+H*

The inventory of events that are transcribed on the tone tier comprises five
pitch accents, two phrase accents, and two boundary tones (plus
downstepped counterparts of pitch accents and phrase accents with H
tones).  The summary statement in Appendix A lists the symbols for all
of these tones and defines their use.  In the previous sections, we
already illustrated several familiar intonation patterns involving
these tones.  For example, the first productions of the sentences in
example utterances <<made1>> and <<insert>> were instances of the
`declarative contour' which is an intonation phrase containing one or
more H* pitch accents and ending in a sequence of L- phrase accent and
L% boundary tone -- i.e. (H*) H* L- L%.  (When there is more than one
accent, particularly when there is one relatively early and one
relatively late accent, this contour is often called the `hat
pattern'.)  The last production in utterance <<made1>> illustrated
a sequence of L- phrase accent followed by H% boundary tone that is
sometimes called the `continuation rise'.  The first production in
utterance <<made4>> was an example of the `yes-no question
contour', consisting of one or more L* accents followed by a H- phrase
accent and H% boundary tone -- i.e. (L*) L* H- H%.
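
The three named patterns just reviewed can be restated as patterns
over a sequence of tone labels.  The Python sketch below does this for
the simplest case only, assuming an intonation phrase with a single
intermediate phrase whose labels are given as a list of pitch accents
followed by one combined phrase-accent-and-boundary-tone label such as
"L-L%".  The function name and the input format are hypothetical.

```python
from typing import List


def name_contour(tones: List[str]) -> str:
    """Name a familiar contour from its tone labels, or return
    'unclassified' when none of the three patterns applies."""
    if len(tones) < 2:
        return "unclassified"  # need at least one accent plus the edge tones
    *accents, edge = tones
    if edge == "L-L%" and all(a == "H*" for a in accents):
        return "declarative contour"        # (H*) H* L- L%
    if edge == "H-H%" and all(a == "L*" for a in accents):
        return "yes-no question contour"    # (L*) L* H- H%
    if edge == "L-H%":
        return "continuation rise"          # L- followed by H%
    return "unclassified"


print(name_contour(["H*", "H*", "L-L%"]))
print(name_contour(["L*", "H-H%"]))
print(name_contour(["H*", "L-H%"]))
```

Anything outside these three patterns, including phrases with L+H* or
downstepped accents, is deliberately left unclassified.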

The productions in <<made1>> and the first two productions in
<<made3>> also illustrate one of the more difficult contrasts in
pitch accent type -- that between the two types of `peak accent' in
which the peak is timed to occur on the accented syllable (H* versus
L+H*).  These two pitch accents are alike in that both have high
fundamental frequency targets timed to occur on the accented syllable.
They are alike also in that the actual timing of the f0 peak that
realizes the high tone can vary depending on the phonetic length of
the syllable and on the neighboring tones.  In longer syllables just
before a L- phrase accent, the peak tends to come fairly early in the
syllable, whereas in short syllables with no immediately following
tone target, the peak for the high tone can be quite late, sometimes
after the actual acoustic end of the syllable.  This is illustrated in
the hat pattern utterance in <<word1>>.  The peak for the high tone of
the first H* on "word" comes rather late (in the last third of the
syllable), whereas the peak for the high tone of the second H* comes
very early in "word" before the L- low tone target (during the first
quarter of the syllable).  How then do the two pitch accents differ?

EXAMPLE <<word1>>:	Your word is your word.
			     H*           H* L-L%
[GIF}

The essential difference is what happens before the high tone.  The
leading L tone in L+H* is meant to transcribe a rise from a
fundamental frequency value low in the pitch range that cannot be
attributed to a L* pitch accent on the preceding syllable or to a L-
phrase accent or L% boundary tone at a preceding intermediate-phrase
or intonation-phrase boundary.  For H*, by contrast, there is at most
a small rise from the middle of the speaker's voice range (unless, of
course, the H* follows soon after some low tone such as a L* pitch
accent or L- phrase accent).  Example utterance <<won>> is a
minimal pair illustrating this contrast.

EXAMPLE <<won>>:           Marianna won it.  
in two productions      1)    H*      L- L%
                        2)  L+H*      L- L%
[GIF}
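
The reasoning behind the choice between H* and L+H* can be summarised
as a small decision rule.  The Python sketch below is only a rough
restatement of the description above, not a procedure taken from the
ToBI conventions: it places the f0 minimum before the peak within the
speaker's range on a linear Hz scale and asks whether that low value
is already explained by another low tone.  The function name, the
linear scale, and the threshold `low_fraction` are all assumptions.

```python
def choose_peak_accent(pre_peak_min_hz: float,
                       range_floor_hz: float,
                       range_ceiling_hz: float,
                       low_explained_by_other_tone: bool,
                       low_fraction: float = 0.25) -> str:
    """Suggest 'L+H*' when the rise starts low in the pitch range and
    no preceding L*, L- or L% accounts for the low value; otherwise
    suggest 'H*'."""
    if range_ceiling_hz <= range_floor_hz:
        raise ValueError("range ceiling must be above range floor")
    # 0.0 = bottom of the speaker's range, 1.0 = top of the range.
    position = ((pre_peak_min_hz - range_floor_hz)
                / (range_ceiling_hz - range_floor_hz))
    starts_low = position <= low_fraction
    if starts_low and not low_explained_by_other_tone:
        return "L+H*"
    return "H*"


# Illustrative values only, for a speaker whose range is 100-300 Hz.
print(choose_peak_accent(110.0, 100.0, 300.0, False))
print(choose_peak_accent(200.0, 100.0, 300.0, False))
print(choose_peak_accent(110.0, 100.0, 300.0, True))
```

In practice the decision is a perceptual one made while listening, and
no fixed threshold should be expected to reproduce it.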
                         
In the English intonation system as described by Pierrehumbert &
Hirschberg (1990), H* and L+H* have distinct meanings, which make the
latter more likely to occur in a contrastive context such as the one
evoked by the second production of the sentence in <<made1>>.  In
theory, this contrast between H* and L+H* can occur anywhere within a
phrase.  However, the distinction is difficult to make when the
accented syllable is the first in the utterance, as in the second
production of the sentence in <<anna>>.  These three productions are
examples of almost exactly the same patterns as exemplified by the
three productions in <<made1>>.  However, because the word "Anna"
has no unstressed syllables before the main stressed one, it is
difficult to realize the low tone for the nuclear accent on the first
word in the second production.  In cases such as this, where the
evidence for L+H* comes from (theory-dependent) intuitions about
meaning rather than from any clear low pitched region in the
fundamental frequency contour, the ToBI Annotation Conventions
prescribe H* instead.  (The *? on "married" in the first
production illustrates a very common type of ambiguity about accent
placement that is discussed below in Section 2.9.)

EXAMPLE <<anna>>:	 Anna married Lenny.  
in three productions 1) H*     *?     H* L-L%
		     2) H*               L-L%
		     3) H* L-H%     L+H* L-L%
[GIF}

Even when there is a long enough stretch between the beginning of the
utterance and the accent, L+H* can be difficult to distinguish from H*
because the categorical distinction in meaning is not always matched
by a categorical distinction in the f0 level of the low tone.  (The
mapping of phonetic continua onto discrete oppositions is a well-known
problem in segmental phonology as well.)  Utterance <<made2>> above
illustrates this.  The L tone of the L+H* in the third production is
not so low as that in the second production.  When such utterances are
taken out of context, it is possible for even intonational experts to
be confused, and in fact, another transcriber with long experience in
transcribing English pitch accents questioned our transcription of
this as L+H*.  (We are confident in the transcription, and did not
mark it as X*? -- see Section 2.9 below -- but only because we know
the context.)
 
The last productions in <<made1>> and <<anna>> are very similar to
another type of contour where one needs to be especially careful in
choosing between H* and L+H*.  In both of these sentences, the nuclear
stress for the second intonation phrase occurs late enough that the
low-pitched region of the L+H* (nuclear) pitch accent could be
distinguished even if there were no H% boundary tone intervening
between the L+H* pitch accent and the L- phrase accent for the
preceding phrase.  In the very similar contours of example utterances
<<noone>> and <<for-marianna>>, on the other hand, there is no H%
boundary tone, and one must pay close attention to the timing in
order to decide whether the accent in the second phrase should be
transcribed as H* or L+H*.  (Note that the first utterance in
<<for-marianna>> also probably illustrates grouping at the level of
the intermediate phrase and not a full intonation phrase; see Section
2.4 for the difficulty of telling these levels apart in this context.)

EXAMPLE <<noone>>:        But Marianna knows noone.
		       	       L+H* L-L%   L+H* L-L%
[GIF}

EXAMPLE <<for-marianna>>: 1) That one's for Marianna.
                              H*     L-       L+H* L-L%

			  2) Give me the brown one for Marianna.
			     H*           H*  L-L%        H* L-L%
[GIF}

The first response alternative in example utterance <<mother4>>
illustrates another idiomatic intonation contour which might be
confused with L+H*.  This is the `surprise-redundancy' contour
described by Sag & Liberman (1975).  Here the preceding low pitched
region comes from a L* pitch accent on a prenuclear accented word.
The second response alternative shows the subtle way in which this
rising sequence differs from L+H*.  The simple interpolation from the
L* to the H* is more gradual than the steep rise within the L+H*
accent, although the difference can be very subtle when there are only
a few syllables between the two accents in the L* H* sequence, as it
is here.

EXAMPLE <<mother4>>:
	Who's it for?     Mary's mother.  It's for Mary's mother.
         L*      H* L-L%  L*     H* L-L%   	    *?   L+H* L-L%
[GIF}


******************************************************************** 
PRACTICE ONE -- H* versus L+H*, L* H* L- L%, L* H- H%, and other 
accents in familiar contours
********************************************************************

Transcribe these exercises using the exercises script.
_______________________________________________________________________

EASY:
EXERCISE <<amelia-p2>>:	Amelia. (two productions) 
EXERCISE <<mother1>>:   Marianna's mother.  
EXERCISE <<mole1>>:	A new mole.
EXERCISE <<anna1>>: 	Anna married Lenny.
     [Compare to last production in <<anna>>] 
EXERCISE <<thought>>: 	That's what I thought.  
EXERCISE <<lazy>>: 	He's lazy and crazy and stupid.  
_______________________________________________________________________
INTERMEDIATE: 
EXERCISE <<memphis1>>:	Are you going to visit your mother when
			you're in Nashville?  
EXERCISE <<memphis2>>:	My mother lives	in Memphis.  
EXERCISE <<heavy-rain>>: Heavy rain possible. High around 70.
     [Transcribe only the second sentence for now, concentrating on
      the "seventy".]  
EXERCISE <<wellies1>>:  Are you gonna wear your wellingtons?  
     [Concentrate on the nuclear accent and following tones.  (We 
      know that there must be a prenuclear accent of the same type
      as the nuclear one, but we're not sure where it is.)]
EXERCISE <<eileen-leaving>>: Eileen is leaving. (two productions)
EXERCISE <<tree1house>>: My classmate who lives in a treehouse was
                         written up in Atlantic.
_______________________________________________________________________
DIFFICULT: 
EXERCISE <<dream>>: 	So, what did you dream?
     [Don't worry about the tune on "So" for now.]  
EXERCISE <<thermometer>>: Keep the thermometer under your tongue.
     [Transcribe only the "under your tongue" for now.]  
EXERCISE <<fail1>>: 	So, a lotta times they fail.
     [Concentrate on transcribing the pitch accents here, and don't
      worry about transcribing the "So".]  
EXERCISE <<happens>>: 	And what happens is, when you...
     [Concentrate only on the first clause (before "when").]  
EXERCISE <<I-mean>>:	You know what I mean?  
EXERCISE <<anyway>>:	But anyway, if you can't see that then I don't
			know if I can explain it to you.
     [Note that the f0 tracker has doubled the pitch on the word "can",
      and that your transcription of the tones nearby should take this
      into account.]


2.3. The timing of the phrase accent and "upstep"

The examples so far also illustrate two important points about the
phrase accent.  First, the phrase accent is unlike the boundary tone in
that it is not necessarily localized at the phrase edge.  Rather, when
the nuclear accent is far from the end of the intermediate phrase, the
phrase accent fills in the space in between it and the phrase edge,
creating a long flat valley for L- realized over a long stretch, as in
the first speaker's production in example utterance <<names>> (the
region between 11.06 and 11.64 seconds), or a long plateau-like region
for H- realized over a long stretch, as in the second speaker's
production in example utterance <<names>> (the region between 13.3 and
13.9 seconds).  Second, the H- phrase accent triggers an "upstep" (a local
raising of the pitch range to the end of the phrase), so that a
following H% boundary tone is realized as a second rise at the end of
the plateau-like region.  This second point can also be seen clearly
in example utterance <<names>> starting at 13.9 seconds.  Compare the
much lower f0 target for the H% boundary tone in example utterance
<<names>> at 11.78 seconds, where the H% occurs after a L- phrase accent and
therefore is not realized in such an upstepped pitch range.

EXAMPLE <<names>>:
	Anna may know my name, and yours too.     Anna may know our names?
         H*                 L-H%     H*   H* L-L%  L*                 H-H%
[GIF}


(Some experienced transcribers may want to call the pitch accent on
"Anna" in the first sentence a L+H*.  We remind them of the annotation
guidelines that say in effect "Whenever there might be any doubt
whatsoever, such as on the absolute utterance initial syllable, choose
H* rather than L+H*.")

The summary statement on ToBI conventions prescribes that, in a
waves(tm) label file, the phrase accent (or phrase accent and
following boundary tone) should be marked at a point at or just before
the end of the last segment in the word ending the intermediate phrase
(or full intonational phrase) and always before the related break-index
mark.  The conventions say that the phrase accent should be placed here 
even when the nuclear accent occurs quite early and the phrase tone is 
realized over a long period of time, as in these two example utterances.  
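
The placement rule just described can be checked mechanically once
the label times are known.  The Python sketch below assumes that the
three times (in seconds) have already been read from the label files;
it does not parse the waves(tm) format.  The function name and the
parameter `max_lead`, which stands in for "just before", are
hypothetical.

```python
def phrase_tone_placement_ok(tone_time: float,
                             word_end_time: float,
                             break_index_time: float,
                             max_lead: float = 0.05) -> bool:
    """Return True when the phrase accent label sits at or just before
    the end of the phrase-final word and before the break-index mark."""
    at_or_just_before_end = (word_end_time - max_lead
                             <= tone_time
                             <= word_end_time)
    before_break_index = tone_time < break_index_time
    return at_or_just_before_end and before_break_index


# Illustrative times only.
print(phrase_tone_placement_ok(1.98, 2.00, 2.01))  # near the word end
print(phrase_tone_placement_ok(1.50, 2.00, 2.01))  # placed far too early
```

A label placed at the early nuclear accent, where the phrase tone
begins to be realized, would fail this check, which is the case the
conventions are warning against.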

Note that when the nuclear accent is close to the end of the
intonation phrase, it is impossible to discern any inflection point
between the high f0 target for the H- phrase accent and the even
higher f0 target for the upstepped H% boundary tone.  The upstepped
boundary tone after "jam" in example utterance <<jam1>>
illustrated the smooth single rise that results in this case.

Example utterance <<money>> illustrates the full paradigm of
combinations of phrase accent and following boundary tone that can
occur at the end of an intonation phrase.  Note that because of the
upstep of the pitch range after the H- phrase accent, the L% boundary
tone of a H- L% sequence does not have an absolutely low f0 target,
just a lower one than that of the upstepped H% boundary tone.  The
contrast between a H- L% and a H- H% sequence is particularly salient
when the preceding nuclear pitch accent is H*, as in the two sentences
in example utterance <<name1>>.

EXAMPLE <<money>>:  1) Is that Marianna's money?
                           H*     H*         L-H%
                    2) That's Marianna's money.
                                 H*          L-L%
                    3) That's Marianna's money.
                                 H*          H-L%
                    4) Is that Marianna's money?
                           L*     L*         H-H%
[GIF}
[GIF}

EXAMPLE <<name1>>:  My name is Marianna.  
in two productions  1)            H* H-H%
                    2)	          H* H-L%
[GIF}

The productions in <<money>> of the two combinations with final L%
boundary tone illustrate another potential difficulty, and highlight
the importance of listening to the speech and not just looking at the
f0 record when doing the intonational analysis for the tone tier
transcription.  When an intonation phrase is not the last one in an
uninterrupted stretch of speech and it ends with a L% boundary tone,
it is difficult to distinguish from an intermediate phrase ending with
the corresponding phrase accent just by examining the f0 contour.
That is, the pitch differences between a L- L% sequence and a mere L-,
or between a H- L% sequence and a mere H-, are very subtle at best.
Here the transcriber must rely on the subjective sense of degree of
disjuncture, which is probably cued by such other things as the amount
of preboundary lengthening or the degree of final lowering in the case
of L- L% versus L-.  (Note that the difference here must be
transcribed also on the break index tier -- Section 3.)  Example
utterances <<park2>> and <<oregano>> illustrate this difficulty.  In
<<park2>>, looking just at the f0 contour, we might have L- L% and a
full intonation phrase boundary or just L- and a mere intermediate
phrase boundary between the nuclear H* pitch accents on "probably" and
"pleasantest".  The ambiguous durational cues (two experienced
transcribers argued even over whether the boundary should be before or
after the "the") support the notion that this must be the latter, a
mere intermediate phrase boundary.  By contrast, the strong sense of
pause (caused by the lengthening on "and"?) in the tonally identical
stretch between the accents on "shortest" and "probably" supports a
full intonation phrase.  In <<oregano>> the two productions contrast
the L* H- intermediate phrases typical of a list in the first
production with the ambiguous H* H- or H* H- L% plateaus of the
second.  

EXAMPLE <<park2>>:	Definitely the shortest and probably the pleasantest
                        H*           L-  H*    L-L%  H*         L- H*
			way to go is through the park.
                                 L-             L+H* L-L%
[GIF}

EXAMPLE <<oregano>>:	1) Let's see   I need oregano 'n marjoram 'n some
                           H*   H* L-L%        L*  H-    L*    H-
			   fresh basil okay?
                          L+H*  !H* L-  H* H-H%
			2) Oh  I don't know    it's got oregano 'n marjoram
                           H* !H*      !H* L-L%          H*  H-    H*  H-
			   'n some fresh basil.
                                         H* H-L%
[GIF}

The f0 patterns on "oregano" and "marjoram" illustrate also another 
difference between the H- and L- phrase accent, particularly in the
contexts of unlike tones on the preceding nuclear accent -- i.e. in
the context of L* H- versus, say, H* L-.  When the nuclear accent
is on an early syllable in the last word in the phrase, the L- of a
H* L- sequence seems to kick in immediately with a sharp fall 
that typically begins during or just after the accented syllable.  In 
the analogous situation, the rise from a L* to a H- begins as early, 
but the f0 change is much more gradual.  Here, for example, the f0
seems to be rising continuously from about a third of the way into the
accented syllable all the way to the end of the phrase.  In this case,
there is no real inflection point leveling out into a plateau.

The last clause of the first production in <<oregano>> also
illustrates anew the difficulty mentioned above in connection with
utterance <<mother4>> in Section 2.2.  What is the best analysis of
the fall to a low level immediately after "marjoram" and subsequent
rise to a high f0 on "fresh"?  How can we distinguish, say, a sequence
of L* H* from the L+H* that we have transcribed?  One thing to note is
that, since accented syllables must be stressed, other characteristics
of a syllable must be compatible with a tonal analysis that puts a
pitch accent on it.  The words "and" ("'n") and "some" here do not
sound stressed at all.  Both have been reduced to the point that they
have syllabic nasals as their nuclei.  This supports the analysis of
L+H* on "fresh" over an analysis of L* H*, even though the fall from
"marjoram" looks so much steeper than the gradual rise from "and" back
up to the H tone on "fresh" that the f0 pattern may seem more
compatible with a L* on "and".  Note, however, that there may be
mistracking due to breathy voice on "and".  Also, the "some" shows a
strong perturbation from the initial voiceless [s] that obscures how
low the intended f0 is later in the syllable.

2.4. Difficult combinations of nuclear pitch accent and following
phrase accent

In most of the examples so far, nuclear H* has occurred before L-,
where the following fall in pitch makes it easy to discern the pitch
accent, and nuclear L* has occurred only before H-, where it was easy
to spot from the immediately following rise in pitch.  But the choice
of pitch accent type is independent of the choice of following phrase
accent, and there is nothing to preclude H* from occurring before H-
or L* before L-.  The second production in <<oregano>> illustrated the
first case, the `stylized high-rise' contour (Ladd, 1980), which is
becoming more and more familiar to contemporary American English
speakers.

The combination of L* and following L- is also not rare.  There are
two situations where this sequence is typically encountered.  The
first is illustrated in <<nose>> and in the first sentence in <<tags>>.
This L* L- H% pattern is typical of such vocative tags.  The second
sentence in <<tags>> shows that tag questions can have this contour
too.  However, tag questions can also take a H* L- L% intonation
pattern (the third sentence in <<tags>>), which seems to be precluded
on the vocative tag for pragmatic reasons (see Beckman &
Pierrehumbert, 1986).

EXAMPLE <<nose>>: Oh   don't nuzzle me you marmalade-nose.
                 X*? L- H*   !H*   L-      L*        L-H% 
[GIF}

(Section 2.8 will explain the `!' diacritic in the second pitch
accent, and Section 2.9 will explain the X*? accent on "Oh".)

EXAMPLE <<tags>>:	1) Where are you going, Willy?
                                          H* L- L* L-H%
			2) He won't be going, will he?
                               H*       H* L- L*  L-H%
			3) He won't be going, will he?
                               H*       H* L- H*  L-L%
[GIF}
[GIF}
[GIF}

The f0 contour for the L* L- H% vocative tag contour can be confused
with a longish postnuclear stretch in the sequence H* L- L%, as shown
in example utterance <<vocative1>>.  As with medial L- intermediate
phrase boundary versus L- L% intonation phrase boundary discussed
above, the transcriber may have to rely entirely on the subjective
impression of greater versus lesser disjuncture to capture this
difference between an intermediate phrase boundary at a vocative tag
and no boundary.  (Note again that the difference here must be
accompanied by different symbols on the break index tier.)

EXAMPLE <<vocative1>>:	1) Anna will win, Manny.
                                     H* L- L* L-H%
                        2) Anna will win Manny. (She won't lose him).
                           H*        H*    L-L%
[GIF}

The other situation in which one often sees a L* nuclear accent and
following L- phrase accent is in the `contradiction contour', an
intonational idiom illustrated in <<gloria>> and <<elephant3>>.  This
contour is discussed at length in Sag & Liberman (1975) and chapter 3
of Ladd (1980).  The L* L- H% sequence starting at the nuclear
syllable is like the contour in the vocative tag, but this is not the
only essential component of this intonational idiom.  Crucially, there
must be a fall from an early prenuclear H* pitch accent (or from an
initial %H boundary tone -- see next section) onto a nearby L* accent.
If the L* nuclear accented syllable is far from the beginning of the
utterance (as is the case with the nuclear accent on "incurable" in
<<elephant3>>), there might be another L* on some prenuclear syllable
with relatively prominent secondary stress (e.g., the fourth syllable
of "elephantiasis" in <<elephant3>>).  Note that <<elephant3>> also
illustrates the possibility of having two pitch accents on one word
when there is more than one full stressed syllable (see Section 2.9
for more examples).

EXAMPLE <<gloria>>:	Ah Gloria  you're not ugly.
		        H*  L* L-L% H*        L* L-H%
[GIF}

EXAMPLE <<elephant3>>:	Elephantiasis isn't incurable.
                       H*      L*             L* L-H%
[GIF}


2.5. The initial %H boundary tone

The contradiction contour also illustrates another phenomenon that we
have not discussed so far -- namely, the possibility of a boundary
tone marking the initial as well as the final boundary of an
intonation phrase.  Utterance <<bananas>> is an example.  Here the
event that provides the high pitch for the early fall onto the L* tone
cannot be an accent, since the first syllable of "bananas" is reduced
(i.e. completely unstressed and hence unaccentable).

EXAMPLE <<bananas>>: Bananas aren't poisonous.
                    %H  L*           L*    L-H%
[GIF}

In the intonational analysis assumed in the ToBI system, the final
boundary tone is mandatory, whereas an initial one is not.  The
initial boundary tone differs from the final ones also in that it
seems to be limited to absolute utterance-initial position, and in
that it is always high. Thus, unlike the final boundary, where there
is a paradigmatic choice between L% and H%, the phrase-initial
boundary tone contrasts merely with the absence of a boundary tone.
That is, %H contrasts with the default (unmarked) initial pattern,
which in absolute utterance-initial position tends to start in the
middle part of the speaker's pitch range (as opposed to the beginning of
utterance-medial intonation phrases, where the pitch simply continues
from the value at which the previous phrase ended).  This
utterance-initial midrange pitch value is illustrated in <<loan1>>,
where the first and second productions show a rise from the mid value
to H* and a fall from the same default mid value to L*, respectively.
The third production then contrasts with the second in that it has an
initial %H boundary tone.  These two productions also illustrate the
typical effect of having an initial %H boundary tone in the
surprise-redundancy contour.  The one with the initial %H has a
greater vividness, conveying either more surprise or more insistence
that this is the information that the hearer should really already
know.

EXAMPLE <<loan1>>:      You need a loan.  
In three productions  1)     H*     H* L-L%
                      2)     L*     H* L-L%
                      3)  %H L*     H* L-L%
[GIF}
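
The asymmetry between the two kinds of boundary tone can be sketched as
a small consistency check.  This is only an illustration, not part of
the ToBI tools: the function name check_boundary_tones is hypothetical,
and it assumes that the labels for one intonation phrase have already
been split into separate items (so that "L-L%" is given as "L-" and
"L%").

```python
# Hypothetical helper, not part of any ToBI software.
FINAL_BOUNDARY_TONES = {"L%", "H%"}   # paradigmatic choice at the end
INITIAL_BOUNDARY_TONE = "%H"          # the only possible initial tone


def check_boundary_tones(labels):
    """Return a list of problems with the boundary tones of one phrase.

    `labels` is the tone-tier label sequence for a single intonation
    phrase, already split into individual labels.
    """
    problems = []
    # The final boundary tone is mandatory.
    if not labels or labels[-1] not in FINAL_BOUNDARY_TONES:
        problems.append("missing final boundary tone (L% or H%)")
    for position, label in enumerate(labels):
        # An initial boundary tone is optional, but it can only be the
        # very first label of the phrase.
        if label == INITIAL_BOUNDARY_TONE and position != 0:
            problems.append("%H is only allowed as the first label")
        # There is no low initial boundary tone to contrast with %H.
        if label == "%L":
            problems.append("initial boundary tone can only be %H")
    return problems
```

Applied to the third production of <<loan1>>, given as the labels %H,
L*, H*, L-, L%, the check reports no problems.  Note that it does not
test whether the phrase is actually utterance-initial; that has to be
judged from the recording.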

The ToBI conventions prescribe that %H be an analysis of last resort.
That is, like L+H*, which is used instead of H* only when there is no
other possible explanation for the low pitch before the peak (see
Section 2.2, above), %H is used only when there is no other plausible
explanation for an initial high pitch.  It should be marked only when
a high-pitched beginning for an utterance cannot be attributed to a H*
accent on the first few syllables in the utterance -- i.e., when the
first word itself does not appear to be accented or when its accented
syllable occurs too far into the word to account for the initial high
target.  Thus it should not be used in <<gloria>>, where the high
pitch at the beginning of the phrase "You're not ugly" is attributed
to a H* accent on the first syllable "You're".  In <<elephant3>>,
similarly, although the main stress of the word "elephantiasis"
clearly is at the L* accent on the fourth syllable, we have the option
of analyzing the earlier high pitch as another pitch accent earlier in
the word, since the first syllable has a lexical `secondary stress'
(i.e. is rhythmically more prominent than the surrounding syllables).


*********************************************************** 
PRACTICE TWO -- phrase accent and boundary tone contrasts
***********************************************************

The following examples are practice utterances for the phrase accent
and boundary tone contrasts discussed in the last few sections.
Transcribe these exercises using the exercises script.

_______________________________________________________________________
EASY:
EXERCISE <<manitowoc>>:	   Does Manitowoc have a bowling alley?  
EXERCISE <<mother2>>:      For Marianna's mother.  
EXERCISE <<cream>>: 	   Would you like some cream?  
EXERCISE <<wellies2>>:     No, I think I'll wear my hiking boots.
     [Don't worry about transcribing the tune on "No" for now.]  
EXERCISE <<voice>>:        You lost your voice.  
EXERCISE <<flour2>>:	   Oh nothing special, you know flour and
                           butter and sugar.
     [Transcribe just the second part, after the "you know".] 
EXERCISE <<audience1>>:	   Good evening radio audience.
     [Don't worry about the transcription of "radio".]
EXERCISE <<good2>>:        I thought it was good.  
EXERCISE <<legumes1>>:     Legumes are a good source of vitamins, and of
			   protein as well.  
EXERCISE <<legumes2>>:     Legumes are a good source of vitamins, but not
                           the best.
     [Transcribe only the first part, up through "vitamins".] 
EXERCISE <<legumes3>>:     Legumes are a good source of vitamins, and so
                           are greens.
     [Transcribe only the first part, up through "vitamins".] 
EXERCISE <<stalin>>:       I was wrong, and Stalin was right.  I was wrong.
_______________________________________________________________________
INTERMEDIATE: 
EXERCISE <<friend1>>:      A friend of mine um works for NASA.
EXERCISE <<good1>>:	   I thought it was good?
     [Play <<good2>> for contrast.] 
EXERCISE <<pigs>>:         They've eaten the pigs.  (two productions) 
EXERCISE <<flour1>>:	   I need flour and sugar and butter and oh I
                           don't know.
     [Transcribe only the part up through "butter" for now.]  
EXERCISE <<atlanta>>:	   Yes I would uh like the information on the
                           flight leaving from uh Philadelphia to Atlanta.
     [Concentrate just on the parts "like the information" and "Philadelphia
      to Atlanta".]  
EXERCISE <<good-aft>>:     Good afternoon.  Information Services.
EXERCISE <<knock-stuff>>:  Mostly they just sat around and knocked stuff.
			   You know, the school, other people.
     [Concentrate for now on the second sentence, starting at "You know..."]
EXERCISE <<drive>>:	   I'm not going to drive to school today.  
EXERCISE <<spoon1>>:       There's a spoon in here.
_______________________________________________________________________
DIFFICULT: 
EXERCISE <<mother3>>:       I've told you a million times!  It's for
			    Mary's mother.
     [Compare <<mother4>>; don't agonize too much over the tones
      around "for" in the second sentence.]
EXERCISE <<experience1>>:   Well I mean, would you hire somebody that
                            doesn't have no experience?  
EXERCISE <<trafficlight>>:  That's right at the traffic light.
                            (two productions)


2.6. Pitch accent timing, and the L*+H pitch accent

The examples so far have illustrated both possible phrase accents,
both boundary tones, and three of the five types of pitch accent --
the low accent (L*), the plain `peak accent' (H*), and the `rising
peak accent' (L+H*).  We have also discussed the timing of the f0 peak
in the two types of peak accent, pointing out that it is somewhat
variable; in particular, that it occurs somewhat later relative to the
segments of the accented syllable when it is the accent at the
beginning of a `hat pattern' contour and relatively earlier before L-
(see Section 2.2).  Such differences in timing are not distinctive,
and seem to be related to the phonetics of pre-boundary lengthening.
For example, we might think of the relatively earlier placement of the
peak in the latter case as a matter of lengthening the part of the
syllable after the nuclear pitch accent peak in order to accommodate
the L- phrase accent within the intermediate phrase (see, e.g.,
Silverman & Pierrehumbert, 1989, for a discussion of such phonetic
accounts).  These phonetic differences in timing can be ignored in the
transcription on the tone tier.

There is another difference in the timing of apparent peak accents,
however, that must not be ignored, because it is distinctive.  Both
the small rise from mid pitch that is usually seen with an
utterance-initial H* accent and the definitive rise from low pitch
that must be present in order to transcribe a L+H* accent contrast
phonologically with another accent type that involves a rise from low
pitch into a peak that occurs much later, making the low tone align
with the accented syllable.  This is the `scooped' accent L*+H,
illustrated in the first production of <<millionaire>>.  The second
production in this example utterance is of the contrasting `rising
peak' accent L+H*.  These two pitch accents have very different
meanings, as described by Ladd (1980) and Ward & Hirschberg (1985),
and the difference in timing here is a phonological difference that is
represented in the ToBI system by the contrasting specifications of
L*+H versus L+H*.  That is, phonologically, both of these accents are
a L plus a H, but in the `scooped' accent, the L is the starred tone
(associated to the accented syllable) rather than the H.  The
associated phonetic difference is that the rise is much later in the
`scooped' accent, and it is the timing of the minimum f0 relative to
the segments of the associated syllable that is salient.

EXAMPLE <<millionaire>>:  Only a millionaire.  
in two productions 	1) H*     L*+H   L-H%
                        2) H*   L+H*     L-H%
[GIF}

Because it is the L target in the `scooped' accent that is associated
to the stressed syllable, and not the H, the high pitch target is
specified only as occurring somewhat later than the L, and the timing
of the peak f0 relative to the segments is not controlled.  If the
stressed syllable is long, the rise to the peak might be accomplished
entirely within the accented syllable.  But if the stressed syllable
is short, the peak may occur one or more syllables later.  This is
illustrated in example utterance <<stein>>, which shows a relatively
fixed rise relative to the low f0, which makes the peak occur within
the last part of the long syllable "Stein" but two syllables later
relative to the short accented first syllable in "rigamarole".

EXAMPLE <<stein>>: Stein's not a bad man.
                   L*+H             L-H%
                   Rigamarole is monomorphemic.
                   L*+H                   L-H%
[GIF}
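
The timing relationship just described can be sketched numerically.
The following is a rough illustration only.  It assumes that the rise
takes a fixed time after the L target, which is a simplification of
the "relatively fixed rise" seen in <<stein>>.  The function name and
all of the durations are hypothetical; none of them were measured from
the recordings.

```python
# Hypothetical sketch; the durations below are invented for
# illustration and are not measurements of <<stein>>.
def peak_syllable(syllable_durations_ms, low_time_ms, rise_ms):
    """Return the index of the syllable in which the H peak falls.

    syllable_durations_ms -- durations of the accented syllable and the
                             syllables that follow it, in order
    low_time_ms           -- time of the L target, measured from the
                             start of the accented syllable
    rise_ms               -- assumed fixed duration of the rise from
                             the L target to the H peak
    Returns None if the peak would fall after the last syllable given.
    """
    peak_time_ms = low_time_ms + rise_ms
    syllable_end_ms = 0.0
    for index, duration_ms in enumerate(syllable_durations_ms):
        syllable_end_ms += duration_ms
        if peak_time_ms < syllable_end_ms:
            return index
    return None
```

With a long accented syllable of 400 ms, an L target at 100 ms and a
rise of 250 ms, peak_syllable([400], 100, 250) returns 0: the peak is
inside the accented syllable, as in "Stein".  With a short accented
syllable followed by other short syllables, as in
peak_syllable([120, 80, 150, 300], 60, 250), the same rise puts the
peak at index 2, i.e. two syllables after the accented one, as in
"rigamarole".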

Note that although the crucial difference between L+H* and L*+H is the
timing of the low pitched portion, some speakers produce a secondary
difference, whereby the L of L*+H is consistently somewhat lower in
pitch than the L of L+H*.  This is particularly apparent in the
first accents of the two productions in <<bloomingdales>>.  This also
means that L*+H is not nearly so confusable with H* as is L+H*.  There
are also considerable interspeaker differences.  Some speakers have
rather mid-level L tones even in L*+H.  (This fact will be relevant
when you transcribe <<noodle1>> and <<noodle2>> in PRACTICE THREE.)
Other speakers have very low L tones even in L+H*.  This does not
affect the relative heights of the L's in L*+H versus L+H*.

EXAMPLE <<bloomingdales>>: There's a lovely one in Bloomingdale's.  
in two productions: 	   1)         L*+H           L*+!H    L-H%
                           2)       L+H*          L+!H*       L-L%
[GIF}

Another thing to note in the sequence of accents in these two
productions is the introduction of another set of symbols for the
second (the nuclear) accents.  These new symbols are actually the same
accent types as the first accent in their respective utterances.  The
extra `!' in the symbols for the nuclear accents is a diacritic to
denote the way in which the second accent peak is lower than the
preceding peak.  This lowering of the second peak is due to a process
called `downstep', which is defined as a categorical compression of
the pitch range that reduces the f0 targets for any H tones subsequent
to the specification of the downstep -- i.e. the counterpart of the
`upstep' triggered by the H-.  We will describe downstep and what
triggers it in more detail in Section 2.8 below after introducing the
last remaining pitch accent type in the ToBI analysis, transcribed as
H+!H*.

Finally, as in deciding how low the f0 must be to count as L+H* rather
than H*, transcribers should be aware of slight interspeaker
differences in the timing of the L tone in differentiating L+H* from
L*+H.  Our impression is that American speakers (such as the speaker
of <<bloomingdales>>) do not always make L*+H rise as late as most RP
British speakers do.  The second (downstepped) L*+!H on the word
"Bloomingdale's" in the first production, in particular, might seem
quite early to a British transcriber.  Note, however, that there is a
very low pitch level throughout the [b] and the [l], and the f0 does
not begin to rise until the voicing begins in the [u], making the peak
occur considerably after the [m] release.  This is quite late for a
nuclear L+H* before a L- (cf. our comments above in Section 2.2), as
can be seen by comparing this rise to the rise in the comparably
downstepped nuclear L+H* in the second production.  In the second
production, the rise begins before the [b] and is completed well
before the release of the [m].


2.7. The H+!H* pitch accent

The nuclear accent in the second production in example utterance
<<theresa>> illustrates this pitch accent type.  It is characterized
by a fall from a preceding higher pitch onto a lower pitch level on
the accented syllable.  This accent type corresponds to the type
called H+L* in Pierrehumbert's original system.  The substitution of
the letters `!H' for `L' in the name of the pitch accent reflects the
fact that the pitch target on the accented syllable is only somewhat
lower than the preceding H tone target; it is not so low as the f0
target for the plain `low' accent (L*) or for the L tone of the
`scooped' accent (L*+H), or even for the L of the `rising peak' accent
(L+H*).  The renaming of this pitch accent type was intended to make
the analysis somewhat more concrete and intuitive for the transcriber.

EXAMPLE <<theresa>>: You want an example? How about Mother Theresa?
                          H*       H* H-H% H*       *?        H* L-L%
                               You want an example? Mother Theresa.
                                    H*       H* H-H%       H+!H* L-L%
[GIF}
 
(The *? in the first production indicates uncertainty about whether
that word is accented -- see Section 2.9.)
 

2.8. Downstep

Downstep is a phonologically triggered compression of the pitch range
that lowers the f0 targets for any H tones subsequent to a downstep
trigger.  In Pierrehumbert's model of intonation, downstep is said to
be triggered by any bitonal pitch accent.  In example utterance
<<bloomingdales>> discussed above, for example, the progressive
reduction of the second L+H* or L*+H peak relative to the preceding
one would be analyzed as an automatic consequence of the fact that
these two pitch accent types are composed of two tones, L plus H.

In the ToBI system, this compression of the pitch range is marked by
having alternative names for accents which are used for the first
downstepped high tone target after the downstep trigger.  Thus in the
first production in example utterance <<bloomingdales>>, the second
`scooped' accent is transcribed with L*+!H rather than L*+H to denote
that a downstep has occurred.  And similarly in the second production
in this example utterance, the second `rising peak' accent is
transcribed with L+!H* rather than L+H*.  When there are more than two
such bitonal accents in a row, each accent triggers another instance
of downstep, so that each subsequent accent peak is reduced yet again
relative to the immediately preceding one.  This is illustrated in
example utterance <<yellow2>>.  Example utterance <<calling>> shows
that it is not just pitch accents which are affected by downstep.  The
!H- phrase accent here is reduced to a mid level by the downstep
triggered by the preceding L+H* nuclear pitch accent.  (Note the
characteristic mid-tone tail, as the downstepped !H- phrase accent
triggers a subsequent upstep of the L% boundary tone.)

EXAMPLE <<yellow2>>: There's a lovely yellowish old one.
                      H*      L+H*  L+!H*    L+!H*  L-L% 
[GIF}

EXAMPLE <<calling>>: 	Marianna.
			 L+H* !H-L%
[GIF}
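
The cumulative effect of repeated downstep can be made concrete with a
small numerical sketch.  The model used here is an assumption made
only for illustration and is not part of the ToBI conventions: each
downstep is taken to shrink the height of the H target above a fixed
baseline by a constant ratio.  The function name, the baseline, the
ratio, and the f0 values are all hypothetical.

```python
# Hypothetical constant-ratio model of downstep; the ratio and the
# baseline are illustrative assumptions, not measured values.
def downstepped_peaks(first_peak_hz, baseline_hz, ratio, count):
    """Return `count` successive H peak targets in Hz.

    The first peak is not downstepped.  Each later peak keeps only
    `ratio` of the previous peak's height above `baseline_hz`.
    """
    peaks = []
    height_hz = first_peak_hz - baseline_hz
    for _ in range(count):
        peaks.append(baseline_hz + height_hz)
        height_hz *= ratio
    return peaks
```

For example, downstepped_peaks(200.0, 100.0, 0.5, 3) gives the peaks
200.0, 150.0, and 125.0 Hz.  Each peak is reduced again relative to
the immediately preceding one, which is the staircase shape seen over
the three bitonal accents in <<yellow2>>.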

Transcribers who are familiar with Pierrehumbert's system will
recognize that ToBI differs in this explicit marking of the reduced
pitch range directly on the first H tone affected by the downstep.  In
Pierrehumbert's system, downstep is not explicitly marked because it
is redundant to the specification of the trigger in the preceding
bitonal accent.

Pierrehumbert's system differs from ToBI in yet another way; it
includes a sixth pitch accent type, H*+L, which bears the same
relationship to H+L* (ToBI's H+!H*) as L*+H does to L+H*.  That is,
the fall to a slightly lower pitch target occurs after the accented
syllable instead of into the accented syllable.  Typically, the
endpoint of this fall is no lower than the pitch target of a
subsequent downstepped H tone, and the contrast between H* and H*+L
thus hinges on recognizing the downstep triggered by the H*+L.

Many first-time transcribers find this comparatively abstract analysis
unintuitive and therefore difficult.  In the ToBI system, therefore,
we have eliminated H*+L in favor of marking the downstep directly on
the first reduced H tone.  Thus H* in ToBI corresponds to both plain
H* and the downstep triggering H*+L.  Users of databases transcribed
with the ToBI system who need to analyze the data in terms of the
intonational categories in Pierrehumbert's system can recover each
H*+L tone by searching for a downstepped !H* or !H- marked immediately
after a H* (or !H*) accent.  For example, in utterance <<really1>> the
second production is a plain `hat pattern' (H* H* L- L%) whereas the
first is a `downstepped hat', which would be transcribed as H*+L H* L-
L% in Pierrehumbert's system.  The second production in utterance
<<calling2>> illustrates another very familiar intonation pattern, the
`calling contour', which in Pierrehumbert's system would be
transcribed with H*+L H- L%.

EXAMPLE <<really1>>:      That's really illuminating.  
in three productions 	1)        H*     !H*    L-L%
                        2)        H*      H*    L-L%
                        3) Transcribe this one in PRACTICE THREE
[GIF}

EXAMPLE <<calling2>>:	    Anna.  
in two productions 	1) L*  H-H%
                        2) H* !H-L%
[GIF}
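
The search for H*+L described above can be sketched as follows.  This
is a minimal illustration under stated assumptions, not a tool that is
distributed with the ToBI materials.  The function names are
hypothetical, and the input is assumed to be the tone-tier labels of
one utterance written out in order and separated by spaces.

```python
import re

# A phrase accent written together with a boundary tone, e.g. "L-L%"
# or "!H-L%".  Such tokens are split into their two parts.
_PHRASE_PLUS_BOUNDARY = re.compile(r"^(!?[HL]-)([HL]%)$")


def split_labels(transcription):
    """Split a tone-tier string into individual tone labels."""
    labels = []
    for token in transcription.split():
        match = _PHRASE_PLUS_BOUNDARY.match(token)
        if match:
            labels.extend(match.groups())
        else:
            labels.append(token)
    return labels


def hstar_plus_l_positions(labels):
    """Return the indices of H* or !H* accents that are immediately
    followed by a downstepped !H* or !H-.  These are the accents that
    would be H*+L in Pierrehumbert's system."""
    return [
        index
        for index in range(len(labels) - 1)
        if labels[index] in ("H*", "!H*")
        and labels[index + 1] in ("!H*", "!H-")
    ]
```

For the first production of <<really1>>, "H* !H* L-L%", the sketch
reports position 0, the H* on "really".  For the plain hat pattern in
the second production, "H* H* L-L%", it reports nothing.  For the
calling contour in <<calling2>>, "H* !H-L%", it again reports position
0.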

A fact to note about downstep is that it is local to an intermediate
phrase.  Each new intermediate phrase represents a new paradigmatic
choice of pitch range, at which downstep can be reset.  This is
illustrated in <<yellow3>>, where the intermediate phrase boundary
after "yellowish" allows a new choice of pitch range, so that the
peak on "old" is not downstepped relative to that on "yellowish",
unlike in <<yellow2>>.  (See Section 2.9 to read about the X*?
symbol marking the peak on "old".  It indicates that there is an
accent on "old" but we are not completely certain which type of accent
it is.  It could be simply H* after an unexpectedly steep rise from
the preceding L-.  Or it could be L+H*, with a less steep rise than
expected.)

EXAMPLE <<yellow3>>: It's lovely and yellowish, and it's an old one.
                         L+H*      L+!H*    L-              X*?  L-L%
[GIF}
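
The locality of downstep can be sketched as a count that starts again
from zero after every phrase accent.  The function below is
illustrative only.  Its name is hypothetical, and it assumes that the
labels have already been split into individual tones, so that "L-L%"
is given as "L-" and "L%".

```python
# Hypothetical sketch: count accumulated downsteps per pitch accent,
# resetting the count at each intermediate phrase boundary.
def downstep_depths(labels):
    """Return (label, depth) pairs for the pitch accents in `labels`.

    `depth` is the number of downsteps marked so far in the current
    intermediate phrase, including one marked on the accent itself.
    """
    depths = []
    depth = 0
    for label in labels:
        if "!" in label:
            # The diacritic marks a tone lowered by downstep.
            depth += 1
        if "*" in label:
            # Any label containing a star is a pitch accent.
            depths.append((label, depth))
        if label.endswith("-") or label.endswith("-?"):
            # A phrase accent closes the intermediate phrase, so the
            # next phrase starts with a fresh choice of pitch range.
            depth = 0
    return depths
```

For <<yellow2>>, given as H*, L+H*, L+!H*, L+!H*, L-, L%, the depths
of the four accents are 0, 0, 1, and 2.  For <<yellow3>>, given as
L+H*, L+!H*, L-, X*?, L-, L%, they are 0, 1, and 0: the L- after
"yellowish" resets the count, so the accent on "old" is not counted as
downstepped.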

Note, however, that the peak on "yellowish" looks downstepped relative
to that on "lovely".  This is due to the relationship between the
pitch ranges chosen for the two intermediate phrases.  The topic
structure of a discourse is marked in part by the choice of pitch
range for the succession of intermediate phrases; large topics begin
with expanded pitch range and end with very reduced pitch range (see,
e.g., Brown, Currie, & Kenworthy, 1980; Hirschberg & Pierrehumbert,
1986).  This often creates an effect of `paragraph intonation' in
which the relationship among successive phrasal pitch ranges mimics
the phrase-internal relationship between preceding peaks and
subsequent lower downstepped peaks.  The successive pitch
ranges in utterances <<park1>> through <<park5>> in PRACTICE TWO above
illustrate `paragraph intonation' over a longer time frame.

Sometimes it is not easy to tell the difference between two
phrase-internal accents with the second downstepped relative to the
first and two intermediate phrases with the second phrase in a lower
pitch range relative to the first.  Example utterance <<levels>>
illustrates such a difficult case.

EXAMPLE <<levels>>: There are many intermediate levels.
                            L+H*      L+!H*   L+!H* L-L%
[GIF}

******************************************* 
PRACTICE THREE -- L*+H, H+!H*, and downstep 
******************************************* 
Transcribe these exercises using the exercises script.

_______________________________________________________________________
EASY:
EXERCISE <<yellow1>>:   There's a lovely yellowish old one.
     [Compare to <<yellow3>>.] 
EXERCISE <<really1>>:	That's really illuminating.
     [Transcribe third production now; first two are examples from 2.8.]
EXERCISE <<eileen1>>:	Eileen's pro-English.  
EXERCISE <<eileen2>>:	Eileen's pro-English.
     [Compare to <<eileen1>>.]  
EXERCISE <<calling3>>:	Marianna. (two productions)
EXERCISE <<windy>>: 	Becoming windy.  
EXERCISE <<park1>>:	Okay to get from home to the station.
EXERCISE <<park4>>:	But uh in fact I have to go along the main road for
			a little ways it's probably about three hundred yards.
EXERCISE <<legumes2>>: 	Legumes are a good source of vitamins, but not
                        the best.
     [Repeated exercise from PRACTICE TWO. Now transcribe the
      second part, after "vitamins".]
_______________________________________________________________________
INTERMEDIATE: 
EXERCISE <<friend2>>: A friend of mine works for NASA.
     [Compare to <<friend1>>.] 
EXERCISE <<mile>>:	You give him an inch, he takes a mile.  
EXERCISE <<really2>>:	That's really illuminating. 
                        (three productions) 
EXERCISE <<park3>>:	It would be nice to be able to go right out
                        the back door and into the park cause it's
	                actually right behind the house.
EXERCISE <<flour1>>:	I need flour and sugar and butter and oh I don't know.
     [Repeated exercise from PRACTICE TWO. Transcribe only the part
      after "butter".]  
EXERCISE <<sununu>>: 	... and denies speculation that Chief of Staff,
                        John Sununu, is meddling in the region's
                        environmental affairs.  
EXERCISE <<noodle1>>:	We have a lean mini-noodle with beans.
			Well, we have a lean mini-noodle dish.  
EXERCISE <<noodle2>>:	We have a lean mini-noodle with beans.
			We have a lean mini-noodle dish.  
EXERCISE <<romanelli>>: John Romanelli, John Romanelli, please return to
                        the ticket counter.  
EXERCISE <<thatone>>: 	Do you really think it's that one?
                        (two productions) 
EXERCISE <<word>>: 	Your word is your word.
     [Compare to <<word1>>.]
_______________________________________________________________________
DIFFICULT: 
EXERCISE <<thatone2>>: 	Do you really think it's that one?
			(two more productions)
     [Don't agonize too much over the tones around "Do you" in 
      the second production.]  
EXERCISE <<heavy-rain>>: Heavy rain possible. High around 70.
     [Repeated exercise from PRACTICE ONE. Transcribe the first 
      sentence now, concentrating on the "rain".]  
EXERCISE <<tree2house>>: My classmate who lives in a treehouse was
                         written up in Atlantic.
     [Compare <<tree1house>> in PRACTICE ONE.] 
EXERCISE <<knock-stuff>>: Mostly they just sat around and knocked stuff.
			  You know, the school, other people.
     [Repeated exercise from PRACTICE TWO. Concentrate now on the first
      sentence, the part before "You know..."]
EXERCISE <<argument>>:	If he can then there's no argument about it.
	        	(two productions) 
EXERCISE <<sublime1>>:	Sublime mnemonic rhyme and free meter.  
EXERCISE <<sublime2>>:	Sublime mnemonic rhyme and free meter.  
EXERCISE <<fail>>: 	And what happens is: when you...  when you buy my
                        business, and you try to run my business, it's
                        really hard for you to run my business.  So a
                        lotta times they fail.
     [Concentrate particularly on the "When you buy my business", and 
      don't worry about the preceding interrupted "when you..."]  
EXERCISE <<business>>: 	A lot of people have done this; they sell their
                      	business, and they have... If something goes wrong,
                      	and they have the first rights to buy it back.
                      	[Interviewer: Oh, really?]
     [You've already transcribed parts of this in earlier practice sets.
      Here concentrate on filling in the missing pieces up through
      "something goes wrong", leaving for now the "and they have..."]
 
 
2.9. Uncertainty about accent placement and accent type

In addition to conveying topic structure, pitch range variation is
used for many discourse purposes.  For example, a lower (or higher)
pitch range than surrounding phrases can be used to set off a stretch
of speech as an aside or a parenthetical.  This is illustrated in
example utterance <<capote>>.  Also, expanded pitch range can convey
extra liveliness or involvement, as illustrated in the much larger
pitch range on the phrase "Now be careful" in <<onions>>.

EXAMPLE <<capote>>: Capote died Saturday at the Bellaire home of
                    L+H*       !H*    L-        H* H*   L+H*  L-
                     Joanne Carson    (estranged wife of talkshow host
                          L+H*   L-L%          L+H*   L- H*       *?
                     Johnny Carson), and she was among those who
                      H*         L-L%   L+H*     !H*   !H*
                     eulogised him.
                     H+!H*       L-L% 
[GIF}
EXAMPLE <<onions>>:  Okay now chop the onions... Now be careful.
                    H* L- H*           H+!H* H-   L*    L+H* L-H%
                     Okay,  chop the onions, and put them into that bowl.
		   L+H* L-L%        H+!H* H-               H*       H+!H* L-L%
[GIF}
[GIF}

Because speakers can vary their pitch ranges seemingly without limit
to convey discourse organization or degree of involvement, and because
downstep can happen many times within a single phrase, sometimes it is
difficult to tell whether a tone is H or L, even when one is sure a
tone is there.  The accents on "smoke" and "yeah" in the asides in
example utterance <<smoke>> illustrate this.  Or, in very reduced
pitch ranges, it can be difficult to tell whether a syllable is
accented or not.  Utterance <<sold3>> illustrates this.  In the
very reduced pitch range after the second downstep on "else", it is
impossible to know how many accents there are.

EXAMPLE <<smoke>>: Can I smoke?  <<interviewer says "You can smoke.">>
                          X*? H-H%
                   Yeah?  <<interviewer: "Does this door have to stay
                   open?" If it>> No, it doesn't have to be; you can close it.
EXAMPLE <<sold3>>: He sold it to somebody else, they bought the
                      H*         !H*      !H*         *?
                   whole company, and he made lots of money on
                           *?  -X?              *?     *?
                   the business...
                       H*     L-L%
[GIF}
[GIF}

In the first type of uncertainty, ToBI prescribes that the transcriber
use the notation X*? to simply mark the clear presence of an accent,
without forcing an arbitrary commitment to the accent's type.  Thus,
the accent on "smoke" should be transcribed as X*? rather than as L*
or H*.  (X*? should not be used to mark uncertainty between the L- and
H- phrase accents or between the L% and H% boundary tones.  There the
transcriber should instead mark X-? for the phrase tones and X%?  for
the boundary tones.)  In the second type of uncertainty, when the
transcriber is not certain even that there is a pitch accent (as, for
example, on the "bought" in utterance <<sold3>>), the mark *? should
be used instead of X*?. 
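
For users who process transcriptions automatically, the four
uncertainty labels just described can be collected in a small lookup
table.  This is a sketch only.  The names UNCERTAINTY_LABELS and
uncertain_labels are hypothetical, and the labels are assumed to have
been split into individual tones already.

```python
# Hypothetical lookup table of the uncertainty labels described above.
UNCERTAINTY_LABELS = {
    "X*?": "pitch accent is present, but its type is uncertain",
    "*?": "uncertain whether there is a pitch accent at all",
    "X-?": "uncertain between the L- and H- phrase accents",
    "X%?": "uncertain between the L% and H% boundary tones",
}


def uncertain_labels(labels):
    """Return (index, label, meaning) for each uncertainty label."""
    return [
        (index, label, UNCERTAINTY_LABELS[label])
        for index, label in enumerate(labels)
        if label in UNCERTAINTY_LABELS
    ]
```

For the first phrase of <<smoke>>, given as X*?, H-, H%, the sketch
returns a single entry for the X*? at index 0.  Keeping the two kinds
of accent uncertainty apart in this way lets a later analysis decide
separately whether to count X*? and *? as accented.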

In addition to very compressed pitch ranges, there are several
particular tone sequences which are prone to inducing uncertainty
about the presence of accent. One such case is the downstepped H* !H*
!H* ... sequence just illustrated.  In many cases, words after the
first H* in such sequences are ambiguous between being accented with
!H* and being `deaccented' (i.e. being in the postnuclear low stretch
in a H* L- L% sequence).  This is not always ambiguous, however.
Utterance <<anna2>> illustrates a clear contrast between downstepped
and deaccented.

EXAMPLE <<anna2>>:     Anna married Lenny.  
in two productions 1) H*                L-L%
	           2) H*     !H*    !H* L-L%
[GIF}

Another case of inherently ambiguous tone sequences which occurs very
commonly is when there is a long stretch of speech in a `hat pattern'
contour.  Utterance <<peel>> illustrates this.  The word "off" sounds
very prominent, giving a strong subjective impression of accent.
However, because the word lies in the plateau between the first H* on
"peeled" and the nuclear H* pitch accent on "Hawaii", it is difficult
to tell whether "off" also bears a H* accent, or just the preceding
"peeled".  The word "host" in the phrase "talkshow host Johnny Carson"
in <<capote>>, the word "married" in the first production of <<anna>>,
and the word "Mother" in the first production in <<theresa>> in
Section 2.7 above are three more illustrations of this very common
ambiguity.  The first production in example utterance <<made4>> (given
in Section 1.4 above) illustrates the analogous situation with a L*;
"Marianna" probably has a L* accent (note the dip down into it from
the mid pitch level that begins the sentence) and "marmalade" clearly
has a L* accent, but what about the "made" in between?

EXAMPLE <<peel>>: [Ever since the roof of a 19-year old Aloha Airlines
                  Boeing 737] peeled off over Hawaii last April, ...
			       H*    *?         H* L-    H* L-L% 
(See example utterance <<older-aircraft>> in the next PRACTICE for the
whole context of this phrase.)
[GIF}

In cases such as these it is better to err on the side of conservatism
and mark the word with *? or nothing.  In particular, the transcriber
should take care not to let grammatical expectations guide the marking
of accents.  If we find ourselves giving in to such thoughts as "This
is a content word and therefore probably is accented", we preclude the
use of our transcriptions to test whether content words are indeed
likely to be accented.

A final source of uncertainty arises particularly when transcribing
sentences in isolation, extracted from the context in which they
originally occurred.  This is uncertainty about accent type due to
unfamiliarity with a particular speaker's normal speaking range for a
particular style of speech.  For example in utterance <<hurt>>, the
nuclear pitch accent on "hurt" is probably L*; the pitch is lower than
the "neutral" value at the beginning of the utterance.  However, 200
Hz is very high for a low, and unless one knows from experience that
this speaker has a very high-pitched voice, one might be tempted to
transcribe this utterance with a H* nuclear accent.
 
EXAMPLE <<hurt>>: But would it hurt you?
                        *?      X*?    H-H%
[GIF}
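
The dependence of "high" and "low" on the speaker's range can be
sketched by expressing an f0 value as a proportion of that range.  The
sketch makes two assumptions that are not part of the ToBI
conventions: that the bottom and top of the speaker's normal range are
known, and that a linear Hz scale is adequate.  The function name and
the two ranges used below are hypothetical; they are not measurements
of the speakers in this guide.

```python
# Hypothetical sketch; the speaker ranges used with it are invented.
def relative_height(f0_hz, range_bottom_hz, range_top_hz):
    """Return where `f0_hz` lies in the speaker's range.

    0.0 is the bottom of the range and 1.0 is the top.
    """
    if range_top_hz <= range_bottom_hz:
        raise ValueError("range_top_hz must exceed range_bottom_hz")
    return (f0_hz - range_bottom_hz) / (range_top_hz - range_bottom_hz)
```

A value of 200 Hz is near the bottom of a hypothetical range of 180 to
380 Hz, where relative_height(200, 180, 380) is about 0.1, but it is
near the top of a hypothetical range of 80 to 230 Hz, where
relative_height(200, 80, 230) is about 0.8.  The same f0 value would
thus suggest a L* for the first speaker and a H* for the second.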

Utterance <<beef>> also illustrates this point.  Here we have
transcribed each of the accents in the second speaker's response as
L*, since we know from many other examples in this labelling guide
that this speaker's normal range is higher than the first speaker's
voice.  This utterance also exemplifies an intonation pattern we have
not shown before: a sequence of all low tones, for all of the accents,
the phrase accent, and the boundary tone.

EXAMPLE <<beef>>: Here's your Chateaubriand, ma'am.
                   H*                L+H* L-  L* L-H%
                  I don't eat beef.
                     L*    L*   L* L-L%


2.10. When something is accented that you would not expect

It is important also to not fall into the obverse reasoning and
hesitate to mark an accent simply because the accented word is not a
content word.  Example utterance <<AND1>> is a nice illustration of
this point.  Here, the pitch pattern unequivocally supports a nuclear
`peak' accent on the "and".  There is a suggestion of accent on the
same word in <<hennessy>>, but less clearly; here the glottalization
at the beginning of the word is the main cue that it is accented
(recall that stressed vowel-initial syllables are set off by glottal
stops -- Section 2.1).

EXAMPLE <<AND1>>: ...design improvements, and a schedule...
		       H*      H*    L-H% H* L-   H*  L-L%
[GIF]
               
EXAMPLE <<hennessy>>: Hennessy is widely respected   for his legal
                       H*    L-     H*      !H*  L-L%       L+H* !H-
                      scholarship and his administrative abilities.
                         H*     H- H*             *?        H*    L-L%
[GIF]

Another thing to watch out for is that in very emphatic speech, a word
with two fairly strong syllables can bear two pitch accents.
Utterance <<understand>> is such a case, where our normal expectation
is that only the most stressed final syllable in "understand" will be
accented.  Here, however, the first syllable is also accented, so that
the phrase "to understand" can realize the `surprise-redundancy'
contour L* H* L-L%.

EXAMPLE <<understand>>: I'm simply trying to get you to understand.
                        H*   L* H-   L*   H- L*      H- L*     H* L-L%
[GIF]

Example <<philadelphia>> gives another case of such double accents,
apparently without the impetus of realizing any particular
intonational idiom.
 
EXAMPLE <<philadelphia>>: from Philadelphia to Dallas
                              L+H* !H*   L-    H*   L-L%
[GIF]

This phenomenon of double accents, and the apparently related
phenomenon of `stress shift', have been examined extensively by
Stefanie Shattuck-Hufnagel and her colleagues (see Ross, Ostendorf, &
Shattuck-Hufnagel, 1992) in a corpus of newscasts which they have
transcribed using something like the ToBI system.  Some of the
exercises involving this phenomenon are from this study.  The
phenomenon is apparently fairly common.


***************************************************************** 
PRACTICE FOUR -- uncertainty about pitch accent placement or type
***************************************************************** 
Transcribe these exercises using the exercises script.

_______________________________________________________________________
EASY: 
EXERCISE <<legumes3>>:  Legumes are a good source of vitamins,
                        and so are greens.
     [Repeated exercise from PRACTICE TWO.  Transcribe the second part,
      after "vitamins".] 
EXERCISE <<sold1>>:           He sold the business to somebody else 
EXERCISE <<artwork>>: 	      State law now requires public
			      construction projects to set aside 1% of
                              their budgets for artwork.
     [Concentrate particularly on the part from "to set aside" on.]  
EXERCISE <<howto>>:           I know we've gotta do it but I don't know how 
			      to do it.
     [There isn't an intermediate phrase break between "how" and "to".
      You'll learn how to transcribe sequences such as this in Section 3.]   
EXERCISE <<noodle4>>:         Do you have a lean mini-noodle dish?  
EXERCISE <<noodle5>>:         D'you have a lean mini-noodle dish?
_______________________________________________________________________
INTERMEDIATE: 
EXERCISE <<butcher>>: 	How'd your operation go?  Don't talk to me
			about it; I'd like to strangle the butchers.  
EXERCISE <<physicist>>: He's a physicist 'n works at NASA.  
EXERCISE <<environ1>>: And Ballaga seems determined to stay the
                       environmental course.  
EXERCISE <<environ2>>: plenty of room to flex environmental muscles
     [We had trouble on the accents around "environmental" too, so don't
      agonize over their type.]  
EXERCISE <<older-aircraft>>: Ever since the roof of a 19-year old Aloha
                             Airlines Boeing 737 peeled off over Hawaii last
                             April, sweeping a flight attendant to her death,
                             attention has been focused on the older aircraft.
     [Don't agonize too much over the "Ever since the roof" part.  We found
      the tonal analysis really hard there too.]
_______________________________________________________________________
DIFFICULT: 
EXERCISE <<spanish>>: 	And I had registered for Spanish, simply
			because I'd taken it for five years in high school.  
EXERCISE <<two-million>>: I'll buy it back from you for like two million,
           because ya done ran it into the ground, you're having problems,
           like you... you're not gonna make it, and you go bankrupt, so
           I'm gonna buy it back from you for like, for next to nothing.
EXERCISE <<sold2>>: 'coz he was like, a millionaire
     [Don't worry about the type of the boundary (if there is one)
      after "like".] 
EXERCISE <<massachusetts2>>: Hewlett-Packard has announced it's buying
                             Massachusetts-based Apollo computer.
 

2.11. The point of highest f0

The only other label on the tone tier which we have not discussed is
the transcription of HiF0, the point of highest fundamental frequency
associated with a pitch accent within an intermediate phrase.  HiF0 is
used currently as a rough measure of the phrase's pitch range.  It is
transcribed only for intermediate phrases in which there is an accent
with a H component -- i.e. a H*, L+H*, L*+H, or H+!H* accent.  Thus,
for example, in <<made4>> (given above) one should not transcribe
HiF0.
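
The condition for transcribing HiF0 can be sketched as a check over the
tone-tier labels of one intermediate phrase.  This is a minimal Python
sketch under stated assumptions: the function name is hypothetical, a
label is treated as a pitch accent if it contains `*', and an accent
counts as having a H component if its label contains `H'.  Uncertain
labels such as `*?' or `X*?' are treated as having no known H
component, which is our assumption rather than a stated convention.

```python
def phrase_needs_hif0(tone_labels):
    """Return True if any pitch accent in the phrase has a H component.

    tone_labels: tone-tier labels for one intermediate phrase, e.g.
    ["H*", "L+H*", "L-"].  Phrase accents ("L-") and boundary tones
    ("L-L%") contain no "*" and are therefore ignored.
    """
    return any("*" in label and "H" in label for label in tone_labels)


# Label sequences taken from example <<beef>>.
first_phrase = ["H*", "L+H*", "L-"]          # "Here's your Chateaubriand"
response = ["L*", "L*", "L*", "L-L%"]        # "I don't eat beef."

print(phrase_needs_hif0(first_phrase))
print(phrase_needs_hif0(response))
```

Because the test looks only at labels containing `*', a high phrase
accent or boundary tone (as in L* H-H%) does not by itself call for a
HiF0 label.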

The summary statement in the Annotation Conventions (Appendix A) offers the 
following advice about HiF0:

  Transcribers should take reasonable care to choose a point in
  time that reflects the target of the H for the accent.  In several
  cases this will mean choosing some point other than the actual f0
  maximum.  For example, sometimes the highest f0 value in an
  accented syllable reflects the `intrinsic' effect of a voiceless
  consonant and will thus be a poor estimate of the speaker's choice
  of pitch range.  More seriously, in a phrase where the highest
  accent-related f0 occurs in a H* H- H% sequence, choosing the
  absolutely highest value for HiF0 will artifactually inflate the
  pitch range estimate by the amount of the upstep on the H%.  In
  such cases, we recommend that the syllable's amplitude
  contour be used to pinpoint HiF0 within the candidate region.  

For an example, see <<good1>> from PRACTICE TWO.  HiF0 would be at the
amplitude peak for "good" at about time 2.61.
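
The amplitude-based choice recommended above can be sketched as
follows.  This is a hedged Python illustration, not a procedure defined
by the Annotation Conventions: the function name is hypothetical, the
candidate region is assumed to have been chosen by the transcriber, and
the track values are invented for the example (they are not taken from
<<good1>>).

```python
def hif0_candidate(times, f0, amplitude, region_start, region_end):
    """Return (time, f0) at the amplitude peak inside the candidate region.

    times, f0, and amplitude are parallel lists, one value per frame.
    """
    if not (len(times) == len(f0) == len(amplitude)):
        raise ValueError("tracks must have equal length")
    frames = [i for i, t in enumerate(times)
              if region_start <= t <= region_end]
    if not frames:
        raise ValueError("no frames fall inside the candidate region")
    # Pick the frame with the greatest amplitude, not the greatest f0.
    best = max(frames, key=lambda i: amplitude[i])
    return times[best], f0[best]


# Invented tracks: f0 keeps rising toward a high boundary tone, while
# the amplitude peaks in the middle of the accented syllable.
times = [2.55, 2.57, 2.59, 2.61, 2.63, 2.65]
f0 = [231.0, 224.0, 219.0, 221.0, 226.0, 238.0]
amplitude = [0.31, 0.52, 0.70, 0.78, 0.61, 0.40]

print(hif0_candidate(times, f0, amplitude, 2.55, 2.65))
```

In these invented tracks the raw f0 maximum falls on the last frame,
where the rise toward the boundary tone would inflate the pitch range
estimate; the amplitude peak selects an earlier frame within the
accented syllable instead.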


labelling_guide_v2.ASCII (augmented by some HTML)


This page is maintained by M. Beckman (mbeckman@ling.osu.edu)

 

Copyright © 2008 Department of Linguistics, The Ohio State University