Data analysis problem on "Vowel differences and speaker differences"

(Data analysis assignment 4 for Ling H286, Autumn 2007, Ohio State University)

Copyright © 2007 Mary E. Beckman

0. Due date (and a reminder about collaboration).

Do the data analysis described below and turn in your report at the beginning of class on Wednesday, November 14.

(You may work in groups to make the histograms and scatterplots and so on. However, if you do so, you must remember to acknowledge the contributions of others in your report. Also, the writing of your report on the data must be your own individual work.)


1. The primary data

The data that we will use for this assignment are from a study of Detroit dialect vowels by James Hillenbrand and colleagues. Peter Ladefoged cites this study in Chapter 5 of "Vowels and Consonants" and uses its data in his Figure 5.7 on p. 46. Please note that the data are copyrighted by James Hillenbrand. The full data set is available from:

http://homepages.wmich.edu/~hillenbr/voweldata.html

and whenever you use the data, you should cite the following paper, in which the study was described:

James Hillenbrand, Laura A. Getty, Michael J. Clark, & Kimberlee Wheeler (1995). Acoustic characteristics of American English vowels. Journal of the Acoustical Society of America, 97, 3099-3111.

In this study, Hillenbrand et al. elicited 12 /h_d/ words from 139 speakers of a Northern Cities variety of American English. The words and the codes that Hillenbrand et al. use for the target vowels are: ae="had", ah="hod" (the vowel in "cot"), aw="hawed", eh="head", er="heard", ey="hayed", ih="hid", iy="heed", oa="hoed" (/o/ as in "boat"), oo="hood", uh="hud", uw="who'd". They then measured formant values at several different places in the vowel, including near the vowel midpoint, as we are doing for the words that we recorded for the term project. They also played the audio files to a group of 20 listeners (who, like the talkers who produced the words, were natives of the Detroit area) and asked them to identify each word.

I have downloaded the file that contains the formant measurements to a subdirectory of our course web page called HillenbrandHighVowels and renamed it bigdata.txt, so that you can open it on a Windows PC just by clicking on its icon. The same directory also contains a "massaged" copy of the identification data, in the file iddataMinusAsterisk.txt, and a file of R code. Download all three files:

  1. bigdata.txt
  2. iddataMinusAsterisk.txt
  3. HillenbrandHighFrontVowels.R

2. Analyze the effects of vowel type versus speaker type.

Read the comments and code in the file HillenbrandHighFrontVowels.R and make the six histograms and two scatterplots that the code describes. These are:

  1. A histogram of the first formant in all 139 tokens of the vowel i in "heed" with a histogram of the F1 in the vowel I in "hid" overlaid.
  2. Like the first figure, but for the second formant.
  3. Like the first figure, but for vowel duration.
  4. Like the first figure, but with the F1 values separated by speaker sex instead of by vowel category.
  5. Like the second figure, but with the F2 values separated by speaker sex.
  6. Like the third figure, but with the vowel durations separated by speaker sex.
  7. Vowel space scatterplot with the two vowels plotted with different plotting characters ("i" for i and "I" for I) and with red circles drawn around the four tokens of the word "heed" that one listener misidentified as "hid".
  8. Vowel space scatterplot with males and females plotted with different plotting characters.
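
An overlaid pair of histogram, as in figures 1 through 6, is only easy to compare when both histograms use the same bin edges. In R this comes from passing one shared `breaks` vector to both `hist()` calls (drawing the second with `add = TRUE`). The Python sketch below illustrates that idea only. It is not the course code, the function names are hypothetical, and the numbers are toy values rather than measurements from bigdata.txt. The bins are right-closed, with the lowest edge included, to mirror the defaults of R's `hist()`.

```python
import math
from bisect import bisect_left


def shared_breaks(a, b, width):
    """Return bin edges on one grid of the given width that covers both samples."""
    lo = math.floor(min(min(a), min(b)) / width) * width
    hi = math.ceil(max(max(a), max(b)) / width) * width
    if hi == lo:  # every value identical and exactly on a grid line
        hi = lo + width
    n_bins = int(round((hi - lo) / width))
    return [lo + i * width for i in range(n_bins + 1)]


def bin_counts(values, breaks):
    """Count values per bin using right-closed bins (e_i, e_i+1], like R's hist();
    a value equal to the lowest edge goes into the first bin."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        i = max(bisect_left(breaks, v) - 1, 0)
        counts[i] += 1
    return counts


# Toy values (hypothetical, not from bigdata.txt)
heed_f1 = [250, 270, 300]
hid_f1 = [380, 410]
breaks = shared_breaks(heed_f1, hid_f1, 50)
print(breaks)                       # [250, 300, 350, 400, 450]
print(bin_counts(heed_f1, breaks))  # [3, 0, 0, 0]
print(bin_counts(hid_f1, breaks))   # [0, 0, 1, 1]
```

Because both count vectors refer to the same edges, a bar for "heed" and a bar for "hid" at the same position cover exactly the same range of F1 values.
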
Also run the six t-tests that the code describes. If you do not agree with the expectations that the code suggests, change the type of test (i.e., one-tailed or two-tailed) as appropriate.
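
R's `t.test()` runs a Welch two-sample test by default (`var.equal = FALSE`) and picks the tails with its `alternative` argument (`"two.sided"`, `"less"`, or `"greater"`). The direction refers to the mean of the first sample minus the mean of the second. The Python sketch below shows the same arithmetic, so that you can see how the one-tailed and two-tailed p-values relate. It is an illustration, not the course code: the function names are hypothetical, and, to stay within the standard library, it approximates the t distribution with the standard normal. That approximation is close only when the degrees of freedom are large (roughly 100 or more); for real results, use `t.test()` itself.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic for mean(x) - mean(y), plus Welch-Satterthwaite df."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def approx_p_values(t):
    """Large-sample (normal) approximation to the p-values for each alternative."""
    z = NormalDist()
    return {
        "two.sided": 2 * (1 - z.cdf(abs(t))),
        "less": z.cdf(t),         # alternative: mean(x) < mean(y)
        "greater": 1 - z.cdf(t),  # alternative: mean(x) > mean(y)
    }


# Toy samples (hypothetical), only to show the mechanics
t, df = welch_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), round(df, 3))  # -3.674 4.0
print(approx_p_values(t))
```

When t falls in the predicted direction, the one-tailed p-value is half the two-tailed one. When t falls in the opposite direction, the one-tailed p-value is greater than 0.5. This is why the order in which you give the two groups matters when you choose `"less"` or `"greater"`.
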

3. Writing the report

Embed the two histograms for duration and the two scatterplots into your report and then write four short paragraphs addressing the following sets of issues and questions.

One or two of your paragraphs should refer to the relevant t-test or set of t-tests. Embed the results of the test(s) in those paragraphs, using the following reporting style:

The red mice were on average 16 grams heavier than the grey mice, a difference that was highly significant by a one-tailed t-test (t=-9.657, p<0.001).
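
If you report several tests, a small helper can keep the numbers in this style consistent: t to three decimal places and very small p-values given as a bound. The Python function below is a hypothetical helper, not part of the course files. The 0.001 cutoff is taken from the example sentence, and the wording around the parenthetical (for example, "highly significant") is still yours to choose.

```python
def format_p(p):
    """Give very small p-values as a bound, otherwise three decimal places."""
    if p < 0.001:
        return "p<0.001"
    return f"p={p:.3f}"


def format_t_report(t, p, one_tailed):
    """Build the 'by a ... t-test (t=..., p...)' part of the reporting sentence."""
    tails = "one-tailed" if one_tailed else "two-tailed"
    return f"by a {tails} t-test (t={t:.3f}, {format_p(p)})"


print(format_t_report(-9.657, 1e-12, True))
# by a one-tailed t-test (t=-9.657, p<0.001)
```
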