Term Project, Part 3, Relating the perception and production data
(Ling H286, Autumn 2007, Ohio State University)
Copyright © 2007 by the class and Mary E. Beckman
0. Synopsis and due date.
This part of the term project ties together the first two parts.
You will be using analysis methods we have learned more recently
to relate the perception experiment results
for the hod and hawed
words that you analyzed in the first part of the term project to
the vowel duration and formant measurements that you analyzed
in the second part of the term project.
You will also analyze the perception and production results for these
same two words produced by the 139 Detroit speakers in the Hillenbrand
et al. (1995) corpus, as a kind of check on what to expect from
a more homogeneous group of talkers than is represented by the 20
members of the class.
Your group presentation will be in class during the time scheduled
for a final exam (7:30-9:30 on Tuesday, December 4), and your personal final
summary of the project as a whole is due during that same final class meeting.
1. The measurement files.
The perception test results are in the
Au2007perception.txt file
in the termProjectPart1Results subdirectory under
the course web page, where you got them for the first interim report.
The production measures have been collated into a single large file
called classVowels.txt
that is in the termProjectPart3 subdirectory of
the course web page. This file includes the vowel durations that
Group Awesome measured for all 10 vowels produced by all 20 class
members, and the corrected formant measures that Group A made for
the vowels in hod and hawed for everyone. There are also
some corrections of other vowels for some people that Mary made after
looking at each class member's vowel chart and seeing numbers that were
obvious mistrackings. You can see which values were changed by looking
at the code in projectPart3prep.R
in the termProjectPart3 subdirectory.
This file contains the R code that was used to collate the vowel
formant measures and duration measures.
The perception test results and the vowel formant measures
for the 139 Detroiters are still in the
HillenbrandHighVowels
subdirectory of the class web site, where you got them in order to do
the data analyses for report number 4. The relevant files to download
from here are:
- bigdata.txt (the file of vowel formant and vowel duration measurements)
- iddataMinusAsterisk.txt (the file of identification test results)
2. The questions.
Here are the questions about these two data sets
that we came up with in class on Monday, November 19.
In our first analysis of the perception data, we assumed
a categorical distinction between two sets of people in
the class. If we were to treat the Detroiters the same
way, we would distinguish:
(a) Class members who distinguish.
(b) Class members who do not distinguish.
(c) Detroiters who distinguish.
(d) Detroiters who do not distinguish.
We defined sets (a) versus (b) by designating Joe as
our class "reporter" and having him interview everyone,
to determine how each class member self-identifies.
The first thing to ask, then, is:
- Since we don't have access to the 139 Detroiters to
  ask them how they self-identify,
  how can we define the two groups
  for the 139 Detroiters?
  What is a reasonable criterion value for mis-identification rates
  that we can use to divide these talkers into an analogous set of
  distinguishers versus non-distinguishers?
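
The mechanics of applying such a criterion can be sketched in R as follows. This is only an illustration under stated assumptions: the rates in `misid` are made-up numbers rather than values from iddataMinusAsterisk.txt, and the value of `criterion` is a placeholder, since choosing and justifying the real value is exactly what the question asks.

```r
# Hypothetical misidentification rates (proportion of listener responses
# that got a talker's hod/hawed tokens wrong). Invented numbers, not corpus data.
misid <- c(t01 = 0.02, t02 = 0.10, t03 = 0.45,
           t04 = 0.00, t05 = 0.30, t06 = 0.55)

# Placeholder criterion value (an assumption for illustration only).
criterion <- 0.25

# Talkers whose tokens are misidentified more often than the criterion
# are treated as non-distinguishers; everyone else as distinguishers.
group <- ifelse(misid > criterion, "non-distinguisher", "distinguisher")

# Count how many talkers fall into each group.
table(group)
```

Trying several values of `criterion` and watching how the group sizes change is one way to see whether any particular cutoff is natural for the data.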
Once we have determined a criterion for dividing the Detroiters
into these two discrete categories, we might want to ask:
- Is there a significant difference between F1 values for
  hawed and F1 values for hod?
- What about for F2 values?
- What about for duration values?
- Are any of the significant differences predictive?
  That is, could we use the raw values for F1, F2, and/or duration
  to predict with some confidence whether a token is a production
  of hod or a production of hawed?
- If we normalize for the vocal tract size differences and overall
  rate differences among the talkers in each of the two groups,
  are there "predictive" differences?
- Do the data points for non-distinguishers tend to fall in
  intermediate or ambiguous positions relative to the "dividing
  lines" between hod and hawed on the graphs that
  we make to see whether these variables are predictive?
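
The difference between "significant" and "predictive" in the questions above can be illustrated with a short R sketch. It is not the code in classHodHawed.R or HillenbrandHodHawed.R. The F1 values are simulated, the number of talkers `n` is hypothetical, and the sketch assumes one hod token and one hawed token per talker, which is why the t-test is paired. Logistic regression with `glm` is one possible way to ask whether a raw measure predicts the word, not necessarily the method intended in class.

```r
set.seed(1)   # make the simulated data reproducible
n <- 30       # hypothetical number of talkers

# Simulated F1 values in Hz (invented for illustration, not measured data).
f1_hod   <- rnorm(n, mean = 780, sd = 60)
f1_hawed <- f1_hod - rnorm(n, mean = 40, sd = 30)

# Significance: paired t-test, since each talker contributes both words.
tt <- t.test(f1_hod, f1_hawed, paired = TRUE)
tt$p.value

# Prediction: logistic regression of word identity on raw F1.
# With levels c("hod", "hawed"), fitted values estimate P(word is hawed).
tokens <- data.frame(
  word = factor(rep(c("hod", "hawed"), each = n), levels = c("hod", "hawed")),
  F1   = c(f1_hod, f1_hawed)
)
fit <- glm(word ~ F1, data = tokens, family = binomial)

# Classify each token and compute the proportion classified correctly.
predicted <- ifelse(fitted(fit) > 0.5, "hawed", "hod")
accuracy  <- mean(predicted == tokens$word)
accuracy
```

A small p-value together with an accuracy near 0.5 would mean that the difference is reliable across talkers but too small, relative to between-talker variation, to identify individual tokens.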
The second set of questions looks at the relationship between
perception and production in a different way.
- First, for each larger group (Detroiters versus class members),
  is the difference in correct identification and in formant values
  bimodal (as we'd expect if there were two clearly distinct groups),
  or is it continuous?
- If there are continua, what is the relationship between the
  production continuum and the perception continuum?
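
One way to approach these two questions is sketched below in R, using simulated values only. The variable names `f1_diff` and `pct_correct` are hypothetical stand-ins for a per-talker production measure and a per-talker perception measure, and a correlation test is just one possible way to describe the relationship between two continua.

```r
set.seed(2)   # make the simulated data reproducible
n <- 40       # hypothetical number of talkers

# Simulated per-talker measures (invented for illustration, not corpus data).
f1_diff <- rnorm(n, mean = 40, sd = 35)   # hod F1 minus hawed F1, in Hz
pct_correct <- pmin(100, pmax(0, 60 + 0.4 * f1_diff + rnorm(n, sd = 10)))

# Two humps in the histogram would suggest two distinct groups;
# a single hump is more consistent with a continuum.
hist(f1_diff, breaks = 10,
     main = "hod minus hawed F1", xlab = "F1 difference (Hz)")

# Relationship between the production and perception continua.
ct <- cor.test(f1_diff, pct_correct)
ct$estimate

plot(f1_diff, pct_correct,
     xlab = "F1 difference (Hz)", ylab = "Percent correctly identified")
```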
There is some R code for reading in the data files and for doing various analyses
that might be relevant to answering the above questions in the files
classHodHawed.R and
HillenbrandHodHawed.R
in the termProjectPart3 subdirectory
under the course web site.
These files include sample code that you can use to make histograms and
do t-tests, if you think histograms and t-tests would be useful for
answering the questions about significant differences between distinguishers
and non-distinguishers.
They also include code for doing the same kind of
normalization to the minimum and maximum values that
Group Awesome did for the class formant values in their interim report 2.
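
For reference, normalization to the minimum and maximum can be sketched in R as below. This is one common reading of that phrase (rescaling each talker's values so that the talker's minimum becomes 0 and maximum becomes 1) and may differ in detail from what the sample files or Group Awesome's report actually do. The helper `minmax`, the toy data frame, and its column names are all hypothetical.

```r
# Rescale a numeric vector so its minimum maps to 0 and its maximum to 1.
minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data: three vowels from each of two talkers (invented values).
vowels <- data.frame(
  talker = rep(c("A", "B"), each = 3),
  F1     = c(300, 500, 700, 400, 650, 900)
)

# Apply the rescaling separately within each talker.
vowels$F1norm <- ave(vowels$F1, vowels$talker, FUN = minmax)
vowels
```

Normalizing within each talker, rather than over the whole table, is what removes overall differences between talkers such as vocal tract size.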