Data analysis problem on "Vowel inventories and vowel spaces"

(Data analysis assignment 3 for Ling H286, Autumn 2007, Ohio State University)

Copyright © 2007 Mary E. Beckman

0. Due date (and a reminder about collaboration).

Do the data analysis described below and turn in your report at the beginning of class on Wednesday, October 31.

(You may work in groups to analyze the vowel systems and to make the histogram and so on. However, if you do so, you must remember to acknowledge the contributions of others in your report. Also, the writing of your report on the data must be your own individual work.)


1. The primary data

The data that we will be using for this exercise are from Ian Maddieson's UCLA Phonological Segment Inventory Database (UPSID), which lists the vowels and consonants for 451 languages, as determined by looking at published descriptions of the languages. For example, here are the five vowels that UPSID lists for Spanish and Japanese:

Note that UPSID lists Spanish as having the triangular vowel system that Peter Ladefoged shows in Figure 5.4, but the high back vowel of Japanese is listed as an unrounded vowel, so that the data point is displaced a bit toward the left and away from the corner of the "nicely symmetrical triangular vowel space" that Ladefoged says is "the most efficient way" to distribute a set of five vowels to make them as auditorily distinct as any five vowels can be (Ladefoged, 2005, p. 37).

We have made a data frame with this kind of list for each of the 80 UPSID languages that have five vowels, for each of the 25 UPSID languages that have four vowels, and for each of the 23 UPSID languages that have three vowels. These data frames are in a sub-directory of our class web page called dataFiles, which also contains a fourth file that lists all of the UPSID languages and gives the consonant and vowel counts for each one.

Download all four of these files:

  1. vows5v.txt (the file for the 5-vowel languages)
  2. vows4v.txt (the file for the 4-vowel languages)
  3. vows3v.txt (the file for the 3-vowel languages)
  4. UPSIDlgs.txt (the file with the vowel and consonant counts for all 451 languages)

You should also download the file of R code in the dataFiles directory and look at the comments in Part 2b to see how to interpret the symbols for the different vowel types in vows5v.txt, vows4v.txt, and vows3v.txt.

2a. Analyze the vowel count data and make a histogram

Use the data in UPSIDlgs.txt to make a histogram showing the distribution of vowel inventory sizes. Calculate the median size and the mean size of the vowel inventories. (You may find it useful to use the code in this file of R code to see what the R commands are for calculating means and medians, and to see how to make a histogram in R.)
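
As a rough illustration of the three computations in Part 2a, here is a minimal R sketch on a small made-up data frame. The column names (language, nVowels) and the toy values are assumptions for illustration only; check the real headers in UPSIDlgs.txt and the class file of R code before adapting it.

```r
# Hypothetical toy data standing in for UPSIDlgs.txt. The column names
# (language, nVowels) are assumptions, not the real file's headers.
# With the real file you might start from something like:
#   lgs <- read.table("UPSIDlgs.txt", header = TRUE)  # assumes a header row
lgs <- data.frame(
  language = c("A", "B", "C", "D", "E", "F", "G"),
  nVowels  = c(3, 5, 5, 6, 7, 5, 11)
)

mean_v   <- mean(lgs$nVowels)    # arithmetic mean of the inventory sizes
median_v <- median(lgs$nVowels)  # middle value of the sorted sizes

# Put the breaks at half-integers so that each integer inventory size
# gets its own bar.
h <- hist(lgs$nVowels,
          breaks = seq(min(lgs$nVowels) - 0.5, max(lgs$nVowels) + 0.5, by = 1),
          main = "Vowel inventory sizes (toy data)",
          xlab = "Number of vowels",
          ylab = "Number of languages")
```

In this toy example the mean (6) is larger than the median (5) because one language has an unusually large inventory; comparing the two values in the same way for the real data is a quick check on how skewed the distribution is.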

2b. Analyze the vowel spaces of the 3-, 4-, and 5-vowel languages

Then look at the lists of vowels for the 5-vowel languages, 4-vowel languages, and 3-vowel languages in the other three files. Use these files to determine the following numbers.

  1. The number of 3-vowel languages that have a perfectly triangular system with the vowels {a, i, u}.
  2. The number of 4-vowel languages that have these three vowels plus one other vowel.
  3. The number of 5-vowel languages that have a perfectly triangular system with the vowels {a, e, i, o, u}.
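
One way to get counts like these is to compare each language's set of vowels with a target set. The R sketch below does this on made-up data. It assumes one row per vowel per language with columns named language and vowel, and it uses plain letters for the vowels; the real files may be laid out differently and use the symbols explained in the Part 2b comments of the class R code. The helper functions count_exact and count_containing are hypothetical names introduced here.

```r
# Hypothetical toy data: one row per vowel per language. The column names
# (language, vowel) and the plain-letter symbols are assumptions.
toy3 <- data.frame(
  language = rep(c("X", "Y", "Z"), each = 3),
  vowel    = c("i", "a", "u",  "i", "a", "o",  "a", "u", "i"),
  stringsAsFactors = FALSE
)
toy4 <- data.frame(
  language = rep(c("P", "Q"), each = 4),
  vowel    = c("i", "a", "u", "e",  "i", "e", "a", "o"),
  stringsAsFactors = FALSE
)

# Count the languages whose vowel set is exactly the target set.
count_exact <- function(df, target) {
  by_lang <- split(df$vowel, df$language)
  sum(vapply(by_lang, function(v) setequal(v, target), logical(1)))
}

# Count the languages whose vowel set includes every vowel in the target set.
count_containing <- function(df, target) {
  by_lang <- split(df$vowel, df$language)
  sum(vapply(by_lang, function(v) all(target %in% v), logical(1)))
}

n_tri3 <- count_exact(toy3, c("a", "i", "u"))       # languages X and Z match
n_aiu4 <- count_containing(toy4, c("a", "i", "u"))  # only language P matches
```

For the 5-vowel question, the same count_exact helper could be called with the target set c("a", "e", "i", "o", "u").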

3. Writing the report

Embed the histogram that you made in Part 2a into your report and then write a short paragraph answering the following sets of questions about the count data.

Then, referring to the histogram again, write a second short paragraph which addresses the following sets of questions:

Then, using the numbers that you calculated in Part 2b of the data analysis instructions, write a third paragraph that answers the following sets of questions.


4. Acknowledgments

The quotations from Peter Ladefoged are from his book:

Peter Ladefoged (2005). Vowels and Consonants, 2nd edition. Malden, MA: Blackwell.

The data for this analysis problem are from:

UPSID-PC. The UCLA Phonological Segment Inventory Database. Data on the phonological systems of 451 languages, with programs to access it, by Ian Maddieson and Kristin Precoda.

The UPSID-PC program was downloaded from:

http://www.linguistics.ucla.edu/faciliti/sales/software.htm

It is an MS-DOS program for accessing the database of languages' phoneme inventories that appeared in print in:

Ian Maddieson (1984). Patterns of Sounds. Cambridge University Press.

See the references in the book or in the individual language files in UPSID-PC for the sources of information on the specific languages.