####################################################################
# countUPSIDvowels.R
#
# From "Notes on Probability and Statistics for Analyzing the
# Sounds of Languages" -- a companion textbook to Peter Ladefoged's
# "Vowels and Consonants: An Introduction to the Sounds of Languages"
# 2nd edition (Blackwell, 2005).
#
# (c) 2006, Grant McGuire and Mary E. Beckman
# Department of Linguistics, Ohio State University
#
# File of R code for doing some analyses of the UPSID vowel counts
# in conjunction with a reading of Ladefoged's Chapter 4 on "The
# Sounds of Vowels".
#
# References and acknowledgments
#
# The data for this analysis problem are from:
#   UPSID-PC. The UCLA Phonological Segment Inventory Database.
#   Data on the phonological systems of 451 languages, with
#   programs to access it, by Ian Maddieson and Kristin Precoda.
# The UPSID-PC program was downloaded from:
#   http://www.linguistics.ucla.edu/faciliti/sales/software.htm
# It is an MS-DOS program for accessing the database of languages'
# phoneme inventories that appeared in print in:
#   Ian Maddieson (1984). "Patterns of Sounds." Cambridge
#   University Press.
# See the references in the book or in the individual language
# files in UPSID-PC for the sources of information on the specific
# languages.
#
# Part 1 -- In preparation for doing the counting ....
#
# Download the files vows5v.txt, vows4v.txt, and vows3v.txt, and also
# download the file UPSIDlgs.txt, which contains the number of vowels
# and consonants in each language. Then set the working directory to
# the folder where you have stored these data. For example, on my
# machine, this is what I would do:

setwd('c:/Lx286/dataAnalysisReports/reportNo3')

####################################################################
# Part 2a -- calculating the mean and median number of vowels, and
# making the histogram.
#
# First read in the data frame that shows how many vowels and how
# many consonants each UPSID language has.
lgs=read.table("UPSIDlgs.txt",header=TRUE)

# Here is the command for calculating the mean number of vowels in
# the 451 UPSID languages ...
mean(lgs$noVow)

# ... and the median number of vowels.
median(lgs$noVow)

# To make the histogram, first set up a plotting window. (This part
# is platform specific. It is also optional, for esthetics.)
# windows() exists only in R for Windows; on other platforms, use
# quartz() (macOS) or x11() (Linux and other Unix-alikes) instead,
# which take the same height, width, and pointsize arguments.
windows(height=3.5,width=6.5,pointsize=10)
par(mar=c(3,2.8,0.1,0.1),mgp=c(1.8,0.5,0),family="serif")

# Now use the hist() command to draw the histogram. You can set
# a variable called noVow.hist to be the data structure that this
# command returns, like this:
noVow.hist=hist(lgs$noVow,breaks=seq(0.5,45.5,1),main="",col="gray",
   xlab="number of vowels in inventory",ylab="number of languages",
   ylim=c(0,85),xlim=c(1,44))

# The following command is optional, for esthetics. It draws a box
# around your histogram.
box()

# The data structure that hist() returns has elements "mids" (for the
# middle value of each bin) and "counts" (for the number of elements
# in each bin). You can access these elements by name, and then use
# them to write the count above each bar, like this:
text(noVow.hist$mids,noVow.hist$counts+3,noVow.hist$counts)

# Labeling the bars this way lets you see at a glance from the graph
# how many languages have exactly five vowels. Alternatively, you can
# use commands like these to calculate the number of languages that
# have exactly five vowels, or the number that have exactly four.
length(which(lgs$noVow==5))
length(which(lgs$noVow==4))

####################################################################
# Part 2b -- looking at the distribution of vowel chart types for
# languages with five vowels, four vowels, and three vowels.
#
# Start by reading in the data for these subsets of the 451 languages.
v5=read.table("vows5v.txt",header=TRUE)
v4=read.table("vows4v.txt",header=TRUE)
v3=read.table("vows3v.txt",header=TRUE)

# Here's how you could check to see what the first three rows look
# like in the data frame for the five-vowel systems.
v5[1:3,]

# You should see something like this on your R console:
#
#   language v1 v2 v3 v4 v5
# 1   ABIPON  a  e  i i-  o
# 2    AHTNA  a  e  i  o  u
# 3     AINU  a  e  i  o  u
#
# This tells you that the languages Ahtna and Ainu have the five
# vowels that Peter Ladefoged identifies for Spanish, etc., and
# that Abipon has a, e, i, o, and then a high central unrounded
# vowel instead of the high back rounded vowel u that Spanish has.

# Here is how you can see what the rows for Spanish and Japanese
# look like in the v5 data frame:
subset(v5, language=="SPANISH" | language=="JAPANESE")

# You should see something like this on your R console:
#
#    language v1 v2 v3 v4 v5
# 24 JAPANESE  a  e  i  o uu
# 61  SPANISH  a  e  i  o  u
#
# This tells you that Spanish is reported to have the five vowels
# that Peter Ladefoged depicted for Spanish in Figure 5.4 on p. 44,
# but Japanese is reported to have a high back unrounded vowel
# rather than a high back rounded vowel.

# Here is how to pick out all of the languages that are reported to
# have the system that Ahtna, Ainu, and Spanish are reported to have:
subset(v5, v1=="a" & v2=="e" & v3=="i" & v4=="o" & v5=="u")

# And here is how to embed that command in a command that counts the
# rows and columns of the result:
dim(subset(v5, v1=="a" & v2=="e" & v3=="i" & v4=="o" & v5=="u"))

# You should see something like this on your R console:
#
# [1] 63  6
#
# This tells you that the subset has 63 rows (one per language) and
# 6 columns, so of the 80 languages with five vowels, 63 are reported
# to have the five vowels that Peter Ladefoged depicted for Spanish
# in Figure 5.4 on p. 44.

# Here's how you could check to see what the first three rows look
# like in the data frame for the three-vowel systems.
v3[1:3,]

# You should see something like this on your R console:
#
#   language v1 v2 v3
# 1  ALABAMA  a  e  o
# 2    ALEUT  a  i  u
# 3  AMUESHA  a  e  o

# How would you figure out how many of the three-vowel systems are the
# triangular set { i, a, u } that Ladefoged says is the most efficient?

# Here's how you could check to see what all of the rows look like
# in the data frame for the four-vowel systems.
v4

# You should see something like this on your R console (with other
# similar rows in between the 3rd and the 23rd):
#
#         language v1 v2 v3 v4
# 1          ALAWA  a  e  i uu
# 2     BANDJALANG  a  e  i  u
# 3          CAMPA  a  e  i  o
# ...
# 23 UPPERCHEHALIS  a ax  e  o
# 24   YESSAN-MAYO  a  A ax i-
# 25         YUPIK  a ax  i  u
#
# In these rows, ax stands for the mid central unrounded vowel that
# is the more common variant of the vowel that Peter Ladefoged calls
# "the most common vowel in American (and British) English" (2005,
# p. 29); A stands for the low front unrounded vowel in the word ;
# uu is the same high back unrounded vowel that we saw in Japanese;
# and i- is the same high central unrounded vowel that Abipon has.
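
# A possible extension (a sketch, not part of the original exercises):
# instead of counting one vowel system at a time with dim(subset(...)),
# you could tabulate every distinct system in a data frame laid out
# like v5, v4, or v3. The function countSystems() below is a
# hypothetical helper written for illustration. It assumes that the
# vowel columns are named v1, v2, ..., and it sorts each row's vowels
# before pasting them together, so that the same inventory always gets
# the same label whatever order its vowels happen to be listed in.
countSystems=function(vdf) {
   # find the vowel columns (v1, v2, ...), skipping "language"
   vcols=grep("^v[0-9]+$",names(vdf))
   # turn each row into a single label, such as "a e i o u"
   systems=apply(as.matrix(vdf[,vcols,drop=FALSE]),1,
      function(x) paste(sort(as.character(x)),collapse=" "))
   # count the languages with each label, most frequent first
   sort(table(systems),decreasing=TRUE)
}

# To try it out without the data files, here is a toy data frame
# built from just the five rows of v5 shown earlier (ABIPON, AHTNA,
# AINU, JAPANESE, SPANISH); it is NOT the full UPSID subset. In this
# toy example, "a e i o u" should come out on top with a count of 3.
v5demo=data.frame(
   language=c("ABIPON","AHTNA","AINU","JAPANESE","SPANISH"),
   v1=c("a","a","a","a","a"), v2=c("e","e","e","e","e"),
   v3=c("i","i","i","i","i"), v4=c("i-","o","o","o","o"),
   v5=c("o","u","u","uu","u"), stringsAsFactors=FALSE)
countSystems(v5demo)

# With the real data you would call countSystems(v5), countSystems(v4),
# or countSystems(v3); wrapping the result in prop.table() would show
# each system's share of the languages in that subset.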