# wordLengths.R
#
# (c) 2007 by Mary E. Beckman for Linguistics H286
#
# Script for making sample histograms of word lengths for the data analysis
# report on "Counting words of different lengths."

# Any line that has a hashmark (#) at the beginning is a comment.
# Most of these are explanations of the R code in the neighboring
# lines. A good strategy is to read the comments around the R
# code and then experiment with running different parts of the
# code, to see what each part does. For example ....

# Start by setting the working directory to the folder where you have
# stored the file wordsInClassAgroup.txt (if you are in the A group)
# or wordsInClassStupendous.txt (if you are in group Stupendous), and
# so on. These files are on our class web page in the directory
# http://ling.osu.edu/~mbeckman/Lx286/scripts
# On Mary's laptop, this folder is called Lx286 and it is a subdirectory
# of her C: drive, so she would use the following command:
setwd('C:/Lx286')
# The syntax of this line is the simplest possible. It uses the
# command setwd() with the argument 'C:/Lx286', which names the "path"
# to the folder.

###
# Next, read the file for your group and assign its contents to a
# variable called "words" using this command. (If you are not in group
# Stupendous, replace the file name with the name of your group's file.)
words=read.table("wordsInClassStupendous.txt", header=TRUE, sep="\t")
# This command has two parts. The part to the right of the
# assignment operator (=) uses the command read.table() with the
# argument "wordsInClassStupendous.txt", which is the name of the file
# to be read in as a table. (The other two arguments tell R that
# there is a "header" row and that the columns of data are separated
# by tabs rather than by spaces.) The part to the left of the assignment
# operator specifies the name of the variable, which then stands in
# for the table that you read in. Try running just the right-hand part:
read.table("wordsInClassStupendous.txt", header=TRUE, sep="\t")
# to compare the effects of running the command without assigning
# the results to anything.
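
# A minimal sketch for experimenting with read.table() before you have
# your group's file: the text= argument reads the table from character
# strings instead of from a file, so setwd() is not needed. The words
# below are HYPOTHETICAL examples, not the class data, and the variable
# name sampleWords is likewise just illustrative. Each string is one
# line of the table: a header row, then one word per row, with a tab
# ("\t") between the word and its number of syllables, matching the
# header=TRUE, sep="\t" layout used in the read.table() command above.
sampleWords=read.table(text=c("word\tnoSyls",
                              "cat\t1",
                              "table\t2",
                              "banana\t3",
                              "television\t4"),
                       header=TRUE, sep="\t")
sampleWords
# Because header=TRUE, the first string supplies the column names
# instead of being read as a row of data, so sampleWords has 4 rows
# and 2 columns.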
# Now try typing just:
words
# to see the contents of the variable words that you created with
# read.table().

###
# Use the following command to see how big the table is:
dim(words)
# You should get something that looks like this on your R console:
#   [1] 29  2
# This tells you that the table words has 29 rows and 2 columns.
# Use the following command to see what the names of the columns are:
names(words)
# You should get something that looks like this on your R console:
#   [1] "word"   "noSyls"
# This tells you that the 2 columns are called "word" (for the word's
# spelled form) and "noSyls" (for the number of syllables that group
# Stupendous assigned to each word).
# You can now refer to a column either by its number, like this:
words[,1]
# or by its name, like this:
words[,"word"]
# And you can refer to rows similarly, using the number(s) of the
# row(s), like this (for the first 3 rows):
words[1:3,]
# You can use these commands to see how many syllables there are in
# the shortest and longest words:
min(words[,"noSyls"])
max(words[,"noSyls"])
# You should see that the shortest words have 1 syllable and the
# longest have 4.
# You can now count the number of words that have each number of
# syllables, like this:
summary(as.factor(words[,"noSyls"]))
# You should see something like this, which says that there are
# 6 words with length 1 syllable, 8 with length 2 syllables, and
# so on:
#    1  2  3  4
#    6  8 11  4
# Together these counts add up to 29, the number of rows that dim()
# reported, so every word in the table has been counted.
# You can assign this summary to a variable called sylCount, like
# this:
sylCount=summary(as.factor(words[,"noSyls"]))
# Then you can make your histogram (a bar plot of these counts), like
# this:
barplot(sylCount)
# This draws a pretty box around the graph:
box()
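
# A minimal sketch of an alternative way to do the counting, using
# table() instead of summary(as.factor()). The vector sampleSyls below
# is a HYPOTHETICAL set of syllable counts, not the class data, and the
# names sampleSyls and sampleCount are just illustrative. The sketch
# checks two things: that both approaches give the same count for each
# length, and that the counts add up to the number of words, so that
# no word was left out of the histogram.
sampleSyls=c(1, 2, 2, 3, 3, 3, 4)
sampleCount=table(sampleSyls)
sampleCount
# TRUE if table() and summary(as.factor()) agree for every length:
all(sampleCount == summary(as.factor(sampleSyls)))
# TRUE if every word was counted exactly once:
sum(sampleCount) == length(sampleSyls)
# With your own data, the same check would be
#   sum(sylCount) == nrow(words)
# and the result of table() can be passed straight to barplot().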