# sortMajors.R
#
# (c) 2007 by Mary E. Beckman for Linguistics H286
#
# Script using the file majors.txt that shows how to do various
# things that would be useful for gathering the primary data and
# making the histograms for Data Analysis Problem 1 on "Counting
# words of different lengths".

# Any line that has a hashmark (#) at the beginning is a comment.
# Most of these are explanations of the R code in the neighboring
# lines. A good strategy is to read the comments around the R
# code and then experiment with running different parts of the
# code, to see what each part does. For example ....

###
# Start by setting the working directory to the folder where you
# have stored the file majors.txt that we made in class on Wednesday,
# September 26. On Mary's laptop this folder is on the C: drive in
# a folder called Lx286, so she would use the following command:
setwd('c:/Lx286')
# The syntax of this line is the simplest possible. It calls the
# command setwd() with the argument 'c:/Lx286', which names the
# "path" to the folder.

###
# Next, read the file majors.txt and assign its contents to a
# variable called "majors" using this command:
majors=read.table("majors.txt", stringsAsFactors=TRUE)
# This command has two parts. The part to the right of the
# assignment operator (=) uses the command read.table() with the
# argument "majors.txt", which is the name of the file to be read
# in as a table. The second argument, stringsAsFactors=TRUE, tells R
# to store the majors as a factor (a variable with "levels"). The
# levels() and summary() commands further down rely on this. R did
# this by default when this script was written, but since R 4.0.0 it
# has to be requested explicitly. The part to the left of the
# assignment operator specifies the name of the variable, which then
# stands in for the table that you read in. Try running just the
# read.table() part on its own:
read.table("majors.txt")
# This lets you compare the effect of running the command without
# assigning the results to anything. Also, now try typing just:
majors
# to see what the first command stored in the variable majors.

###
# Use the following command to see how big the table is.
dim(majors)
# You should get something that looks like this on your R console:
# [1] 27  1
# This tells you that the table majors has 27 rows and 1 column.
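
# If majors.txt is not at hand, you can still experiment with these
# commands. read.table() can also read from a character string given
# as its text= argument. The five majors below are made-up sample
# data (not the class data), and sampleMajors is just an illustrative
# variable name. stringsAsFactors=TRUE is spelled out because R 4.0.0
# and later no longer turn strings into factors by default.
sampleMajors=read.table(text="linguistics
Chinese
linguistics
anthropology
Chinese", stringsAsFactors=TRUE)
dim(sampleMajors)   # number of rows and columns (here 5 and 1)
nrow(sampleMajors)  # just the number of rows
ncol(sampleMajors)  # just the number of columns
str(sampleMajors)   # shows that the one column (named V1) is a factor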
# You can assign a name to the column using the names() command:
names(majors)="major"
# You can now refer to the column either by its number, like this:
majors[,1]
# or by its name, like this:
majors[,"major"]
# And you can refer to rows similarly, using the number(s) of the
# row(s), like this (for the first 3 rows):
majors[1:3,]
# You should get something that looks like this on your R console:
# [1] performance               vocalmusiceducation
# [3] instrumentalmusiceduction
# 14 Levels: anthropology Arabic Chinese German ... vocalmusiceducation
# Because majors has only one column, R returns the selected rows
# as a plain vector of values rather than as a smaller table.

###
# This next command sorts the column.
sort(majors[,1])
# You can also overwrite the original column by assigning the
# sorted values to it, like this:
majors[,1]=sort(majors[,1])
# This makes it easier to count the tokens of each type.
# You can figure out what the different types are by asking
# what "levels" the variable has, like this:
levels(majors[,"major"])
# This next command counts the tokens of the different types:
summary(majors[,"major"])
# You can assign the result of that command to a new variable
# called majorTypes like this:
majorTypes=summary(majors[,"major"])
# or, if you prefer to have the types ordered by their counts
# (smallest count first), like this:
majorTypes=sort(summary(majors[,"major"]))
# And then you can make a bar plot, like this:
barplot(majorTypes)
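
###
# A possible alternative to summary() for counting tokens is table().
# It gives the count of each type whether the column is a factor or
# a plain character vector, so it also works if the file was read in
# without stringsAsFactors=TRUE. The vector sampleTypes below is
# made-up data standing in for majors[,"major"]. With the class data
# the first step would be typeCounts=table(majors[,"major"]).
sampleTypes=c("linguistics", "Chinese", "linguistics",
              "anthropology", "Chinese", "linguistics")
typeCounts=table(sampleTypes)
typeCounts                                    # one count per type
typeCounts=sort(typeCounts, decreasing=TRUE)  # most frequent type first
# las=2 draws the bar labels perpendicular to the axis, which keeps
# long names such as vocalmusiceducation from overlapping.
barplot(typeCounts, las=2)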