Exercise 6

First part:

Write the word-counting program using parallel lists. Your program should read in all of alice.txt and compute the count for each word.

Print the frequency of the following words: “Alice”, “cassowary”, “roses”, “rabbit”, “mirror”.

If you are not doing anything fancy with punctuation and case (in other words, you are just splitting each line of the text on whitespace), you should get the following answers:

Frequency of Alice : 221

Frequency of cassowary : 0

Frequency of roses : 1

Frequency of rabbit : 1

Frequency of mirror : 0
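
As a starting point, here is a minimal sketch of the parallel-lists idea, run on a toy list of lines rather than on alice.txt. The names count_words, frequency, and sample are made up for this illustration and are not from the notes; treat it as one possible shape for the program, not the required one.

```python
def count_words(lines):
    """Count whitespace-separated words using two parallel lists."""
    words = []   # words[i] is the i-th distinct word seen
    counts = []  # counts[i] is how many times words[i] has occurred
    for line in lines:
        for word in line.split():  # split() with no argument splits on whitespace
            if word in words:
                counts[words.index(word)] += 1
            else:
                words.append(word)
                counts.append(1)
    return words, counts


def frequency(word, words, counts):
    """Look up a word's count; a word never seen has frequency 0."""
    if word in words:
        return counts[words.index(word)]
    return 0


# Toy input standing in for the lines of a file (an assumption for this sketch).
sample = ["the emu saw the rhea", "the rhea ran"]
words, counts = count_words(sample)
for w in ["the", "rhea", "kiwi"]:
    print("Frequency of", w, ":", frequency(w, words, counts))
```

The two lists stay the same length at all times, so the same index always refers to a word and its count. For the real exercise, the lines would come from reading alice.txt instead of from sample.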

Second part:

Read the "efficiency interlude" in the notes.

Third part:

For additional practice, try some of the following:

  1. Write the "max" function (this is a Python builtin... but write it anyway). max([1, 3, 4, 0, 2]) should be 4.
  2. Write a function that finds the index of a sublist in a longer list. For instance, findSublist([1, 2, 3], [-1, 0, 1, 2, 3, 4, 5]) should return 2.
  3. Write a function that finds the longest contiguous sublist shared by two lists. For instance, longestSublist(["cassowary", "emu", "rhea"], ["ostrich", "cassowary", "emu", "kiwi"]) should return ["cassowary", "emu"]. This function is used in many computational morphology applications, since it's a good starting point for finding shared roots or affixes.
  4. Find all the words in alice.txt that appear both capitalized and uncapitalized. For which of them is the ratio of the two counts greatest?
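
If you get stuck on problem 2, the following is one possible sketch to compare against after you have tried it yourself. It checks each starting position in the longer list with a slice comparison. Returning -1 when the sublist is absent is an assumption made here; the problem statement does not say what should happen in that case.

```python
def findSublist(sub, lst):
    """Return the index where sub first occurs in lst, or -1 if it is absent."""
    # Only start positions that leave room for all of sub are worth checking.
    for start in range(len(lst) - len(sub) + 1):
        if lst[start:start + len(sub)] == sub:
            return start
    return -1  # assumption: the exercise does not specify the "not found" result


print(findSublist([1, 2, 3], [-1, 0, 1, 2, 3, 4, 5]))  # prints 2
```

The same scan-every-start-position idea is a reasonable first step toward problem 3 as well.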