Marie-Catherine de Marneffe & Micha Elsner

LING 5050 - Technical tools for linguists

May Session 2016

Homework 1

DUE: Friday May 20, 2016 (no late homework accepted!)

We want to address the question of whether women talk more than men. To answer this, we will use the Fisher corpus on Carmen.

Write a Python script that outputs:

  • the raw total number of words spoken by women
  • the raw total number of words spoken by men
  • the total number of utterances spoken by women
  • the total number of utterances spoken by men
  • the average number of words per utterance spoken by women and by men
  • the number of female speakers
  • the number of male speakers
  1. The script should take into account all the directories and files in the Fisher corpus on Carmen.
  2. The user should be able to specify the Fisher directory path at the command line (e.g., python3 processFisher.py Fisher)
  3. Given the results you obtain, write a short paragraph that answers our question "Do women talk more than men?". Which numbers from your analysis are evidence that your answer is correct? Show in your answer that you understand how words have been defined in your script. This paragraph can go at the end of your script, as a comment.
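A minimal sketch of how such a script could be organized follows. The format of the Fisher files on Carmen is not described here, so the sketch assumes a hypothetical layout: transcripts are files ending in .txt, and each utterance is one line of the form GENDER<TAB>SPEAKER_ID<TAB>text, with F or M as the gender code. The names parse_line and TRANSCRIPT_SUFFIX are illustrative only; parse_line in particular would have to be rewritten to match the real files. In this sketch a "word" is any whitespace-separated token (str.split()), so "don't" counts as one word, and any bracketed noise or laughter tags in the transcripts would also be counted unless they are filtered out.

```python
import os
import sys
from collections import Counter

# ASSUMPTION: transcript files end in ".txt"; other files (e.g. audio) are skipped.
TRANSCRIPT_SUFFIX = ".txt"


def parse_line(line):
    """Split one transcript line into (gender, speaker_id, text).

    HYPOTHETICAL FORMAT: assumes lines like "F<TAB>spk42<TAB>so how are you".
    Adapt this function to the real Fisher files.
    Returns None for blank lines, '#' comment lines, or malformed lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split("\t", 2)
    if len(parts) != 3:
        return None
    gender, speaker, text = parts
    return gender.upper(), speaker, text


def process_corpus(root):
    """Walk every directory under root and tally words, utterances, speakers."""
    words = Counter()        # gender -> number of words
    utterances = Counter()   # gender -> number of utterances
    speakers = {"F": set(), "M": set()}  # gender -> distinct speaker IDs
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(TRANSCRIPT_SUFFIX):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    parsed = parse_line(line)
                    if parsed is None or parsed[0] not in speakers:
                        continue
                    gender, speaker, text = parsed
                    # A "word" is any whitespace-separated token.
                    words[gender] += len(text.split())
                    utterances[gender] += 1
                    speakers[gender].add(speaker)
    return words, utterances, speakers


def main(args):
    """Expect exactly one argument: the Fisher directory path."""
    if len(args) != 1:
        print("usage: python3 processFisher.py FISHER_DIR", file=sys.stderr)
        return 2
    words, utterances, speakers = process_corpus(args[0])
    for gender, group, adjective in (("F", "women", "female"), ("M", "men", "male")):
        n_utt = utterances[gender]
        avg = words[gender] / n_utt if n_utt else 0.0
        print(f"words spoken by {group}: {words[gender]}")
        print(f"utterances spoken by {group}: {n_utt}")
        print(f"average words per utterance ({group}): {avg:.2f}")
        print(f"number of {adjective} speakers: {len(speakers[gender])}")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as python3 processFisher.py Fisher. Keeping the counting in process_corpus, separate from argument handling and printing, makes it easy to check the counts on a small hand-made directory before running on the full corpus. Note that the speaker totals are only correct if speaker IDs are unique across the whole corpus.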

You will submit your code on Carmen. Make sure your code runs. Make sure to comment your code appropriately! Find the right balance in your comments: too few and too many are both unhelpful.