Data analysis problem on "Pitch patterns and time plots"

(Data analysis assignment 2 for Ling H286, Autumn 2007, Ohio State University)

Copyright © 2007 Mary E. Beckman

0. Due date (and a reminder about collaboration).

Do the data analysis described below and turn in your report at the beginning of class on Wednesday, October 10.

(You may work in groups to make the measurements and to figure out how to make the time plots. However, if you do so, you must acknowledge the contributions of others in your report. Also, the writing of your report on the data must be your own individual work.)

1. The primary data

In the waveFiles directory you will find a collection of Praat objects called WordsSentencesH286report2.Collection, which you should download and read into Praat. Also download the table file that contains the sample measurements for the Sound Words1ChineseBoy and its associated TextGrid object.

The collection contains eight Sound objects and associated TextGrid objects. Five of these Sound objects are recordings of a set of four words pronounced by five different speakers of Standard (Mandarin) Chinese: a boy, two men, and two women. (There is also a Manipulation object and a resynthesized Sound file for the word set produced by the boy.) Another two Sound objects are recordings of two more sets of four words each, pronounced by the second of the two Chinese women. The last Sound object is a recording of a speaker of English pronouncing four different one-word sentences of English. The first is a statement (a possible answer to the question "Who made these sandwiches for us?"). The second is a yes-no question (perhaps asking if your mother was the one who made the sandwiches). The third is a statement answering the question "What color is this?", and the fourth is a question asking whether something is colored orange.

Each of the TextGrid objects has three tiers. The first tier marks off an interval for each word (or sentence), and the second tier marks off an interval for each vowel or consonant. The third tier either has a set of 12 measurement points (for the Words1ChineseBoy Sound object) or is empty, waiting for you to insert three measurement points for each of the words or sentences.

2. Analyze the data and make time plots

Highlight the Sound Words1ChineseBoy together with the associated TextGrid object, and open these two objects together in an Edit window. Make sure that the Pitch track is visible in the Edit window. Also highlight the Manipulation object Words1ChineseBoy and open it in another Edit window. Make sure that the "Group" box in the lower right corner of each of the Edit windows is checked, so that the times in the windows are linked as you move around in any of them. Highlight and zoom in on each of the four words in turn, inspecting the Pitch points in the Manipulation object (which should correspond to the points in the "measure" tier in the TextGrid object). Listen to the original Sound and also to the sound of the word played from within the Edit window for the Manipulation object. Do you agree that the three measurement points marked for each word in the measure tier are the right places to sample the Pitch track in order to recreate the same pitch pattern as in the original denser sampling of points? If not, edit the Manipulation object and TextGrid, moving one or more of the Pitch points in the Manipulation object and the associated measurement point(s) in the TextGrid object. Then measure the Pitch value at each of the three points in each of the four words, and if you have changed any point, edit the report2measurements.txt table to reflect your change(s).

After you have understood the relationship among these objects, analyze the six other Sound objects for the Chinese speakers' productions and extend the report2measurements.txt table by adding 24 more rows, with points for each word in these files. To get those rows, do the following.

For each of the other recordings of the Chinese speakers, select the Sound object and make a Manipulation object. Open the Manipulation object in an Edit window and also open the Sound together with the associated TextGrid, as you did for the Words1ChineseBoy Sound and associated objects. Use the "Stylize pitch..." command in the Pitch pulldown menu in the Manipulation object Edit window to reduce the number of time points sampled, and then adjust by inserting or deleting points to get exactly 3 points per word, following the example of the distribution of points in the third tier of the Words1ChineseBoy TextGrid object. (This is the tier called "measure" because the points marked here are where you will make your measurements.) Click on each time point in turn, and record the pitch that Praat returns at that time point. If there is no pitch track visible at a point to return a Pitch value, look at the waveform and measure the duration of the nearest "repeating section" in the file. This duration is the length of the "period", and its inverse gives an estimate of the pitch value there. Then recalculate the Manipulation object using a floor that is lower than your estimate, to see if that helps. (You may find it helpful to play with the pitch settings in any case, remembering that the men's voices are lower in pitch than the women's and the boy's pitch is highest. Try setting the Pitch range for the men between 50 and 250 Hz, the Pitch range for the women between 100 and 350 Hz, and the Pitch range for the boy between 150 and 450 Hz.)
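The period-to-pitch arithmetic described above can be sketched as follows. This is only an illustration in Python (it is not one of the course's R scripts, and it is not something Praat provides). The function names and the example times are hypothetical, while the pitch ranges are the ones suggested in the paragraph above.

```python
# Pitch ranges suggested in the assignment text, as (floor, ceiling) in Hz.
SUGGESTED_RANGE_HZ = {"man": (50, 250), "woman": (100, 350), "boy": (150, 450)}


def pitch_from_period(start_s, end_s):
    """Estimate pitch in Hz from the start and end times (in seconds)
    of one repeating section of the waveform."""
    period_s = end_s - start_s
    if period_s <= 0:
        raise ValueError("end time must be later than start time")
    return 1.0 / period_s  # pitch is the inverse of the period


def needs_lower_floor(estimate_hz, speaker):
    """True if the estimate lies below the suggested floor for this kind
    of speaker, i.e. the pitch floor should be lowered before recalculating."""
    floor_hz, _ceiling_hz = SUGGESTED_RANGE_HZ[speaker]
    return estimate_hz < floor_hz


# Hypothetical example: one repeating section measured from 0.500 s to 0.525 s.
# A 25 ms period corresponds to about 40 Hz.
estimate = pitch_from_period(0.500, 0.525)
print(round(estimate, 1), needs_lower_floor(estimate, "man"))
```

In this made-up case the estimate lies below the 50 Hz floor suggested for the men, which is the situation in which lowering the floor and recalculating the Manipulation object is worth trying.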

When you have finished making the complete table, make a time plot for each word, where you plot the three pitch values against the order of the observations -- i.e. the first pitch value at point 1 along the x-axis in the plot, the second pitch value at point 2, and the third at point 3. Arrange these time plots into a table with seven rows (for the 7 different recordings) and four columns (for the sequence of words in each recording). There is sample R code for making the first time plot for the first row of this table in the timePlots.R script in our scripts directory. The result should look like the following jpg file.
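As a rough illustration of what "plotting against the order of the observations" means, the sketch below groups measurements into one series per word, using the point number (1, 2, 3) as the x value instead of the clock time. It is written in Python and is not the timePlots.R script. The row layout (recording, word number, point number, pitch in Hz) is an assumption about how the measurement table might be organized, and the pitch values are placeholders invented for the example, not real measurements.

```python
from collections import defaultdict


def time_plot_series(rows):
    """Group (recording, word, point, pitch_hz) rows into one series per word.

    Returns {recording: {word: [(x, pitch_hz), ...]}}, where x is the order of
    the observation (1, 2, 3), not the time at which it was measured.
    """
    grid = defaultdict(lambda: defaultdict(list))
    for recording, word, point, pitch_hz in rows:
        grid[recording][word].append((point, pitch_hz))
    # Sort each series by observation order so the points connect left to right.
    return {
        recording: {word: sorted(points) for word, points in words.items()}
        for recording, words in grid.items()
    }


# Placeholder values for illustration only (hypothetical, not measured data).
rows = [
    ("Words1ChineseBoy", 1, 2, 310.0),
    ("Words1ChineseBoy", 1, 1, 300.0),
    ("Words1ChineseBoy", 1, 3, 305.0),
    ("Words1ChineseBoy", 2, 1, 250.0),
    ("Words1ChineseBoy", 2, 2, 280.0),
    ("Words1ChineseBoy", 2, 3, 340.0),
]
series = time_plot_series(rows)
for x, pitch_hz in series["Words1ChineseBoy"][1]:
    print(x, pitch_hz)
```

Each recording in the result corresponds to one row of the table of figures, and each word within it to one column.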

If you're ambitious about learning R, you can even build an entire row at once, resulting in something that looks like this.

Once you have finished this table, try to do the same thing as you did for the Chinese words, but this time for the four one-word sentences of English. That is, try to retain exactly three points per sentence and choose points that make each sentence in the Manipulation object sound most like the original Sound object. Make another row of figures for these four one-word sentences.

3. Writing the report

Embed the figures you made in Part 2 into your report so that they make a table where the rows are the different Sound objects and the columns are the sequence of words or sentences. That is, the first row will be a set of four time plots like the second jpg picture above for the Chinese boy, the second row will be the sequence of time plots for the first man, and so on. Write three short paragraphs answering the following sets of questions about the time plots for the Chinese speakers in the first seven rows of the table.

Now look at the time plots for the English sentences in the last row of the table, and write a fourth paragraph answering the following questions.

4. Acknowledgments

The Sound files are taken from various sources, including recordings made by Fangfang Li, Ohio State University, and Peter Ladefoged's book Vowels and Consonants. The sound files for the second woman's voice are from the wonderful free CEDICT dictionary originally started by Paul Denisowski, downloaded via the convenient search interface at the On-line Chinese Tools web page at http://www.mandarintools.com/