The Nationwide Speech Project Corpus

Purpose
The Nationwide Speech Project (NSP) corpus is a corpus of spoken language containing recordings of young male and female talkers from six regions of the United States. Speech samples include isolated words, sentences, passages, and interview speech. The purpose of the Nationwide Speech Project was to develop a corpus of spoken language that can be used in acoustic and perceptual studies of regional dialect variation in the United States (Clopper & Pisoni, 2006).

If you are interested in obtaining speech samples from the NSP corpus for use in acoustic, perceptual, or pedagogical projects, please contact Cynthia Clopper. Some of the materials are also available through the Linguistic Data Consortium. Please note that not all of the materials are currently available for distribution. Decisions regarding distribution will be made on a case-by-case basis.


Features of the NSP Corpus


Talkers
The talkers included in the NSP corpus were five male and five female lifetime residents from each of six dialect regions in the United States: New England, Mid-Atlantic, North, Midland, South, and West (see Figure 1). These regions are based on the dialect regions described in Labov, Ash, and Boberg's (2006) Atlas of North American English.


Figure 1. Map of the hometowns of the NSP talkers. Dark dots indicate male talkers. Light dots indicate female talkers.

Apart from gender and regional dialect, the talkers in the NSP corpus were fairly homogeneous. Table 1 provides demographic information about the 60 talkers.

Age: 18-25 years old
Native Language: English
Mother's Native Language: English
Father's Native Language: English
History of Hearing or Speech Disorder: None
Race/Ethnicity: White
Table 1. Demographic characteristics of the NSP talkers.
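
The balanced talker design described in this section (six regions, two genders, five talkers per region-by-gender cell) can be sketched as a small enumeration. This is only an illustration: the field names and the per-cell index are hypothetical and do not reflect the corpus's actual talker identifiers or file naming.

```python
from itertools import product

# Dialect regions and talker counts as described in the Talkers section.
REGIONS = ["New England", "Mid-Atlantic", "North", "Midland", "South", "West"]
GENDERS = ["female", "male"]
TALKERS_PER_CELL = 5  # five talkers per region-by-gender cell


def build_talker_grid():
    """Enumerate one record per talker in the balanced design.

    The 'index' field is a hypothetical within-cell counter, not an NSP talker ID.
    """
    return [
        {"region": region, "gender": gender, "index": index}
        for region, gender, index in product(
            REGIONS, GENDERS, range(1, TALKERS_PER_CELL + 1)
        )
    ]


def count_by(talkers, key):
    """Count talkers by the value of one field (e.g. 'region' or 'gender')."""
    counts = {}
    for talker in talkers:
        counts[talker[key]] = counts.get(talker[key], 0) + 1
    return counts
```

Calling build_talker_grid() yields 60 records, with 10 talkers per region and 30 per gender, which matches the totals given in the text.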


Materials
A range of speech materials was obtained from each talker, including isolated words, sentences, passages, and interview speech. Examples of the materials collected for the NSP corpus are shown in Table 2.

Materials Set: Examples
hVd Words (N=10): heed, hid, head
CVC Words (N=76): mice, dome, bait
Multisyllabic Words (N=112; from Carter & Clopper, 2002): alfalfa, nectarine
High Predictability Sentences (N=102; from Kalikow, Stevens, & Elliott, 1977): Ruth had a necklace of glass beads. The swimmer dove into the pool.
Low Predictability Sentences (N=52; from Kalikow, Stevens, & Elliott, 1977): Tom has been discussing the beads. She might consider the pool.
Anomalous Sentences (N=52; see Clopper et al., 2002): Bill knew a can of maple beads. The jar swept up the pool.
Passages (N=2): Rainbow Passage (Fairbanks, 1940); Goldilocks Passage (Stockwell, 2002)
Interview Speech (5 minutes): hometown, hobbies, travel experiences
Targeted Interview Speech (N=10 target words): sleep, shoes, math
Table 2. Examples of the speech materials collected from each talker in the NSP.


Recording Conditions
All of the recordings were made in a sound-attenuated booth. Each utterance was recorded to its own .aiff sound file on a Macintosh laptop, using custom in-house software, at a sampling rate of 44.1 kHz with 16-bit encoding.
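
For readers who want to confirm that a sound file matches the recording format stated above, the following is a minimal sketch that reads the COMM chunk of an AIFF file and checks the sampling rate and bit depth. It assumes the standard AIFF chunk layout; the function names are hypothetical and are not part of any NSP tooling.

```python
import struct


def decode_extended(data: bytes) -> float:
    """Decode a 10-byte big-endian 80-bit extended float (AIFF sample rate)."""
    sign_exp, mantissa = struct.unpack(">HQ", data)
    sign = -1.0 if sign_exp & 0x8000 else 1.0
    exponent = sign_exp & 0x7FFF
    if exponent == 0 and mantissa == 0:
        return 0.0
    # The 64-bit mantissa carries an explicit integer bit, hence the extra 63.
    return sign * mantissa * 2.0 ** (exponent - 16383 - 63)


def read_aiff_format(path):
    """Return channels, frames, bits, and rate from an AIFF file's COMM chunk."""
    with open(path, "rb") as f:
        form_id, _size, form_type = struct.unpack(">4sI4s", f.read(12))
        if form_id != b"FORM" or form_type not in (b"AIFF", b"AIFC"):
            raise ValueError("not an AIFF file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no COMM chunk found")
            chunk_id, chunk_size = struct.unpack(">4sI", header)
            if chunk_id == b"COMM":
                channels, frames, bits = struct.unpack(">hIh", f.read(8))
                rate = decode_extended(f.read(10))
                return {"channels": channels, "frames": frames,
                        "bits": bits, "rate": rate}
            # Skip this chunk; chunk data is padded to an even byte count.
            f.seek(chunk_size + (chunk_size & 1), 1)


def matches_nsp_format(fmt) -> bool:
    """Check for the 44.1 kHz, 16-bit format described in the text."""
    return fmt["rate"] == 44100.0 and fmt["bits"] == 16
```

Given a path to a sound file (for example, a hypothetical "talker01_heed.aiff"), matches_nsp_format(read_aiff_format(path)) returns True only when the header reports 44.1 kHz and 16 bits. The sketch inspects the header only and does not validate the audio data itself.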


Acoustic Vowel Data
The acoustic vowel data summarized by Clopper, Pisoni, and de Jong (2005) is available here.


References

Carter, A. K., & Clopper, C. G. (2002). Prosodic effects on word reduction. Language and Speech, 45, 321-353.

Clopper, C. G., Carter, A. K., Dillon, C. M., Hernandez, L. R., Pisoni, D. B., Clarke, C. M., Harnsberger, J. D., & Herman, R. (2002). The Indiana Speech Project: An overview of the development of a multi-talker multi-dialect speech corpus. Research on Spoken Language Processing Progress Report No. 25 (pp. 367-380). Bloomington, IN: Speech Research Laboratory, Indiana University.

Clopper, C. G., & Pisoni, D. B. (2006). The Nationwide Speech Project: A new corpus of American English dialects. Speech Communication, 48, 633-644.

Clopper, C. G., Pisoni, D. B., & de Jong, K. (2005). Acoustic characteristics of the vowel systems of six regional varieties of American English. Journal of the Acoustical Society of America, 118, 1661-1676.

Fairbanks, G. (1940). Voice and Articulation Drillbook. New York: Harper.

Kalikow, D. N., Stevens, K. N., & Elliott, L. L. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. Journal of the Acoustical Society of America, 61, 1337-1351.

Labov, W., Ash, S., & Boberg, C. (2006). Atlas of North American English. Berlin: Mouton de Gruyter.

Stockwell, P. (2002). Sociolinguistics: A Resource Book for Students. London: Routledge.


Links to Related Pages
Lay Language Paper on the NSP for the Acoustical Society of America
The Do You Speak American? Project at PBS
Linguistic Atlas Projects in the United States
Speech Accent Archive at George Mason University
International Dialects of English Archive at the University of Kansas


Contact Information
For more information about the NSP or to obtain materials from the NSP corpus, please contact Cynthia Clopper (clopper.1 AT osu.edu).