The Nationwide Speech Project

The Nationwide Speech Project Corpus

Purpose
The Nationwide Speech Project (NSP) corpus is a corpus of spoken language containing recordings of young male and female talkers from six regions of the United States. Speech samples include isolated words, sentences, passages, and interview speech. The purpose of the Nationwide Speech Project was to develop a corpus of spoken language that can be used in acoustic and perceptual studies of regional dialect variation in the United States (Clopper & Pisoni, 2006).

If you are interested in obtaining speech samples from the NSP corpus for use in acoustic, perceptual, or pedagogical projects, please contact Cynthia Clopper. Some of the materials are also available through the Linguistic Data Consortium. Please note that not all of the materials are currently available for distribution. Decisions regarding distribution will be made on a case-by-case basis.

Features of the NSP Corpus

The corpus includes speech from 60 talkers representing six regional varieties of American English.

Talkers are balanced for gender and regional dialect, with five males and five females from each region (see Figure 1).

Talkers are relatively homogeneous with respect to other demographic variables, including age, ethnicity, linguistic experience, level of education, and socioeconomic status (see Table 1).

Speech materials from each talker include isolated words, sentences, passages, and interview speech (see Table 2).

High-quality digital recordings were obtained from each talker in a sound-attenuated booth.
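
The balanced design described above (six regions, two genders, five talkers per cell) can be checked with a short enumeration. This is only an illustrative sketch: the region names come from this page, but the record layout and the function name `build_talker_grid` are assumptions made for the example and are not part of the corpus distribution.

```python
from collections import Counter
from itertools import product

# Region and gender labels as listed in this document.
REGIONS = ["New England", "Mid-Atlantic", "North", "Midland", "South", "West"]
GENDERS = ["female", "male"]
TALKERS_PER_CELL = 5  # five talkers per region-by-gender cell


def build_talker_grid():
    """Return one hypothetical record per talker in the balanced design."""
    return [
        {"region": region, "gender": gender, "index": i}
        for region, gender in product(REGIONS, GENDERS)
        for i in range(1, TALKERS_PER_CELL + 1)
    ]


talkers = build_talker_grid()

# Count talkers in each region-by-gender cell to confirm the balance.
cell_counts = Counter((t["region"], t["gender"]) for t in talkers)

print(len(talkers))                # 6 regions x 2 genders x 5 talkers = 60
print(set(cell_counts.values()))   # every cell holds the same number of talkers
```

A grid like this is a convenient starting point for crossing talkers with materials when planning a perceptual experiment.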

Talkers
The talkers included in the NSP corpus were five male and five female lifetime residents of each of six dialect regions in the United States: New England, Mid-Atlantic, North, Midland, South, and West (see Figure 1). These regions are based on the dialect regions described in Labov, Ash, and Boberg's (2006) Atlas of North American English.

Figure 1. Map of the hometowns of the NSP talkers. Dark dots indicate male talkers. Light dots indicate female talkers.

Apart from gender and regional dialect, the talkers in the NSP corpus were fairly homogeneous. Table 1 provides demographic information about the 60 talkers.

Age	18-25 years old
Native Language	English
Mother's Native Language	English
Father's Native Language	English
History of Hearing or Speech Disorder	None
Race/Ethnicity	White

Table 1. Demographic characteristics of the NSP talkers.

Materials
A range of speech materials was obtained from each talker, including isolated words, sentences, passages, and interview speech. Examples of the materials collected for the NSP corpus are shown in Table 2.

Materials Set	Examples
hVd Words (N=10)	heed, hid, head
CVC Words (N=76)	mice, dome, bait
Multisyllabic Words (N=112) (from Carter & Clopper, 2002)	alfalfa, nectarine
High Predictability Sentences (N=102) (from Kalikow, Stevens, & Elliott, 1977)	Ruth had a necklace of glass beads. The swimmer dove into the pool.
Low Predictability Sentences (N=52) (from Kalikow, Stevens, & Elliott, 1977)	Tom has been discussing the beads. She might consider the pool.
Anomalous Sentences (N=52) (see Clopper et al., 2002)	Bill knew a can of maple beads. The jar swept up the pool.
Passages (N=2)	Rainbow Passage (Fairbanks, 1940) Goldilocks Passage (Stockwell, 2002)
Interview Speech (5 minutes)	hometown, hobbies, travel experiences
Targeted Interview Speech (N=10 target words)	sleep, shoes, math

Table 2. Examples of the speech materials collected from each talker in the NSP.
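
As a rough aid for planning studies, the item counts in Table 2 can be totalled per talker. The sketch below uses only the N values given in the table. The dictionary name and the function `total_read_items` are hypothetical, and the calculation counts distinct read items only: it makes no assumption about how many times each item was produced, so the number of sound files in the corpus may differ.

```python
# Item counts (N) for the read materials, copied from Table 2.
# Interview speech is excluded because it is measured in minutes, not items.
READ_ITEMS = {
    "hVd words": 10,
    "CVC words": 76,
    "multisyllabic words": 112,
    "high predictability sentences": 102,
    "low predictability sentences": 52,
    "anomalous sentences": 52,
    "passages": 2,
}

N_TALKERS = 60  # total talkers in the corpus, as stated on this page


def total_read_items(items=READ_ITEMS):
    """Sum the distinct read items collected from a single talker."""
    return sum(items.values())


per_talker = total_read_items()
print(per_talker)              # 406 distinct read items per talker
print(per_talker * N_TALKERS)  # 24360 talker-by-item combinations
```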

Recording Conditions
All of the recordings were made in a sound-attenuated booth. The utterances were recorded with custom in-house software as individual .aiff sound files on a Macintosh laptop, at a sampling rate of 44.1 kHz with 16-bit encoding.
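
The recording parameters above determine how much uncompressed audio data each file holds. The following sketch estimates this from the sampling rate and bit depth given here. The channel count is not stated on this page, so mono (one channel) is an assumption, and the function name `audio_bytes` is hypothetical. AIFF header overhead is ignored.

```python
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, as stated above
BITS_PER_SAMPLE = 16      # 16-bit encoding, as stated above
CHANNELS = 1              # ASSUMPTION: mono; the channel count is not given


def audio_bytes(duration_s, sample_rate=SAMPLE_RATE_HZ,
                bits=BITS_PER_SAMPLE, channels=CHANNELS):
    """Return the size in bytes of uncompressed PCM audio of a given duration."""
    bytes_per_sample = bits // 8
    return int(duration_s * sample_rate * bytes_per_sample * channels)


# One second of audio under these assumptions.
print(audio_bytes(1))        # 88200 bytes

# The 5-minute interview sample listed in Table 2.
print(audio_bytes(5 * 60))   # 26460000 bytes, roughly 26.5 MB
```

If the recordings turn out to be stereo, passing `channels=2` doubles these figures.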

Acoustic Vowel Data
The acoustic vowel data summarized by Clopper, Pisoni, and de Jong (2005) is available here.

References

Carter, A. K., & Clopper, C. G. (2002). Prosodic effects on word reduction. Language and Speech, 45, 321-353.

Clopper, C. G., Carter, A. K., Dillon, C. M., Hernandez, L. R., Pisoni, D. B., Clarke, C. M., Harnsberger, J. D., & Herman, R. (2002). The Indiana Speech Project: An overview of the development of a multi-talker multi-dialect speech corpus. Research on Spoken Language Processing Progress Report No. 25 (pp. 367-380). Bloomington, IN: Speech Research Laboratory, Indiana University.

Clopper, C. G., & Pisoni, D. B. (2006). The Nationwide Speech Project: A new corpus of American English dialects. Speech Communication, 48, 633-644.

Clopper, C. G., Pisoni, D. B., & de Jong, K. (2005). Acoustic characteristics of the vowel systems of six regional varieties of American English. Journal of the Acoustical Society of America, 118, 1661-1676.

Fairbanks, G. (1940). Voice and Articulation Drillbook. New York: Harper.

Kalikow, D. N., Stevens, K. N., & Elliott, L. L. (1977). Development of a test of speech intelligibility in noise using sentence materials with controlled word predictability. Journal of the Acoustical Society of America, 61, 1337-1351.

Labov, W., Ash, S., & Boberg, C. (2006). Atlas of North American English. Berlin: Mouton de Gruyter.

Stockwell, P. (2002). Sociolinguistics: A Resource Book for Students. London: Routledge.

Contact Information
For more information about the NSP or to obtain materials from the NSP corpus, please contact Cynthia Clopper (clopper.1 AT osu.edu).
