Voice Onset Time (VOT)

When we talk about the characteristics of plosives, we frequently use the term VOT. It refers to the period of time between the release of the oral closure and the onset of vocal fold vibration. We measure this time in milliseconds (msec for short). Let's say we pronounce [pa] and it takes about 100 msec from the moment we open our lips to the moment the vocal cords start to vibrate. If we take the moment the lips open as the reference point, we say the VOT is +100. If the vocal cords start to vibrate before we open our lips, the VOT value is negative.
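The definition above is simple arithmetic, so a small sketch may help. It assumes we already know two time stamps, the moment of release and the moment voicing starts; the function name `vot_msec` and the time stamps are made up for illustration and are not real measurements.

```python
def vot_msec(release_msec, voicing_onset_msec):
    """Return VOT in msec, taking the release of the closure as the reference point."""
    return voicing_onset_msec - release_msec

# Hypothetical time stamps, in msec from the start of a recording.
print(vot_msec(250, 350))  # voicing starts after the release: prints 100
print(vot_msec(250, 200))  # voicing starts before the release: prints -50
```

A positive result means the vocal cords start vibrating after the release, and a negative result means they were already vibrating before it.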

For example, let's suppose that the average VOT value for the English aspirated voiceless labial plosive [p] is about +80 ~ +100 (these values, like the others on this page, are purely imaginary). That means that if you pronounce [pa], it usually takes that amount of time from the release of the oral closure to the start of vocal cord vibration. Suppose, again, that the average VOT for English [b] is around +60 ~ +80. If the average VOT for Korean [b] is somewhere around +60 ~ +90, partially overlapping the VOT range for English [p], Korean [b] could be recognized as either English [p] or English [b], depending on the phonological environment. If the Korean [b] occurs word-initially, its VOT value is close to 90 msec, which falls in the range for English [p].
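The overlap described above can be checked with a short sketch. It uses the same purely imaginary ranges and, as a simplifying assumption, treats each average range as a hard category boundary, which real perception certainly does not do. The names `VOT_RANGES`, `overlap`, and `matching_categories` are made up for illustration.

```python
# Purely imaginary VOT ranges in msec, as in the example above.
VOT_RANGES = {
    "English [p]": (80, 100),
    "English [b]": (60, 80),
    "Korean [b]": (60, 90),
}

def overlap(a, b):
    """Return the (low, high) overlap of two ranges, or None if they do not overlap."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None

def matching_categories(vot, ranges=VOT_RANGES):
    """List every category whose imaginary range contains the given VOT value."""
    return [name for name, (low, high) in ranges.items() if low <= vot <= high]

print(overlap(VOT_RANGES["Korean [b]"], VOT_RANGES["English [p]"]))  # (80, 90)
print(matching_categories(90))  # ['English [p]', 'Korean [b]']
print(matching_categories(65))  # ['English [b]', 'Korean [b]']
```

With these made-up numbers, a Korean [b] produced at about 90 msec lands inside the English [p] range, while one produced at about 65 msec lands inside the English [b] range.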

When we say 'voiced' or 'voiceless', we usually don't care about the exact VOT values of a sound. In phonetics, however, the terms 'voicing' and 'aspiration' have much to do with VOT. It might be said that the acoustic representation of both 'voicing' and 'aspiration' is VOT. Voicing and aspiration are also related to the psychological aspect of a language. That's why a native English speaker thinks he heard a voiceless [p] when a Korean speaker thinks he said a voiced [b]. There is a city whose name starts with a voiced (as we Koreans think) bilabial stop [b]. But when we say the name to people whose native tongue is English, they say without exception that they heard a word starting with a voiceless bilabial stop, that is, [p].

VOT Continuum

As you see in the diagram, the upper limit of the VOT value of the Korean /b/ sound extends toward that of the English aspirated /p/. This accounts for the misunderstanding between Korean and English speakers. Two sounds broadly described with the same label, 'voiced' or 'voiceless', are likely to be phonetically different from language to language, as you saw above. But in phonology, we don't usually pay much attention to these subtle acoustic differences, because phonologists are more concerned with mental or psychological perception than with strict phonetic differences.