Transcription: Mumbling, Background Noise and Other Pitfalls
January 28, 2013
Transcription sounds like a fairly straightforward task: listen to a recording and type out what you hear. Just how hard can it be? At Para-Plus, we know firsthand that it takes a great set of ears, training and experience to produce accurate transcripts. Whether we are working on an English-only project or transcribing another language and translating it into English, there are several pitfalls that invariably crop up to make the task more difficult than it first appears to the casual observer.
The first of these pitfalls comes under the broad concept of speaker audibility. Unless each speaker has access to a microphone, such as in a courtroom, not every speaker in a recording is going to be clear and audible. Frequently, speakers mumble or their voices are muffled, and the transcriber has to decipher what they say into something that makes sense. When the transcriber cannot make out a word or phrase, our official term for this is “Unintelligible,” which we mark as “[UI]” in the transcript. Some UIs are inevitable: when two speakers talk over each other, some or all of the words might be lost in the noise. If the recordings involve subjects who are unaware of being recorded, Murphy’s law holds that they are certainly not going to be speaking clearly and enunciating their words. Speaker A will be talking with his hand over his mouth for 15 minutes. Speaker B will be leaving a voicemail for someone while Speaker A talks to Speaker C.
Our goal, of course, is to have as few UIs as possible, if any, in a transcript. However, another problem that stands in the way of a UI-free transcript is background noise. This can come either from the recording itself or from ambient noise. Constant background noise from the audio, such as a buzzing phone connection, can usually be corrected through our sound editing programs. But phone calls can have people shouting in the background, and undercover recordings pick up whatever else is going on around the speakers. These noises run the gamut – forks clattering, phones ringing, trucks backing up, traffic, music – and can verge on the absurd (a bird screeching at five-second intervals). Recordings like this take more fortitude and skill to get through, but experienced transcribers know how to tune out the background just enough to pick out the words.
Another factor in the complexity of transcription is the number of speakers in the recording. Two speakers? No problem (just as long as their voices don’t sound the same). Things start to get interesting when you have four, six or even eight speakers in a recording. In these cases, a lot of time is devoted at the beginning of the project to identifying and separating out each voice.
Finally, context matters in transcription. Sometimes the accuracy of a transcript has nothing to do with speaker audibility or background noise, but rather with a mishearing of the words spoken in the recording. Phrases that invite this kind of mishearing are technically called “oronyms.” Oronyms are like homophones, but instead of a pair of words, they involve strings of words or phrases that sound alike. Consider: “stuffy nose” vs. “stuff he knows”; “four candles” vs. “fork handles.” If a transcriber hears a phrase incorrectly, it can lead to rather random sentences being inserted in a transcript. This is a surprisingly common (and often amusing) issue that happens to even the most seasoned transcriber, which is why our transcription process usually includes a separate review.
Transcription essentially requires the transcriber to think as if she were present in the conversation, piecing together what was said from context when background noises or muffled voices make it necessary. Whether in a formal interview setting or in casual conversation, there’s a natural flow to word order and word choice that a good transcriber can recognize. Transcription turns out to be more than “just” typing the words; it is a mental exercise for the ears and brain.