Exercise 11.4

Normally a human must transcribe the words and phrases spoken by a caller in order to calculate the word error rate. Explain how it may be possible to use a second speech recognition system to replace the human transcriber. Specify a high-level architecture of a system for automatically estimating the word error rate.

The usual approach for calculating the word error rate is to have a human annotator listen to the audio file and transcribe it into text. The annotator’s text is then compared to the ASR’s text, the differences noted, and the word error rate calculated.
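The comparison step can be made concrete with a short sketch. It assumes the common definition of word error rate as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are illustrative choices, not part of the exercise.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: lowercase and split on whitespace; a real system would
    # normalize punctuation, numbers, and so on before comparing.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# The human transcript is the reference; the ASR output is the hypothesis.
# Here the ASR dropped one word out of six reference words.
human_text = "i want to check my balance"
asr_text = "i want to check balance"
print(word_error_rate(human_text, asr_text))
```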

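It is possible to replace the human annotator with a more powerful speech recognition engine, different from the production ASR. The transcript produced by this second engine is treated as the reference text, and the production ASR's output is compared against it to estimate the word error rate. Because the second ASR is not perfect, it will itself produce errors, so the result is an estimate rather than an exact word error rate. However, because the second ASR works offline and is not bound by the real-time constraints of the production system, its errors should be fewer than those of the production ASR.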
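One possible high-level architecture is sketched below, under stated assumptions. Recorded calls are passed to both recognizers, the offline engine's output serves as a pseudo-reference, and word-level edit errors are pooled over all calls. The `Recognizer` interface, the names `estimate_wer`, `production_asr`, and `reference_asr`, and the stub recognizers in the usage example are all hypothetical; no particular speech recognition product or API is implied.

```python
from typing import Callable, Iterable, List

# Hypothetical interface: a recognizer takes raw audio bytes and returns text.
Recognizer = Callable[[bytes], str]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def estimate_wer(calls: Iterable[bytes],
                 production_asr: Recognizer,
                 reference_asr: Recognizer) -> float:
    """Estimate the production ASR's word error rate without human transcripts."""
    total_errors = 0
    total_reference_words = 0
    for audio in calls:
        # Step 1: the offline engine stands in for the human annotator.
        pseudo_reference = reference_asr(audio).lower().split()
        # Step 2: the production engine's output is the text being scored.
        hypothesis = production_asr(audio).lower().split()
        # Step 3: count word-level differences between the two transcripts.
        total_errors += edit_distance(pseudo_reference, hypothesis)
        total_reference_words += len(pseudo_reference)
    if total_reference_words == 0:
        raise ValueError("the reference recognizer produced no words")
    # Step 4: pool the errors over all calls.
    return total_errors / total_reference_words


# Usage with stub recognizers (made-up transcripts, for illustration only).
offline_output = {b"call-1": "i want to check my balance", b"call-2": "pay my bill"}
production_output = {b"call-1": "i want to check balance", b"call-2": "play my bill"}
estimate = estimate_wer(offline_output.keys(),
                        production_output.__getitem__,
                        offline_output.__getitem__)
print(f"estimated WER: {estimate:.3f}")
```

Pooling errors over all calls, rather than averaging per-call rates, keeps short calls from dominating the estimate. Any errors made by the offline engine carry over into the estimate, so the result is only as trustworthy as the pseudo-reference.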

