Word Error Rate (WER)

A standard metric for evaluating automatic speech recognition (ASR) accuracy: the number of substituted, deleted, and inserted words, expressed as a percentage of the words in a reference transcript.

Word Error Rate, or WER, is the most widely used metric for evaluating how accurate a speech recognition system is. It gives you a single number that represents the percentage of words the system got wrong compared to a human-verified reference transcript.

How WER is calculated

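WER counts three types of error, found by aligning the system's output (the hypothesis) word by word against the reference so that the number of edits needed to turn one into the other is as small as possible: substitutions (the system wrote "cat" when the speaker said "car"), deletions (a word was spoken but the system missed it entirely), and insertions (the system added a word that was never spoken). Once the errors are counted, the formula is straightforward: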

WER = (Substitutions + Deletions + Insertions) / Total Words in Reference

A WER of zero percent means the transcript is perfect. A WER of ten percent means there is roughly one error for every ten words in the reference. Because the denominator counts only the reference words, WER can exceed one hundred percent if the system inserts many extra words.
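The calculation can be sketched in a few lines of Python. This is a minimal illustration, not AuTrans's implementation: the names `word_error_rate` and `WerResult` are hypothetical, and it assumes both transcripts are already normalised (same casing, no punctuation) and can be split on whitespace. It uses the standard word-level minimum edit distance (Levenshtein) alignment to count substitutions, deletions, and insertions, then applies the formula above. When several alignments tie, the split between error types can differ, but the total, and therefore the WER, is the same.

```python
from dataclasses import dataclass


@dataclass
class WerResult:
    substitutions: int
    deletions: int
    insertions: int
    reference_words: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / number of words in the reference
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_words


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    """Hypothetical helper: align hypothesis to reference with a word-level
    minimum edit distance and count each type of error."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    n, m = len(ref), len(hyp)
    # cost[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete all i reference words
    for j in range(1, m + 1):
        cost[0][j] = j  # insert all j hypothesis words

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,  # deletion
                cost[i][j - 1] + 1,  # insertion
            )

    # Walk back from the final cell, classifying each step of one optimal alignment.
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(ref[i - 1] != hyp[j - 1])
            if cost[i][j] == cost[i - 1][j - 1] + mismatch:
                s += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return WerResult(s, d, ins, n)


# Example from the substitution definition: "car" transcribed as "cat".
result = word_error_rate("the car is red", "the cat is red")
print(result.substitutions, result.deletions, result.insertions)  # 1 0 0
print(f"WER = {result.wer:.0%}")  # WER = 25%

# Three inserted words against a two-word reference push WER past 100%.
result = word_error_rate("turn left", "turn to the left now")
print(f"WER = {result.wer:.0%}")  # WER = 150%
```

The second example shows why WER is not capped at one hundred percent: insertions add to the numerator, while the denominator stays fixed at the length of the reference.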

What counts as "good" WER

For well-resourced languages like English, leading ASR systems achieve WERs between three and five percent on clean audio, approaching human-level performance. For African languages, WERs tend to be higher due to less training data, greater acoustic diversity, and the prevalence of code-switching. A WER of fifteen to twenty percent for a lower-resourced language may actually represent strong performance given the constraints.

Limitations of WER

WER treats all errors equally, but not all errors are equal in practice. Misrecognising a person's name is far more consequential than dropping a filler word like "um." WER also does not account for whether the overall meaning of a sentence was preserved. For these reasons, WER is best used as one indicator among several, not as the sole measure of transcription quality.
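The equal-weighting problem is easy to see with a short Python sketch. The `wer` helper and the example sentences are hypothetical, and the transcripts are assumed to be pre-normalised. Dropping the filler "um" and misrecognising a person's name each cost exactly one edit, so both hypotheses receive the same score even though only one of them damages the meaning.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    # Single-row dynamic programme for word-level Levenshtein distance,
    # which equals S + D + I for the best alignment.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
                prev[j] + 1,  # deletion
                curr[j - 1] + 1,  # insertion
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


reference = "um please send the invoice to Amara"
dropped_filler = "please send the invoice to Amara"  # harmless deletion
wrong_name = "um please send the invoice to Amanda"  # consequential substitution

print(wer(reference, dropped_filler) == wer(reference, wrong_name))  # True
```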

AuTrans uses WER alongside human review benchmarks to continuously improve transcription accuracy across every supported language.

Start transcribing free

Get 30 minutes of free transcription every month. No credit card required. Just upload your audio and go.