Word Error Rate (WER)

A standard metric for ASR accuracy: the number of word substitutions, insertions, and deletions in a transcript, expressed as a percentage of the words in the reference transcript.

Word Error Rate, or WER, is the most widely used metric for evaluating how accurate a speech recognition system is. It gives you a single number that represents the percentage of words the system got wrong compared to a human-verified reference transcript.

How WER is calculated

WER accounts for three types of errors: substitutions (the system wrote "cat" when the speaker said "car"), deletions (a word was spoken but the system missed it entirely), and insertions (the system added a word that was never spoken). The errors are counted by aligning the system's output with the reference transcript in the way that requires the fewest edits. The formula is straightforward:

WER = (Substitutions + Deletions + Insertions) / Total Words in Reference

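A WER of zero percent means the transcript is perfect. A WER of ten percent means roughly one error for every ten words in the reference. WER can exceed one hundred percent: insertions count as errors but the denominator is the length of the reference, so a system that adds many extra words can make more errors than there are reference words.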
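To make the formula concrete, here is a minimal Python sketch that finds the smallest number of word edits with a standard edit-distance table and divides by the reference length. The function names word_errors and wer are illustrative, and the sketch assumes lowercased, whitespace-separated words; real evaluations usually also normalise punctuation and number formats before scoring.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions, and insertions."""
    # Assumption: words are lowercased and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # prev[j] = edits needed to turn zero reference words into the
    # first j hypothesis words (j insertions).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        # curr[0] = i: deleting the first i reference words.
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER as a fraction; multiply by 100 for a percentage."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must contain at least one word")
    return word_errors(reference, hypothesis) / ref_len


# One substitution in a four-word reference.
print(wer("the car is red", "the cat is red"))          # 0.25
# Three insertions against a two-word reference: WER above 100%.
print(wer("turn left", "please turn left right now"))   # 1.5
```

The second call shows how a short reference with several inserted words produces a WER of 150 percent.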

What counts as "good" WER

For well-resourced languages like English, leading ASR systems achieve WERs between three and five percent on clean audio, approaching human-level performance. For African languages, WERs tend to be higher due to less training data, greater acoustic diversity, and the prevalence of code-switching. A WER of fifteen to twenty percent for a lower-resourced language may actually represent strong performance given the constraints.

Limitations of WER

WER treats all errors equally, but not all errors are equal in practice. Misrecognising a person's name is far more consequential than dropping a filler word like "um." WER also does not account for whether the overall meaning of a sentence was preserved. For these reasons, WER is best used as one indicator among several, not as the sole measure of transcription quality.
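As a small illustration of this limitation, the Python sketch below scores two hypothetical transcripts of the same sentence. One drops a filler word and the other gets a name wrong, yet both receive the same WER. The sentences and the wer helper are made up for this example, and the helper assumes whitespace-separated words with no further normalisation.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "um please call amina tomorrow"
dropped_filler = "please call amina tomorrow"    # harmless: "um" is missing
wrong_name = "um please call amira tomorrow"     # harmful: wrong person

print(wer(reference, dropped_filler))  # 0.2
print(wer(reference, wrong_name))      # 0.2
```

Both transcripts score twenty percent, even though only the second would send the listener to the wrong person.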

AuTrans uses WER alongside human review benchmarks to continuously improve transcription accuracy across every supported language.

