Plain-English definitions of audio transcription terms — ASR, speaker diarization, subtitle formats, and more.
The ability of a speech recognition system to adjust its models to accurately recognise speech from speakers with diverse regional or linguistic accents.
The use of artificial intelligence to automatically generate concise summaries from longer texts, such as full transcripts of audio recordings.
Technology that converts spoken language into written text using machine learning models trained on audio and language data.
The practice of alternating between two or more languages or dialects within a single conversation, sentence, or even phrase.
An English-based creole spoken across Nigeria as a lingua franca that bridges ethnic and linguistic boundaries, used by an estimated 75 million or more people.
Real-time transcription processes audio as it is being spoken, while batch transcription processes pre-recorded audio files after the fact.
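The contrast can be sketched with a toy example. It assumes audio arrives as a plain list of samples, and `recognise` is a hypothetical placeholder, not a real model or library call. The streaming function yields a result for each chunk as it arrives. The batch function waits for the whole recording.

```python
from typing import Iterator, Sequence

def recognise(chunk: Sequence[float]) -> str:
    """Hypothetical stand-in for an ASR model; it only reports how much audio it saw."""
    return f"<{len(chunk)} samples>"

def transcribe_streaming(samples: Sequence[float], chunk_size: int) -> Iterator[str]:
    """Real-time style: yield a partial result as each chunk 'arrives'."""
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        yield recognise(chunk)  # output is available before the recording ends

def transcribe_batch(samples: Sequence[float]) -> str:
    """Batch style: wait for the complete recording, then process it in one pass."""
    return recognise(samples)
```

This sketch treats each chunk independently. A practical streaming system would usually carry context across chunk boundaries.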
The process of partitioning an audio recording into segments based on who is speaking, answering the question 'who spoke when.'
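Diarization output is often handled as a list of labelled time segments. The minimal sketch below assumes that representation. It shows how such a list answers "who spoke when" and how it supports per-speaker talk time. The `Segment` class and speaker labels are hypothetical, not any particular toolkit's format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Segment:
    speaker: str  # hypothetical speaker label, e.g. "A"
    start: float  # seconds from the start of the recording
    end: float    # seconds; the segment covers [start, end)

def speaker_at(segments: List[Segment], t: float) -> Optional[str]:
    """Return the first speaker whose segment contains time t, or None in silence."""
    for seg in segments:
        if seg.start <= t < seg.end:
            return seg.speaker
    return None

def talk_time(segments: List[Segment]) -> Dict[str, float]:
    """Sum the seconds attributed to each speaker."""
    totals: Dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals

segments = [Segment("A", 0.0, 4.5), Segment("B", 4.5, 9.0), Segment("A", 9.0, 12.0)]
```

With overlapping speech, `speaker_at` returns only the first matching segment. A fuller version would return every active speaker.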
A widely used subtitle file format that stores timed text as numbered blocks with start and end timestamps and corresponding text.
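As an illustration of that block structure, here is a small sketch that renders `(start, end, text)` cues as SRT. The function names and cue tuple are illustrative assumptions. The layout follows the common SRT convention of a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line.

```python
from typing import List, Tuple

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(cues: List[Tuple[float, float, str]]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Hello."), (2.5, 5.0, "Welcome back.")]))
```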
Time markers embedded in a transcript that indicate when each word, phrase, or segment was spoken in the original audio.
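Word-level timestamps make it possible to map a playback position back to the transcript, for example to highlight the current word. The sketch below assumes a hypothetical list of `(word, start, end)` tuples sorted by start time. It uses binary search (`bisect`) to find the word being spoken at time `t`.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical word-level timestamps: (word, start_seconds, end_seconds), sorted by start.
Word = Tuple[str, float, float]

def word_at(words: List[Word], t: float) -> Optional[str]:
    """Return the word whose [start, end) interval contains t, or None during a pause."""
    starts = [w[1] for w in words]
    i = bisect.bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < words[i][2]:
        return words[i][0]
    return None

words = [("the", 0.00, 0.18), ("meeting", 0.18, 0.62), ("starts", 0.80, 1.20)]
```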
Transcription converts speech to text in the same language, while translation converts meaning from one language to another.
A subtitle format specified by the W3C for use on the web (WebVTT), supporting timed text with optional styling, positioning, and metadata.
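For comparison with SRT, the sketch below writes a minimal WebVTT file. It uses a `WEBVTT` header line, a full stop rather than a comma before the milliseconds, and optional cue settings such as `align:center` after the timing line for positioning. Applying one `settings` string to every cue is a simplifying assumption of this sketch. The format allows settings per cue.

```python
from typing import List, Tuple

def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm (full stop, unlike SRT's comma)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

def to_webvtt(cues: List[Tuple[float, float, str]], settings: str = "") -> str:
    """Render cues as a WebVTT document; `settings` (e.g. "align:center") is applied to every cue."""
    lines = ["WEBVTT", ""]  # mandatory header, followed by a blank line
    suffix = f" {settings}" if settings else ""
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}{suffix}")
        lines.append(text)
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)

print(to_webvtt([(0.0, 2.5, "Hello.")], settings="align:center"))
```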
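A standard metric for evaluating ASR accuracy: the number of word substitutions, insertions, and deletions needed to turn the system's output into a reference transcript, divided by the number of words in the reference. Lower is better, and because insertions are counted, the error rate can exceed 100%.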
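The metric is WER = (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions, and N is the number of reference words. The minimum S + D + I is the word-level Levenshtein (edit) distance, which the sketch below computes with dynamic programming. It assumes transcripts are already normalised (case, punctuation) and splits on whitespace. Real evaluations usually apply a normalisation step first.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost is 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion ("the") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```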