Timestamps
Time markers embedded in a transcript that indicate exactly when each word, phrase, or segment was spoken in the original audio.
Timestamps anchor transcript text to specific moments in the source audio. They answer a simple but essential question: when was this said? Depending on the level of granularity, timestamps can mark the start and end of each subtitle block, each sentence, or even each individual word.
Types of timestamps
Segment-level timestamps mark the beginning and end of each caption block or paragraph. These are what you see in SRT and VTT subtitle files, and they are sufficient for most captioning and subtitle workflows.
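As a rough illustration of how segment-level timestamps appear in a subtitle file, the sketch below renders a few segments as SRT cues. SRT writes times as HH:MM:SS,mmm, and WebVTT uses the same layout with a period before the milliseconds. The function names and the sample segments are hypothetical, made up for this example, and are not taken from any particular tool's output.

```python
def format_timestamp(seconds: float, separator: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT, separator='.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_line = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_line}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical segment-level data, for illustration only.
segments = [
    (0.0, 2.5, "Welcome to the show."),
    (2.5, 5.04, "Today we talk about timestamps."),
]

print(to_srt(segments))
```

Each cue carries a start and an end time, which is all a video player needs to show and hide the caption at the right moment.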
Word-level timestamps assign a precise time to every single word in the transcript. This finer granularity is valuable for audio editing, karaoke-style highlighting, detailed linguistic research, and building interactive transcript experiences where users can click a word and jump to that exact moment in the recording.
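To make the interactive use case concrete, here is a minimal sketch of two lookups that word-level timestamps allow: finding which word is being spoken at a given playback time (for karaoke-style highlighting) and finding where to seek when a word is clicked. The data layout, the sample values, and the names `word_at` and `seek_time` are assumptions chosen for this example.

```python
from bisect import bisect_right

# Hypothetical word-level data: (word, start_seconds, end_seconds),
# sorted by start time. Note the short pause before "audio".
words = [
    ("Timestamps", 0.00, 0.62),
    ("anchor", 0.62, 1.05),
    ("text", 1.05, 1.40),
    ("to", 1.40, 1.52),
    ("audio", 1.60, 2.10),
]
starts = [start for _, start, _ in words]


def word_at(playback_time):
    """Return the index of the word spoken at playback_time, or None in a gap."""
    # Find the last word whose start time is <= playback_time.
    i = bisect_right(starts, playback_time) - 1
    if i < 0:
        return None
    _, start, end = words[i]
    return i if start <= playback_time < end else None


def seek_time(word_index):
    """Return the time to jump to when the word at word_index is clicked."""
    return words[word_index][1]


print(word_at(0.7))   # index of the word to highlight at 0.7 seconds
print(seek_time(2))   # playback position for the third word
```

The binary search keeps the lookup fast on long recordings, where a player may query the current word many times per second.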
Why timestamps matter
Without timestamps, a transcript is just a document. With them, it becomes a navigable map of the audio. Journalists can jump to the exact moment a source made a key statement. Researchers can locate specific passages in hours-long recordings. Content creators can align captions to video with frame-level accuracy. Legal professionals can reference precise moments in recorded proceedings.
Timestamps and African language transcription
Accurate timestamping depends on the automatic speech recognition (ASR) system correctly identifying word boundaries. This can be tricky in tonal languages, where syllable timing and prosodic patterns differ from English. Languages like Yoruba and Zulu have rhythmic structures that call for models specifically tuned to their phonetic properties in order to place timestamps precisely.
AuTrans generates both segment-level and word-level timestamps for supported languages. These timestamps are preserved across export formats, whether users download their transcript as SRT, VTT, or plain text with inline time codes. The result is a transcript that stays tightly synchronised with the original audio.
Related
Transcription vs Translation
Transcription converts speech to text in the same language, while translation converts meaning from one language to another.
VTT (WebVTT Format)
A W3C standard subtitle format designed for the web, supporting timed text with optional styling, positioning, and metadata.
Word Error Rate (WER)
A standard metric for evaluating ASR accuracy by measuring the percentage of words incorrectly transcribed through substitutions, insertions, and deletions.