Best practice

Captions

  • Provide captions for pre-recorded and live video with audio.
  • Use the <track> element to specify timed text tracks for <audio> or <video> elements.
  • Captions are synchronized with the audio.
  • Captions are typed in mixed case letters.
  • Captions use no more than three lines at a time.
  • Put a new sentence on a new line.
  • The maximum line length is 32 characters.
  • Insert caption line breaks at logical points rather than in the middle of a phrase.
  • Default colors are white text on a black background.
  • The default contrast ratio between text color and background color is at least 3:1 (for text of at least 18 points).
  • Default font size is at least 22pt.
  • Position captions so that they do not obscure on-screen text, people’s faces, or other important visual information.
  • Ensure a gap of at least 1.5 seconds between captions.
  • Remove captions from long silent intervals. Captions have a maximum duration of 6 seconds.
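
The numeric limits in this list (three lines, 32 characters per line, 6 seconds per caption, 1.5 seconds between captions) lend themselves to an automated check. The following is a minimal sketch, assuming the cues have already been parsed out of a timed text file into start/end times and text lines. The names `Cue` and `check_cues` are hypothetical and not part of any captioning library.

```python
from dataclasses import dataclass

MAX_LINES = 3            # no more than three lines at a time
MAX_CHARS_PER_LINE = 32  # maximum characters per line
MAX_DURATION_S = 6.0     # maximum caption duration in seconds
MIN_GAP_S = 1.5          # minimum gap between captions in seconds


@dataclass
class Cue:
    """One caption: start and end times in seconds, plus its text lines."""
    start: float
    end: float
    lines: list


def check_cues(cues):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    for i, cue in enumerate(cues):
        if len(cue.lines) > MAX_LINES:
            problems.append(f"cue {i}: {len(cue.lines)} lines (max {MAX_LINES})")
        for line in cue.lines:
            if len(line) > MAX_CHARS_PER_LINE:
                problems.append(
                    f"cue {i}: line has {len(line)} characters (max {MAX_CHARS_PER_LINE})"
                )
        if cue.end - cue.start > MAX_DURATION_S:
            problems.append(f"cue {i}: lasts longer than {MAX_DURATION_S} seconds")
        # Compare with the previous cue to measure the gap between captions.
        if i > 0 and cue.start - cues[i - 1].end < MIN_GAP_S:
            problems.append(f"cue {i}: gap before it is shorter than {MIN_GAP_S} seconds")
    return problems


if __name__ == "__main__":
    cues = [
        Cue(0.0, 3.0, ["Welcome to the show."]),
        Cue(3.5, 11.0, ["This line is far too long to fit on a caption row."]),
    ]
    for problem in check_cues(cues):
        print(problem)
```

A check like this only covers the measurable rules. Synchronization, logical line breaks, and caption placement still need a human reviewer.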

Transcripts

  • Basic transcripts are a text version of the speech and non-speech audio information.
  • Descriptive transcripts also include text description of the visual information.
  • Provide descriptive transcripts for pre-recorded video with audio.
  • Provide descriptive transcripts or audio description for pre-recorded video-only content.
  • Provide a basic transcript for pre-recorded audio-only content.
  • Provide a basic transcript or captions for live audio-only content.
  • Interactive transcripts enable a user to click a phrase anywhere in the transcript to navigate to that exact point in the video (or audio). Interactive transcripts are built from timed text files specified in the <track> element.
  • Position the transcript or a link to it directly below or adjacent to the media player.
  • If the transcript is on another page, provide a link back to the audio or video file.
  • Provide the transcript in HTML for maximum accessibility to people and to search engines.
  • If working with a captions file, combine several lines into sensible paragraphs.
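
When a transcript starts from a captions file, the short caption lines have to be merged back into running text. The sketch below shows one crude way to do that, assuming the caption text has already been extracted as a list of strings in playback order. The function name `cues_to_paragraphs` and the fixed sentences-per-paragraph heuristic are illustrative assumptions; a human editor should still decide where paragraphs really belong.

```python
import re


def cues_to_paragraphs(cue_texts, sentences_per_paragraph=3):
    """Join caption lines into running text, then group sentences into paragraphs."""
    # Collapse the line breaks and repeated spaces that caption files introduce.
    text = " ".join(" ".join(cue_texts).split())
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    return [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]


if __name__ == "__main__":
    cue_texts = [
        "Welcome to the show.",
        "Today we talk",
        "about captions.",
        "They help everyone.",
        "Let's begin.",
    ]
    for paragraph in cues_to_paragraphs(cue_texts, sentences_per_paragraph=2):
        print(paragraph)
        print()
```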

Transcribing audio to text

  • Transcription best practice is nearly identical for captions and transcripts.
  • When transcribing, the goal is accuracy:
    • Never paraphrase or omit words (and do not censor).
    • Never substitute words.
    • Never rearrange the order of speech.
    • Never correct or edit a speaker’s grammar.
    • Never provide clarifying information in the captions (you may in the transcript).
  • Transcribe all speech and non-speech sounds (laughs, groans, sighs, screams, car backfiring, footsteps approaching, distant roaring).
  • Identify the speakers. Use the full name the first time and a single name thereafter. If a speaker is not identified, use Speaker + number (e.g., Speaker 1, Speaker 2) or a role/title without a number (e.g., Interviewer, Doctor).
  • Exclude non-relevant speech and non-relevant background noise.
  • Do not reveal intentionally held information before the appropriate time.
  • Include relevant information about the speech, e.g., (whispering), (mouthing).
  • Put non-speech sounds in parentheses, italics, lowercase, and with a space before and after, e.g., ( chatter in distance )
  • Use punctuation to convey emphasis.
  • For interrupted speech, use a dash at the end of the line.
  • Use all capital letters only to indicate yelling.
  • When the speech is unintelligible or inaudible, transcribe it as [inaudible].
  • Indicate large silences as (silence).
  • Include background music if it's important to understand the content:
    • Identify music with the uppercase label MUSIC (or a verb implying music), followed by a colon and the title in quotation marks followed by the artist.
  • Transcribe important lyrics with musical notes to either side, e.g.,

    ♪ A long, long time ago ♪

  • Describe music that’s not part of the action but sets the mood, e.g.,

    ♪ scary music ♪

  • Best practices unique to transcripts:
    • For descriptive transcripts, include all relevant audio information as well as description of all relevant visual information.
    • If your transcript is generated from timed text files, ensure descriptions fit into gaps in the main audio, or use a player that can pause the video during the description.
    • Transcripts include onscreen text in videos. Captions do not include onscreen text.
    • Ensure transcripts identify the source of sounds, rather than just describing them.
    • In some cases, such as legal depositions, the transcript must be verbatim, including ums, ahs, and indicating pauses.
    • Headings, topics and links can make the transcript more usable.
    • Include timestamps only when useful.
    • Add a timestamp to inaudible audio.
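
Some of the notation rules above are mechanical enough to apply with small helpers, which keeps a long transcript consistent. This is a minimal sketch with hypothetical function names. It covers only the plain-text parts of the conventions (italics depend on the output format and are not handled here).

```python
def sound(description):
    """Non-speech sound: lowercase, in parentheses, with a space inside each one."""
    return f"( {description.strip().lower()} )"


def lyric(line):
    """Lyrics or mood music: a musical note on either side of the text."""
    return f"♪ {line.strip()} ♪"


def unidentified_speaker(number):
    """Label for a speaker who is never named: 'Speaker' plus a number."""
    return f"Speaker {number}"


if __name__ == "__main__":
    print(sound("Chatter in distance"))
    print(lyric("A long, long time ago"))
    print(unidentified_speaker(1))
```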

Description of visual information

  • Provide audio description for pre-recorded video with audio.
  • Provide audio description or a descriptive transcript for video-only content.
  • Design new videos with integrated descriptions (the script includes all relevant visual information) to avoid the need for audio description.
  • Make sure the important visual elements are described appropriately and objectively to understand what the video is communicating.
  • Write description of visual information in present tense, using an active voice and a third-person narrative style.
  • Make sure to include all text, e.g., title text at the beginning, links and email addresses, speakers’ names, and text in a presentation.

Media player accessibility

The ideal media player provides built-in support for captions, audio descriptions, and transcripts.

Keyboard accessibility:

  • All controls can receive focus via the tab key.
  • Controls have a visible keyboard focus indicator.
  • The tab order of controls matches the visual order, left to right.
  • All controls are operable by keyboard.
  • Text, controls, and backgrounds have sufficient contrast between colors.
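
Whether two colors have "sufficient contrast" can be measured rather than judged by eye. The sketch below computes a contrast ratio using the relative luminance formula from WCAG 2.x, assuming colors are given as 8-bit sRGB triples. The function names are illustrative, and the threshold to compare against (for example the 3:1 minimum quoted for captions) depends on the text size and the guideline being applied.

```python
def _linearize(channel):
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    print(round(contrast_ratio(white, black), 2))
```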

Screen reader accessibility:

  • Each control exposes its name and role to screen readers, along with its value where one is set.

Flashing content

Ensure flashing content:

  • Does not flash more than 3 times per second.
  • Is not larger than 21,824 sq pixels.
  • Does not have high contrast.

Assess flashing content using a tool such as the Photosensitive Epilepsy Analysis Tool (PEAT).
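
The flash rate and area limits above can be expressed as a quick pre-check before running a full analysis tool. The sketch below is only an illustration, assuming you already know the times (in seconds) at which flashes occur and the pixel dimensions of the flashing region. The function names are hypothetical, contrast is not modelled, and this is not a substitute for PEAT.

```python
FLASH_LIMIT_PER_SECOND = 3   # no more than 3 flashes per second
MAX_FLASH_AREA_PX = 21_824   # flashing region no larger than 21,824 square pixels


def max_flashes_in_any_second(flash_times):
    """Largest number of flashes that fall inside any window shorter than one second."""
    times = sorted(flash_times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Advance the window start until the window spans less than one second.
        while t - times[start] >= 1.0:
            start += 1
        best = max(best, end - start + 1)
    return best


def flash_within_limits(flash_times, width_px, height_px):
    """True when both the flash rate and the flashing area stay within the limits."""
    rate_ok = max_flashes_in_any_second(flash_times) <= FLASH_LIMIT_PER_SECOND
    area_ok = width_px * height_px <= MAX_FLASH_AREA_PX
    return rate_ok and area_ok


if __name__ == "__main__":
    print(flash_within_limits([0.0, 0.5, 1.0, 1.5], width_px=100, height_px=100))
    print(flash_within_limits([0.0, 0.2, 0.4, 0.6], width_px=100, height_px=100))
```

This sketch treats the listed items as conditions that must all hold, which is how the checklist reads.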

Animation and motion

  • Allow users to turn off motion animations.
  • Avoid using unnecessary animations.

Pause, stop or hide

  • For any moving, blinking and scrolling information that starts automatically, lasts more than five seconds, and is presented in parallel with other content, provide the user a way to pause, stop or hide it.
  • For auto-updating information, provide a way for the user to pause, stop or hide the content. Or, provide a way for the user to control the frequency of the update.
  • A keyboard-accessible “pause button” or another mechanism can be used to pause the content.
  • Avoid unnecessary moving, blinking, scrolling or auto-updating content.
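
The first rule in this list combines three conditions, and all three must be true before a pause, stop or hide mechanism is required. A minimal sketch of that decision, using a hypothetical `MovingContent` record, might look like this:

```python
from dataclasses import dataclass


@dataclass
class MovingContent:
    """Facts about a piece of moving, blinking or scrolling content."""
    starts_automatically: bool
    duration_seconds: float
    parallel_with_other_content: bool


def needs_pause_control(content):
    """True when the content must offer a way to pause, stop or hide it."""
    return (
        content.starts_automatically
        and content.duration_seconds > 5
        and content.parallel_with_other_content
    )


if __name__ == "__main__":
    carousel = MovingContent(True, 30, True)
    print(needs_pause_control(carousel))
```

Even when the function returns False, the last item in the list still applies: unnecessary motion is best avoided altogether.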

Audio control

If audio plays automatically on page load for more than 3 seconds, enable the user to:

  • pause or stop the audio, or
  • control the volume independent of the system volume.

Alternatively, play sounds only on user request.
