Best practice

Captions

Provide captions for pre-recorded and live video with audio.
Use the <track> element to specify timed text tracks for <audio> or <video> elements.
Captions are synchronized with the audio.
Captions are typed in mixed case letters.
Captions use no more than three lines at a time.
Put a new sentence on a new line.
Use no more than 32 characters per line.
Insert caption line breaks at logical points rather than in the middle of a phrase.
Default colors are white text on a black background.
Default contrast ratio between text color and background color is at least 3:1 (for text of at least 18 points).
Default font size is at least 22pt.
Position captions so they do not obscure on-screen text, people’s faces and other important visual information.
Ensure a gap of at least 1.5 seconds between captions.
Remove captions from long silent intervals. Captions have a maximum duration of 6 seconds.
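
The numeric limits in this list lend themselves to an automated check. The Python sketch below is one possible way to flag cues that break them. The Cue class, the check_cues function, and the constant names are hypothetical, and the thresholds are simply copied from the list above; judgment calls such as logical line breaks and caption placement still need human review.

```python
from dataclasses import dataclass

# Limits copied from the checklist above; the names are illustrative only.
MAX_LINES = 3
MAX_CHARS_PER_LINE = 32
MAX_DURATION_S = 6.0
MIN_GAP_S = 1.5


@dataclass
class Cue:
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # caption text, with lines separated by "\n"


def check_cues(cues):
    """Return (cue_index, message) pairs for every limit a cue breaks.

    Cues are assumed to be sorted by start time.
    """
    problems = []
    for i, cue in enumerate(cues):
        lines = cue.text.split("\n")
        if len(lines) > MAX_LINES:
            problems.append((i, f"{len(lines)} lines (max {MAX_LINES})"))
        for line in lines:
            if len(line) > MAX_CHARS_PER_LINE:
                problems.append(
                    (i, f"{len(line)} characters on one line (max {MAX_CHARS_PER_LINE})")
                )
        if cue.end - cue.start > MAX_DURATION_S:
            problems.append((i, f"longer than {MAX_DURATION_S} seconds"))
        if i > 0 and cue.start - cues[i - 1].end < MIN_GAP_S:
            problems.append((i, f"less than {MIN_GAP_S} seconds after previous cue"))
    return problems


# Example: the second cue has an overlong line and runs for 7 seconds.
for index, message in check_cues([
    Cue(0.0, 4.0, "Hello there."),
    Cue(6.0, 13.0, "This line is definitely longer than thirty-two characters."),
]):
    print(index, message)
```

A check like this only reports problems; fixing them (for example, splitting a long cue) is best done by an editor who can choose sensible break points.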

Transcripts

Basic transcripts are a text version of the speech and non-speech audio information.
Descriptive transcripts also include text description of the visual information.
Provide descriptive transcripts for pre-recorded video with audio.
Provide descriptive transcripts or audio description for pre-recorded video-only content.
Provide a basic transcript for pre-recorded audio-only content.
Provide a basic transcript or captions for live audio-only content.
Interactive transcripts enable a user to click a phrase anywhere in the transcript to navigate to that exact point in the video (or audio). Interactive transcripts are built from timed text files specified in the <track> element.
Position the transcript or a link to it directly below or adjacent to the media player.
If the transcript is on another page, provide a link back to the audio or video file.
Provide the transcript in HTML for maximum accessibility to people and to search engines.
If working with a captions file, combine several lines into sensible paragraphs.
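
As a rough illustration of combining caption lines into paragraphs, the Python sketch below reads cue text from a WebVTT-style captions file and starts a new paragraph when a sentence has ended and the pause before the next cue reaches a threshold. The function names, the 2-second pause_s default, and the simplified parsing (no cue settings, styling, or NOTE blocks) are assumptions made for this example; real files may need a full WebVTT parser, and the resulting paragraph breaks should still be reviewed by a person.

```python
import re

# Matches a simplified WebVTT timing line, e.g. "00:00:01.000 --> 00:00:04.000".
# The hours field is optional, as in WebVTT.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt_text):
    """Yield (start, end, text) for each cue; blocks without a timing line are skipped."""
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                groups = match.groups()
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                yield _seconds(*groups[:4]), _seconds(*groups[4:]), text
                break


def cues_to_paragraphs(vtt_text, pause_s=2.0):
    """Join cue text into paragraphs, breaking after a finished sentence plus a pause."""
    paragraphs, current, last_end = [], [], None
    for start, end, text in parse_cues(vtt_text):
        if not text:
            continue
        sentence_done = bool(current) and current[-1].endswith((".", "?", "!"))
        if sentence_done and start - last_end >= pause_s:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        last_end = end
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Breaking on a pause is only a heuristic; a change of speaker or topic is often a better signal when that information is available.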

Transcribing audio to text

Transcription best practice is nearly identical for captions and transcripts.
When transcribing, the goal is accuracy:
- Never paraphrase or omit words (and do not censor).
- Never substitute words.
- Never rearrange the order of speech.
- Never correct or edit a speaker’s grammar.
- Never provide clarifying information in the captions (you may in the transcript).
Transcribe all speech and non-speech sounds (laughs, groans, sighs, screams, car backfiring, footsteps approaching, distant roaring).
Identify the speakers. Use the full name the first time and a single name afterward. If a speaker is not identified, use Speaker + number (e.g., Speaker 1, Speaker 2) or use a role/title without a number (e.g., Interviewer, Doctor).
Exclude non-relevant speech and non-relevant background noise.
Do not reveal intentionally held information before the appropriate time.
Include relevant information about the speech, e.g., (whispering), (mouthing).
Put non-speech sounds in parentheses, italics, and lowercase, with a space before and after, e.g., ( chatter in distance )
Use punctuation to convey emphasis.
For interrupted speech, use a dash at the end of the line.
Use all capital letters only to indicate yelling.
When the speech is unintelligible or inaudible, transcribe it as [inaudible].
Indicate long silences as (silence).
Include background music if it's important to understand the content:
- Identify music with the uppercase label MUSIC (or a verb implying music), followed by a colon and the title in quotation marks followed by the artist.
- Transcribe important lyrics with musical notes to either side, e.g., ♪ A long, long time ago ♪
- Describe music that’s not part of the action but sets the mood, e.g., ♪ scary music ♪
Best practices unique to transcripts:
- For descriptive transcripts, include all relevant audio information as well as description of all relevant visual information.
- If your transcript is generated from timed text files, ensure descriptions fit into gaps in the main audio, or use a player that can pause the video during the description.
- Transcripts include onscreen text in videos. Captions do not include onscreen text.
- Ensure transcripts identify the source of sounds, rather than just describing them.
- In some cases, such as legal depositions, the transcript must be verbatim, including ums, ahs, and indicating pauses.
- Headings, topics and links can make the transcript more usable.
- Include timestamps only when useful.
- Add a timestamp to inaudible audio.

Description of visual information

Provide audio description for pre-recorded video with audio.
Provide audio description or a descriptive transcript for video-only content.
Design new videos with integrated descriptions (script includes all relevant visual information) to avoid the need for audio description.
Make sure the important visual elements are described appropriately and objectively to understand what the video is communicating.
Write description of visual information in present tense, using an active voice and a third-person narrative style.
Make sure to include all text, e.g., title text at the beginning, links and email addresses, speakers’ names, and text in a presentation.

Media player accessibility

The ideal media player provides built-in support for captions, audio descriptions, and transcripts.

Keyboard accessibility:

All controls can receive focus via the tab key.
Controls have a visible keyboard focus indicator.
The tab order of controls matches the visual order, left to right.
All controls are operable by keyboard.
Text, controls, and backgrounds have sufficient contrast between colors.
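
Whether a color pair has sufficient contrast can be checked numerically. The Python sketch below computes a contrast ratio using the relative-luminance formula published in WCAG 2.x; the function names are illustrative, and the threshold to compare against depends on the element (this checklist gives 3:1 for default caption text, and other text or controls may call for a different minimum).

```python
def _linear(channel_8bit):
    """Convert one 8-bit sRGB channel (0-255) to its linear-light value."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb_a, rgb_b):
    """Contrast ratio between two colors, from 1.0 (identical) to 21.0 (black on white)."""
    lum_a, lum_b = relative_luminance(rgb_a), relative_luminance(rgb_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


# White text on a black background, the default caption colors in this checklist.
print(round(contrast_ratio((255, 255, 255), (0, 0, 0)), 2))  # 21.0
```

The order of the two arguments does not matter, because the lighter color is always placed in the numerator.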

Screen reader accessibility:

Each control exposes its name and role to screen readers, and its value if one is set.

Flashing content

Ensure flashing content:

Does not flash more than 3 times per second.
Is not larger than 21,824 square pixels.
Does not have high contrast.

Assess flashing content using a tool such as the Photosensitive Epilepsy Analysis Tool (PEAT).
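
Alongside a dedicated analysis tool, the two numeric limits above can be expressed as a simple check. The Python sketch below assumes the flash timestamps (in seconds) and the flashing area (in pixels) have already been measured by some other means; the function and constant names are hypothetical. It takes a conservative reading in which each limit is checked separately, and it does not attempt to judge the contrast of the flash.

```python
# Limits copied from the checklist above; the names are illustrative only.
MAX_FLASHES_PER_SECOND = 3
MAX_FLASH_AREA_PX = 21_824


def max_flashes_in_one_second(flash_times):
    """Return the largest number of flashes that fall within any one-second window."""
    times = sorted(flash_times)
    best = 0
    left = 0
    for right, current in enumerate(times):
        # Slide the window start forward until it spans less than one second.
        while current - times[left] >= 1.0:
            left += 1
        best = max(best, right - left + 1)
    return best


def flash_problems(flash_times, area_px):
    """Return a list of messages for each numeric limit the content exceeds."""
    problems = []
    flashes = max_flashes_in_one_second(flash_times)
    if flashes > MAX_FLASHES_PER_SECOND:
        problems.append(f"{flashes} flashes within one second (max {MAX_FLASHES_PER_SECOND})")
    if area_px > MAX_FLASH_AREA_PX:
        problems.append(f"flashing area of {area_px} px (max {MAX_FLASH_AREA_PX})")
    return problems


# Four flashes a quarter-second apart, covering 30,000 pixels: both limits exceeded.
print(flash_problems([0.0, 0.25, 0.5, 0.75], 30_000))
```

A result with no messages only means the two numeric limits were met for the measurements supplied; it is not a substitute for analyzing the actual video.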

Animation and motion

Allow users to turn off motion animations.
Avoid using unnecessary animations.

Pause, stop or hide

For any moving, blinking or scrolling information that starts automatically, lasts more than five seconds, and is presented in parallel with other content, provide the user a way to pause, stop or hide it.
For auto-updating information, provide a way for the user to pause, stop or hide the content. Or, provide a way for the user to control the frequency of the update.
A keyboard accessible “pause button” or other mechanisms can be used to pause the content.
Avoid unnecessary moving, blinking, scrolling or auto-updating content.

Audio control

If audio plays automatically on page load for more than 3 seconds, enable the user to:

pause or stop the audio, or
control the volume independent of the system volume.

Alternatively, play sounds only on user request.