How to subtitle videos and livestreams successfully

23/02/2024

Videos can be subtitled manually, which is laborious, or automatically. Automatic subtitling uses artificial intelligence (AI) to convert speech into text. Depending on the system, AI is up to 70 times faster than a human. A human transcriber corrects errors line by line while writing, whereas AI-generated subtitles may need a separate manual correction pass afterwards.

How automatic subtitling works

Automatic subtitling is based on a process called speech-to-text. Put simply, it works like this: the spoken word is analyzed and a kind of "audio fingerprint" is created. Each such pattern is assigned exactly one word, and the many stored patterns together make up the system's vocabulary. During recognition, the system checks which of the audio patterns it detects match those in the vocabulary. It also takes into account that certain words tend to follow others: an article is often followed by a noun, whereas an article directly following another article is very unlikely. To achieve optimum results, these systems must be trained, which is done by feeding them corrected material.
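The interplay of pattern matching and word-order logic described above can be illustrated with a toy example. This is only a minimal sketch, not how any particular subtitling system is implemented: the candidate words, the probabilities in ACOUSTIC and BIGRAM, and the function names are all invented for illustration. It shows how a recognizer that looks only at the audio would pick "the the", while one that also knows an article rarely follows an article picks "the cat".

```python
import math

# Hypothetical audio-match scores: for each spoken position, the candidate
# words from the vocabulary and how well the audio pattern matches each.
ACOUSTIC = [
    {"the": 0.6, "a": 0.4},
    {"the": 0.55, "cat": 0.45},
]

# Hypothetical word-order probabilities P(next word | previous word).
# "<s>" marks the start of the sentence.
BIGRAM = {
    ("<s>", "the"): 0.5, ("<s>", "a"): 0.5,
    ("the", "the"): 0.01, ("the", "cat"): 0.6,
    ("a", "the"): 0.01, ("a", "cat"): 0.6,
}


def acoustic_only(acoustic):
    """Pick the best-matching word at each position, ignoring word order."""
    return [max(candidates, key=candidates.get) for candidates in acoustic]


def decode(acoustic, bigram, floor=1e-6):
    """Find the word sequence with the best combined audio and word-order score.

    Uses a simple Viterbi search over log probabilities. Word pairs that are
    missing from the bigram table get the small probability `floor`.
    """
    # best[word] = (log score, path) of the best sequence ending in `word`
    best = {"<s>": (0.0, [])}
    for candidates in acoustic:
        new_best = {}
        for word, p_audio in candidates.items():
            options = []
            for prev, (score, path) in best.items():
                p_order = bigram.get((prev, word), floor)
                total = score + math.log(p_audio) + math.log(p_order)
                options.append((total, path + [word]))
            new_best[word] = max(options, key=lambda option: option[0])
        best = new_best
    return max(best.values(), key=lambda option: option[0])[1]


print(acoustic_only(ACOUSTIC))
print(decode(ACOUSTIC, BIGRAM))
```

In a real system the two tables would not be written by hand but learned from training data, which is where the corrected material mentioned above comes in.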

The work steps

1. Upload the video, select the language, and have the subtitles created.
2. If necessary, edit the subtitle text with an online editor.
3. Embed the video with its subtitles so that they are displayed immediately.
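To make the result of these steps more concrete, the following sketch turns timed text segments into a subtitle file. It assumes the WebVTT format, which HTML5 video players can read; the article does not say which format the service itself produces, and the function names and sample segments are hypothetical.

```python
def format_timestamp(seconds):
    """Convert seconds to a WebVTT timestamp of the form HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments):
    """Build WebVTT file content from (start, end, text) tuples in seconds."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


# Hypothetical output of steps 1 and 2: recognized and corrected segments.
segments = [
    (0.0, 2.5, "Welcome to our livestream."),
    (2.5, 6.0, "Today we are talking about subtitles."),
]
print(to_webvtt(segments))
```

When such a file is referenced from an HTML5 video through a track element with the default attribute, the subtitles are shown as soon as the video loads, which corresponds to step 3.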

Target groups for subtitling

People with disabilities
In the context of accessibility, viewers with disabilities are entitled to subtitles. Public bodies are even obliged to make their content accessible so that people with disabilities are not deprived of information. In public media, the majority of content is now subtitled.

Mobile users
Mobile users are often on the move with the sound switched off, so subtitles also help convey the content to users who can hear but have the audio muted.

Limits of automatic subtitling

Music
Background noise disrupts speech recognition, and recognition is almost impossible when music plays in the background. The recognition rate of the speech-to-text conversion can drop close to zero here, and the quality of the subtitles suffers accordingly.
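One common way to put a number on recognition quality is the word error rate: the number of word substitutions, insertions, and deletions needed to turn the automatic transcript into the manually corrected one, divided by the length of the corrected text. The sketch below is an illustration under that assumption; the article does not state how its "recognition rate" is measured, and the function name and sample sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the
    number of words in the reference (the manually corrected text)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j words of the hypothesis
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                cur[j - 1] + 1,      # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


corrected = "the cat sat on the mat"
automatic = "the cat sat in the hat"
print(word_error_rate(corrected, automatic))
```

A value of 0 means the automatic subtitles matched the corrected text word for word. Speech over loud music would be expected to push the value towards 1 or beyond.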

Dialects
Dialects should be seen as languages in their own right, and even within a dialect there are numerous variants that make recognition difficult. Because creating language models is very time-consuming, the industry has concentrated on the world's most important languages, 192 in number. This is also one reason why recognition is better for English, with around 900 million speakers, than for German, with around 130 million speakers.

Have your first video subtitled

Click here for the free 14-day trial
