Text superimposed on video content is everywhere: YouTube even offers to caption videos by transcribing the audio to text and writing it synchronously over the video. Most video sources (including DVDs) have an option to show subtitles in any one of a number of languages. You can find the distinction between subtitles and closed captioning here. Closed captions are optional: you can view them or not, whereas subtitles are part of the video.
Videos Have More Information Than Images
If text over video is so common, why is it difficult to do? The short answer is that videos contain much more information than images. Permanently placing text in a video means decoding, modifying and re-encoding every frame, which requires much more computation than re-creating a single image. Showing a caption, however, only requires storing the text, the times at which it is to be shown and hidden, and its location on the screen. This information can easily be encoded in a file and rendered by the video player software, and this is how most text on video is displayed. One problem is that different video players support different caption file formats. Another is that the caption file is separate from the video file: if you download an auto-captioned video from YouTube using a third-party application, the auto-captions will not be included.
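To illustrate how little data a caption needs compared with the video itself, here is a minimal Python sketch that writes cues in the SubRip (.srt) format, one widely supported caption file format. The function names `srt_timestamp` and `to_srt` are hypothetical, chosen for this example. Note that plain SRT stores only a cue number, the start and end times and the text; controlling the on-screen position needs a richer format such as WebVTT or ASS.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Turn (start, end, text) tuples into the text of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"cue {index} ends before it starts")
        # Each cue: a sequence number, a time range, then the caption text
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(1.0, 4.0, "Hello"), (4.5, 7.25, "World")]))
```

The whole caption track for a short clip is a few hundred bytes of text, which is why players can render it on the fly instead of altering the video frames.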
Web-Based Facilities
Web-based facilities for displaying remotely stored video files, such as YouTube, can ensure that every video is shown in a player that supports separately stored captions. However, because the captions are stored separately from the video file, the caption data may be difficult or impossible to retrieve. For example, YouTube has a Takeout facility (part of Google Takeout) that allows users to download all their YouTube video content, including a JSON metadata file for each video. This file includes many metadata fields but does not include the caption data.
Using a separate caption file is great for flexibility, but the file must be kept together with the video content. A further problem is that not all video players support all the available caption file formats. Some container formats, such as MP4 and MKV, can already carry caption tracks inside the main video file, but the player still has to be able to read them! The default Windows 10 video player, Photos, supports a number of caption file formats, and there are many online facilities for generating them, some of which are reviewed here.
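Existing containers such as MP4 and MKV can hold a caption track alongside the video and audio streams (so-called soft subtitles), which keeps the captions with the video while leaving them switchable. The sketch below assumes ffmpeg is installed and on the PATH, and only builds the command; the helper name `mux_soft_subtitles_command` and the file names are hypothetical. For MP4 output it converts the SRT captions to `mov_text`, the text subtitle codec MP4 uses; for MKV it keeps them as SRT. Whether a particular player then displays the track still depends on that player.

```python
import shlex


def mux_soft_subtitles_command(video: str, captions: str, output: str) -> list[str]:
    """Build an ffmpeg command that adds an .srt file as a caption track."""
    # MP4 stores text subtitles as mov_text; Matroska (.mkv) can keep SRT as-is
    sub_codec = "srt" if output.lower().endswith(".mkv") else "mov_text"
    return [
        "ffmpeg",
        "-i", video,        # input 0: the original video (and audio)
        "-i", captions,     # input 1: the caption file
        "-map", "0",        # keep every stream from the video file
        "-map", "1",        # add the caption stream
        "-c", "copy",       # copy video and audio without re-encoding
        "-c:s", sub_codec,  # but convert the subtitle stream
        output,
    ]


if __name__ == "__main__":
    cmd = mux_soft_subtitles_command("holiday.mp4", "holiday.srt", "holiday_captioned.mp4")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Because the video and audio are copied rather than re-encoded, this step is fast; the cost is that the captions are only as portable as the player's subtitle support.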
Making Captions Readable on Any Player
So what are the options if you want captions that are embedded in the video data itself, and therefore readable on any video player? If you want to keep the entire original video frame and place the caption beneath it, the video needs to be padded with a uniform colour bar below the frame. This can be done using the command-line application ffmpeg, which is available for Windows as well as other platforms. One complication is that portrait-mode videos from smartphones may already be padded with black at the sides so that the frame has the same dimensions as a landscape-mode video.
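A minimal sketch of the padding step, again only building the ffmpeg command from Python. It uses ffmpeg's `pad` filter, whose positional arguments are the output width, the output height, the x and y position of the original frame, and the fill colour; `iw` and `ih` stand for the input width and height. The 120-pixel bar, the black colour and the function name `pad_bottom_command` are illustrative assumptions. Since the frames change, the video stream is re-encoded, while the audio is copied unchanged.

```python
import shlex


def pad_bottom_command(src: str, dst: str, bar_height: int = 120,
                       colour: str = "black") -> list[str]:
    """Build an ffmpeg command that adds a solid colour bar below the frame."""
    # Many encoders (e.g. H.264 with 4:2:0 chroma) need even frame dimensions
    if bar_height <= 0 or bar_height % 2:
        raise ValueError("bar_height must be a positive even number")
    # Keep the original width, add bar_height rows, and place the original
    # frame at the top-left corner (0, 0) so the new rows end up at the bottom
    vf = f"pad=iw:ih+{bar_height}:0:0:color={colour}"
    return ["ffmpeg", "-i", src, "-vf", vf, "-c:a", "copy", dst]


if __name__ == "__main__":
    print(shlex.join(pad_bottom_command("tape_transfer.mp4", "padded.mp4")))
```

Placing the frame at y = 0 is the design choice that puts the bar below the picture; a y offset equal to the bar height would put it above instead.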
Copying from Analogue Media
Videos copied from analogue media such as Hi8 or VHS cassette tapes may have similar padding added. The caption can then be written on the uniform colour bar or on top of the video, using either a web-based service such as Kapwing Subtitler or a desktop video editor such as Photos for Windows 10. Photos does not offer Kapwing Subtitler's flexibility in choosing caption font, colour and position, but it is simple to use and comes as part of Windows 10. A desktop application is likely to be much slower than a cloud-based service, because cloud services can apply more computing resources than the average home computer has to the task of reading video frames, adding text and re-encoding the video.
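For readers comfortable with the command line, ffmpeg can also perform the read-frames, add-text, re-encode step itself through its `drawtext` filter, as an alternative to the tools named above. The Python sketch below builds one `drawtext` filter per caption, centres the text horizontally, centres it vertically inside a bar assumed to sit at the bottom of an already padded frame, and uses the timeline option `enable='between(t,start,end)'` so each caption appears only during its time range. The helper names, the 120-pixel bar, the font size and the colour are assumptions. It also assumes an ffmpeg build with fontconfig, so that no `fontfile` option is needed, and it simply rejects characters that drawtext would need escaped rather than escaping them.

```python
import shlex

# Characters with special meaning in ffmpeg option syntax or in drawtext's
# text expansion; this sketch refuses them instead of escaping them.
UNSAFE_CHARS = set("\\':%")


def caption_filter(cues: list[tuple[float, float, str]], bar_height: int = 120,
                   fontsize: int = 36, colour: str = "white") -> str:
    """Build a comma-separated chain of drawtext filters, one per cue."""
    filters = []
    for start, end, text in cues:
        if UNSAFE_CHARS & set(text):
            raise ValueError(f"caption needs escaping: {text!r}")
        # h is the frame height and text_h the rendered text height:
        # this centres the text vertically inside the bottom bar
        y = f"h-{bar_height}+({bar_height}-text_h)/2"
        filters.append(
            f"drawtext=text='{text}':x=(w-text_w)/2:y={y}"
            f":fontsize={fontsize}:fontcolor={colour}"
            f":enable='between(t,{start},{end})'"
        )
    return ",".join(filters)


def burn_captions_command(src: str, dst: str,
                          cues: list[tuple[float, float, str]]) -> list[str]:
    """ffmpeg command that draws the captions and re-encodes the video."""
    return ["ffmpeg", "-i", src, "-vf", caption_filter(cues), "-c:a", "copy", dst]


if __name__ == "__main__":
    cues = [(1, 4, "Our first holiday"), (4.5, 8, "Arriving at the beach")]
    print(shlex.join(burn_captions_command("padded.mp4", "captioned.mp4", cues)))
```

Unlike the soft-subtitle approach, the text becomes part of every affected frame, so it shows on any player but can no longer be switched off or edited without re-encoding again.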