Capturing Digital Video

The digitisation of the analogue TV signal is performed by a video capture card, which converts the signal into a series of bitmapped images, one per frame, to be displayed and manipulated on the PC. The card takes one horizontal line at a time and, for the PAL system, splits each into 768 sections. At each of these sections, the red, green and blue values of the signal are calculated, resulting in 768 coloured pixels per line. The 768 pixel width arises out of the 4:3 aspect ratio of a TV picture. Out of the 625 lines in a PAL signal, about 50 carry no picture information (some are used for Teletext), so they’re not digitised. That leaves about 575 lines, and to get the 4:3 ratio, 575 lines times four divided by three gives 766.7. Since computers prefer to work with whole numbers, video capture cards usually digitise 576 lines, splitting each line into 768 segments, which gives an exact 4:3 ratio.
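The line-count arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `width_for_4_3` is invented here and does not come from any capture card's software.

```python
from fractions import Fraction

ASPECT = Fraction(4, 3)  # width : height of a TV picture


def width_for_4_3(lines: int) -> Fraction:
    """Return the line width, in pixels, that gives a 4:3 picture for `lines` rows."""
    return lines * ASPECT


for lines in (575, 576):
    width = width_for_4_3(lines)
    is_whole = width.denominator == 1
    print(f"{lines} lines -> {float(width):.1f} pixels per line (whole number: {is_whole})")
```

Using exact fractions rather than floating point makes it plain that 576 is the nearest line count to 575 that yields a whole-number width.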

Thus, after digitisation, a full frame is made up of 768×576 pixels. Each pixel requires three bytes to store the red, green and blue components of its colour (for 24-bit colour), so each frame requires 768 x 576 x 3 bytes, or about 1.33MB. The PAL system actually takes two passes to draw a complete frame, each pass resolving alternate scan lines, and delivers 25 complete frames per second. The upshot is that one second of video requires a massive 33MB (1.33MB x 25 fps). Adding a 16-bit stereo audio track sampled at 44.1kHz increases this by roughly a further 176KB per second (44,100 samples x 2 bytes x 2 channels). In practice, however, some boards digitise fewer than 576 lines and end up with less information, and most boards make use of the YUV scheme.
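The storage figures follow directly from the frame dimensions. The sketch below works them out without intermediate rounding; the function names are illustrative only, and a megabyte is taken to be 1,000,000 bytes, which is an assumption about the units used in the text. Multiplying the rounded 1.3MB frame size by 25 gives 32.5MB, whereas the unrounded product is about 33.2MB.

```python
WIDTH, HEIGHT = 768, 576   # digitised PAL frame
BYTES_PER_PIXEL = 3        # 24-bit RGB
FRAMES_PER_SECOND = 25     # PAL frame rate
MB = 1_000_000             # decimal megabyte (assumed unit)


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def video_bytes_per_second(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Uncompressed video data rate in bytes per second."""
    return frame_bytes(width, height, bytes_per_pixel) * fps


one_frame = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
one_second = video_bytes_per_second(WIDTH, HEIGHT, BYTES_PER_PIXEL, FRAMES_PER_SECOND)

print(f"One frame:  {one_frame:,} bytes ({one_frame / MB:.2f}MB)")
print(f"One second: {one_second:,} bytes ({one_second / MB:.1f}MB)")
```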

The eye is more sensitive to brightness than it is to colour. The YUV model, a method of encoding pictures used in television broadcasting, exploits this by processing intensity independently from colour. Y is the intensity and is sampled at full resolution, while U and V are colour difference signals and are sampled at either half resolution (known as YUV 4:2:2) or quarter resolution (known as YUV 4:1:1). Digitising a YUV 4:2:2 signal instead of an RGB signal requires an average of 16 bits (two bytes) per pixel instead of 24 bits (three bytes), so one second of PAL video ends up requiring about 22MB.
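The saving from colour subsampling can be worked out as an average number of bits per pixel. The following is a rough sketch, assuming 8 bits per sample and the 768×576, 25 fps PAL frame described earlier; the table and function names are made up for this illustration.

```python
from fractions import Fraction

# Fraction of full horizontal resolution at which U and V are each sampled.
CHROMA_FRACTION = {
    "4:4:4": Fraction(1),     # no subsampling, same cost as 24-bit RGB
    "4:2:2": Fraction(1, 2),  # colour at half resolution
    "4:1:1": Fraction(1, 4),  # colour at quarter resolution
}


def bits_per_pixel(scheme: str, bits_per_sample: int = 8) -> Fraction:
    """Average bits per pixel: one full-rate Y sample plus two reduced-rate colour samples."""
    return bits_per_sample * (1 + 2 * CHROMA_FRACTION[scheme])


def pal_bytes_per_second(scheme: str, width: int = 768, height: int = 576, fps: int = 25) -> int:
    """Uncompressed data rate in bytes per second for a PAL-sized frame."""
    return int(width * height * fps * bits_per_pixel(scheme) / 8)


for scheme in CHROMA_FRACTION:
    rate_mb = pal_bytes_per_second(scheme) / 1_000_000
    print(f"YUV {scheme}: {float(bits_per_pixel(scheme)):.0f} bits per pixel, {rate_mb:.1f}MB per second")
```

On these assumptions 4:2:2 averages 16 bits per pixel and 4:1:1 averages 12, which is why the two-byte figure applies to the half-resolution scheme.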

The NTSC system used in America and Japan has 525 lines and runs at 30 fps, the latter being a consequence of the fact that mains electricity there alternates at 60Hz rather than the 50Hz found in Europe. NTSC frames are usually digitised at 640×480, which fits exactly into VGA resolution. This is not a coincidence, but a result of the PC having been designed in the US and the first IBM PCs being able to be plugged into a TV.

A typical video capture card is a system of hardware and software which together allow a user to convert video into a computer-readable format by digitising video sequences to uncompressed or, more normally, compressed data files. Uncompressed PAL video is an unwieldy beast, so some kind of compression has to be employed to make it more manageable. It’s down to a codec to compress video during capture and decompress it again for playback, and this can be done in software or hardware. Even in the age of GHz-speed CPUs, a hardware codec is necessary to achieve anything near broadcast quality video.

Most video capture devices employ a hardware Motion-JPEG codec, which uses JPEG compression on each frame to achieve smaller file sizes, while retaining editing capabilities. The huge success of DV-based camcorders in the late 1990s has led to some higher-end cards employing a DV codec instead.
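The reason per-frame compression keeps footage easy to edit can be shown with a small sketch. It is not a Motion-JPEG implementation: Python's lossless `zlib` module stands in for the lossy JPEG step, the frames are synthetic, and all function names are hypothetical. The point is only that each frame is compressed independently, so a clip can be cut at any frame without decoding its neighbours.

```python
import zlib

WIDTH, HEIGHT = 64, 48  # tiny frame size so the example runs instantly


def make_frame(index: int) -> bytes:
    """Build a synthetic 24-bit RGB frame: a horizontal gradient that shifts each frame."""
    row = bytes((x + index) % 256 for x in range(WIDTH) for _ in range(3))
    return row * HEIGHT


def compress_clip(frames: list) -> list:
    """Compress every frame on its own, as an intra-frame codec does."""
    return [zlib.compress(frame) for frame in frames]


def decode_frame(clip: list, index: int) -> bytes:
    """Recover a single frame without touching any other frame in the clip."""
    return zlib.decompress(clip[index])


frames = [make_frame(i) for i in range(10)]
clip = compress_clip(frames)

# Cutting the clip is just a matter of selecting compressed frames; nothing is re-encoded.
trimmed = clip[3:7]

raw_size = sum(len(frame) for frame in frames)
packed_size = sum(len(data) for data in clip)
print(f"{raw_size} bytes raw, {packed_size} bytes compressed")
print("First frame of trimmed clip intact:", decode_frame(trimmed, 0) == frames[3])
```

A real Motion-JPEG codec discards some picture detail to reach much higher compression, so a decoded frame would be close to the original rather than identical to it.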

Once compressed, the video sequences can then be edited on the PC using appropriate video editing software and output in S-VHS quality to a VCR, television, camcorder or computer monitor. The higher the quality of the video input and the higher the PC’s data transfer rate, the better the quality of the video image output.

Some video capture cards keep their price down by omitting their own audio recording hardware. Instead they provide pass-through connectors that allow audio input to be directed to the host PC’s sound card. This isn’t a problem for simple editing work, but without dedicated audio hardware, problems can arise in synchronising the audio and video tracks on longer and more complex edits.


Video capture cards are equipped with a number of input and output connectors. There are two main video formats: composite video is the standard for most domestic video equipment, while higher quality equipment often uses the S-Video format. Most capture cards will provide at least one input socket that can accept either type of video signal, allowing connection to any video source (e.g. VCR, video camera, TV tuner or laser disc player) that generates a signal in either of these formats. Additional sockets can be of benefit though, since complex editing work often requires two or more inputs. Some cards are designed to take an optional TV tuner module and, increasingly, video capture cards include an integrated TV tuner.

Video output sockets are provided to allow video sequences to be recorded back to tape and some cards also allow video to be played back either on a computer monitor or TV. Less sophisticated cards require a separate graphics adapter or TV tuner card to provide this functionality.