Digital Video Compression

Video compression is the art of throwing away as much data as possible without it showing. Video compression methods tend to be lossy – that is, what comes out after decoding isn’t identical to what was originally encoded. At first, PCs could only manage postage stamp-size windows, achieved by cutting video’s resolution, colour depth and frame rate. Ways were then devised to represent images more efficiently and reduce data without affecting physical dimensions. The technology by which video compression is achieved is known as a codec, an abbreviation of compression/decompression. Various types of codec have been developed – implementable in either software or hardware, and sometimes utilising both – allowing video to be readily translated to and from its compressed state.

Lossy techniques reduce data – both through complex mathematical encoding and through the selective, intentional shedding of visual information that our eyes and brain usually ignore – and can lead to a perceptible loss of picture quality. Lossless compression, by contrast, discards only redundant information. Codecs offer compression ratios ranging from a gentle 2:1 to an aggressive 100:1, making it feasible to deal with huge amounts of video data. In general, the higher the compression ratio, the worse the resulting image. Colour fidelity fades, artefacts and noise appear in the picture and the edges of objects become over-apparent, until eventually the result is unwatchable.
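To put those ratios in perspective, the arithmetic can be sketched as below. The frame size, bytes per pixel and frame rate are assumptions chosen for illustration (a 720×576 picture at 2 bytes per pixel and 25 frames per second) and do not come from the text above. Only the 2:1, 10:1 and 100:1 ratios do.

```python
def raw_rate(width, height, bytes_per_pixel, fps):
    """Uncompressed data rate in bytes per second."""
    return width * height * bytes_per_pixel * fps


def compressed_rate(raw, ratio):
    """Data rate in bytes per second after compressing by ratio:1."""
    return raw / ratio


# Assumed source format (illustrative only): 720x576, 2 bytes/pixel, 25 fps.
RAW = raw_rate(720, 576, 2, 25)

if __name__ == "__main__":
    print(f"uncompressed: {RAW / 1e6:.1f} MB/s")
    for ratio in (2, 10, 100):
        rate = compressed_rate(RAW, ratio)
        print(f"{ratio:>3}:1 -> {rate / 1e6:.2f} MB/s")
```

Under these assumptions the uncompressed stream runs at roughly 20 MB per second, which is why even modest ratios made a large practical difference to the PCs of the period.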

By the end of the 1990s, the dominant techniques were based on a three-stage algorithm built around the DCT (Discrete Cosine Transform). DCT-based compression exploits the fact that adjacent pixels in a picture – either physically close in the image (spatial) or in successive images (temporal) – often have the same or very similar values. In the first stage a mathematical transform – a relative of the Fourier transform – is performed on grids of 8×8 pixels (hence the blocky visual artefacts at high compression levels). The transform itself doesn’t reduce the data, but the resulting frequency coefficients are no longer equal in their information-carrying roles. Specifically, it’s been shown that for the human visual system, the lower frequency components are more important than the high frequency ones. In the second stage a quantisation process weights the coefficients accordingly and ejects those contributing the least visual information, depending on the compression level required. For instance, losing 50 per cent of the transformed data may only result in a loss of five per cent of the visual information. Finally, entropy encoding – a lossless technique – removes any remaining redundancy from the bits that are left.
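The first two stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular codec's implementation: it uses an orthonormal 8×8 DCT-II, a single uniform quantiser step in place of the frequency-weighted tables real codecs use, and it omits the entropy-coding stage altogether. The function names and the step value of 16 are hypothetical.

```python
import math

N = 8  # block size used by DCT-based codecs


def _basis(k, n):
    """Orthonormal DCT-II basis value for frequency k at sample n."""
    alpha = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return alpha * math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 2-D DCT of an 8x8 block (stage one: no data is lost)."""
    return [[sum(block[x][y] * _basis(u, x) * _basis(v, y)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT, used by the decoder."""
    return [[sum(coeffs[u][v] * _basis(u, x) * _basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantise(coeffs, step):
    """Stage two: round each coefficient to a multiple of step (lossy)."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantise(levels, step):
    """Decoder side: scale the stored levels back up."""
    return [[q * step for q in row] for row in levels]


if __name__ == "__main__":
    # Assumed test picture: a smooth ramp, typical of low-detail areas.
    ramp = [[100 + 4 * x + 2 * y for y in range(N)] for x in range(N)]
    levels = quantise(dct2(ramp), step=16)
    zeros = sum(q == 0 for row in levels for q in row)
    rebuilt = idct2(dequantise(levels, 16))
    worst = max(abs(rebuilt[x][y] - ramp[x][y])
                for x in range(N) for y in range(N))
    print(f"{zeros} of 64 coefficients quantised to zero")
    print(f"worst pixel error after decoding: {worst:.2f}")
```

For a smooth block like this, most of the 64 coefficients quantise to zero. Those long runs of zeros are what the lossless entropy stage then stores very compactly.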

Initially, compression was performed by software. Limited CPU power restricted how sophisticated an algorithm could be and still complete its task in a 25th of a second – the time available to draw a frame of full-motion video. Nevertheless, Avid Technology and other pioneers of NLE (non-linear editing) introduced PC-based editing systems at the end of the 1980s using software compression. Although the video was a quarter of the resolution of broadcast TV, with washed-out colour and thick with blocky artefacts, NLE signalled a revolution in production techniques. At first it was used for off-line editing, the stage at which material is trimmed down for a programme. Up to 30 hours of video may be shot for a one-hour documentary, so it’s best to prepare it on cheap, non-broadcast equipment to save time in an on-line edit suite.

Although the quality of video offered by the first PC-based NLE systems was worse than that of the VHS VCRs used for off-line editing, there were some advantages. Like a word processor for video, they offered a faster and more creative way of working. A user could quickly cut and paste sections of video, trim them and make the many fine-tuning edits typical of the production process. What’s more, importing an accurate EDL (edit decision list) generated by an NLE system into the on-line computer on a floppy disk was far better than having to type in a list of time-codes. Not only was NLE a better way to edit but, because it delivered an off-line product closer to the final programme, less time was needed in the on-line edit suite.

NLE systems really took off in 1991, however, when hardware-assisted compression brought VHS-quality video. The first hardware-assisted video compression technique is known as M-JPEG (motion JPEG). It derives from JPEG, the DCT-based standard developed for still images. JPEG was never intended for video compression, but when C-Cube introduced a codec chip in the early 1990s that could JPEG-compress as many as 30 still images a second, NLE pioneers couldn’t resist. By squeezing data by a factor of as much as 50, VHS-quality digital video could be handled by PCs.

In time, PCs got faster and storage got cheaper, so less compression was needed and better-quality video could be edited. By compressing video by as little as 10:1, a new breed of non-linear solutions emerged in the mid-1990s. These systems were declared ready for on-line editing; that is, finished programmes could essentially be played out of the back of the box. Their video was considered to be of broadcast quality, at least for the sort of time- and cost-critical applications that most benefited from NLE, such as news, current affairs and low-budget productions.

The introduction of this technology proved controversial. Most images compressed cleanly at 10:1, but certain material – such as that with a lot of detail and areas of high contrast – was degraded. Few viewers would ever notice, but broadcast engineers quickly learnt to spot the so-called ringing and blocky artefacts that DCT compression produced. Also, in order to change the contents of the video images, for example to add an effect or graphic, material must first be decompressed and then recompressed. This process, though digital, is akin to an analogue generation. Artefacts are added like noise with each cycle, in a process referred to as concatenation. Sensibly designed systems render every effect in a single pass, but if several compressed systems are used in a production and broadcast environment, concatenation presents a problem.
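Concatenation can be illustrated with a deliberately crude toy model. The sketch below assumes each "codec" is nothing more than a uniform quantiser that snaps sample values to multiples of a step size. The step values 6 and 10 and the function names are hypothetical, and a real DCT codec is far more elaborate. The point it shows is that passing material through the same system twice adds nothing, whereas passing it through two different systems compounds the error.

```python
def codec_pass(samples, step):
    """Toy compress-then-decompress: snap each sample to a multiple of step."""
    return [round(s / step) * step for s in samples]


def max_error(original, processed):
    """Largest absolute difference between two sample lists."""
    return max(abs(a - b) for a, b in zip(original, processed))


# Assumed source material: every 8-bit sample value once.
original = list(range(256))

once = codec_pass(original, 6)       # first generation, system A
twice_same = codec_pass(once, 6)     # second generation, system A again
cascade = codec_pass(once, 10)       # second generation, different system B

if __name__ == "__main__":
    print("A only:     ", max_error(original, once))
    print("A then A:   ", max_error(original, twice_same))
    print("A then B:   ", max_error(original, cascade))
```

In this model the second pass through system A leaves the data unchanged, which mirrors why a system that renders every effect in a single pass avoids the problem, while the hand-off to system B introduces a fresh round of loss on top of the first.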

Compression technology arrived just as proprietary uncompressed digital video equipment had filtered into all areas of broadcast and video facilities. Though the cost savings of the former were significant, the associated degradation in quality meant that acceptance by the engineering community was slow at first. However, as compression levels dropped – to under 5:1 – objections began to evaporate and even the most exacting engineers conceded that such video was comparable to the widely used BetaSP analogue tape. Mild compression enabled Sony to build its successful Digital Betacam format video recorder, which is now considered a gold standard. With compression a little over 2:1, so few artefacts (if any) are introduced that video goes in and out for dozens of generations apparently untouched.

The cost of M-JPEG hardware has fallen steeply in the past few years, and reasonably priced PCI cards capable of a 3:1 compression ratio and bundled with NLE software are now readily available. Useful as M-JPEG is, it wasn’t designed for moving pictures. When it comes to digital distribution, where bandwidth is at a premium, the MPEG family of standards – specifically designed for video – offers significant advantages.