
Executive Summary

The purpose of this report is to inform the audience about the video and audio compression standards developed by the Moving Picture Experts Group (MPEG). This report includes general information about the different MPEG standards along with their history and development. The following MPEG standards exist:

  • MPEG-1, a standard for storage and retrieval of moving pictures and audio on storage media.

  • MPEG-2, a standard for digital television.

  • MPEG-4, a standard for multimedia applications.

  • MPEG-7, a content representation standard for information search.

MPEG, the Moving Picture Experts Group, was established in January 1988 with the mandate to develop standards for the coded representation of moving pictures, audio, and their combination. Only 25 experts worked together when the first MPEG meeting took place. Since then, MPEG has grown into a large committee of some 350 experts from 200 companies in about 20 countries. MPEG operates under ISO, the International Organization for Standardization, and IEC, the International Electrotechnical Commission. MPEG itself is a nickname; the official name is ISO/IEC JTC1 SC29 WG11. The term also refers to the family of digital video compression standards and file formats developed by the group. MPEG generally produces better-quality video than competing formats such as Video for Windows, Indeo, and QuickTime. MPEG files can be decoded by special hardware or by software.
Presently MPEG audio/video compression is widely used in everyday life in many applications. For example, it is used in DVD players, digital television set-top boxes, HDTV recorders, Internet video, video conferencing, and many others.

MPEG achieves a high compression ratio by storing only the changes from one frame to another, instead of each entire frame. The video information is then encoded using a technique called the Discrete Cosine Transform (DCT). MPEG uses a form of “lossy” compression, since some data is removed, but the loss is generally imperceptible to the human eye.

As of today, there are three MPEG standards: MPEG-1, MPEG-2, and MPEG-4. MPEG-1 and MPEG-2 provide interoperable ways of representing audiovisual content on the air and on digital media; in other words, these standards made interactive video on CD-ROM, DVD, and digital television possible. MPEG-1 is used for the storage and retrieval of moving pictures and audio on storage media. MPEG-2 is used for digital television; it was the timely response to the satellite broadcasting and cable television industries in their transition from analog to digital formats. MPEG-3 was intended as an extension of MPEG-2 to cater for HDTV but was eventually merged into MPEG-2; it is not to be confused with MP3 (MPEG-1 Layer III), a compression format for audio files. MPEG-4 is the multimedia standard that addresses the convergence of three areas: television, computing, and communication. MPEG-4 codes content as objects and enables those objects to be manipulated individually or collectively in an audio-visual scene.


In 1991, the Moving Picture Experts Group finalized its first standard for video and audio compression: MPEG-1. This generic standard is independent of any particular application and is thus better described as a toolbox. From its creation, MPEG-1's quality was noticeably superior to other digital video formats. This new video and audio encoding standard offered high compression while maintaining moderate-quality picture and audio on such popular formats as Video CD (CD-ROM) and MP3. More importantly, this widely accepted standard became an international standard, which could be adopted universally by different industries. In 1992, MPEG-1 was officially published by the ISO MPEG committee, and it has since spawned further advancements in video and audio compression (as can be seen in MPEG-2). In short, MPEG-1 is a standard in 5 parts:

  • Part 1: specifies the combination of video, audio, and data streams into a single data stream

  • Part 2: defines video compression

  • Part 3: defines audio compression

  • Part 4: deals with compliance testing

  • Part 5: technically describes/documents Part 1, 2, and 3

Part 1 of the MPEG-1 standard deals with the problem of combining data streams of video and audio to form a single stream. This is an important function that allows a number of different types of inputs to be combined into one compressed stream that is well suited for digital storage or transmission (Chiariglione). Time stamps are the basic coding principle of an MPEG system: they specify the decoding and display times of audio and video, as well as the time of reception of the multiplexed coded data at the decoder, all in terms of a 90 kHz system clock. This approach provides great flexibility in areas such as decoder design, packet lengths, video picture rates, audio sample rates, and network performance.

Part 2 of MPEG-1 is known as MPEG-1 Video. This part presents various techniques that can be used to compress video. It mainly takes advantage of the vast number of redundancies found in a series of video frames and the human eye’s inability to perceive subtle changes between them. Specifically, Part 2 specifies how to use techniques such as motion compensation and temporal prediction to remove redundant picture information from a group of pictures, thereby reducing the overall size of the video. (A more in-depth description of these techniques is given under MPEG-2 video compression, since MPEG-2 is an extension of them.)
MPEG-1 Audio is also an important part of the MPEG-1 standard. The techniques used for video compression do not apply to audio. The process starts by feeding audio samples into two independent blocks of the encoder. The “mapping block” filters the samples into 32 equal-width frequency subbands, while the “psychoacoustic block” determines a masking threshold for the audio input. Using this threshold, the psychoacoustic block identifies noise that is imperceptible to the human ear. The “quantizer and coding” block then uses the subband samples and masking-threshold information to create the set of coding symbols that the “frame packing” block needs. Finally, the “frame packing” block assembles the actual bitstream from the output of the other blocks, adding header information as necessary before sending out the single bitstream. (Refer to the diagram.)
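The four encoder blocks described above can be sketched as follows. This is a deliberately simplified illustration: the real standard uses a 32-band polyphase filterbank and a genuine psychoacoustic model, whereas the functions below are toy stand-ins:

```python
# Simplified sketch of the MPEG-1 audio encoder pipeline. The math in
# each block is a placeholder for the real filterbank and masking model.

def mapping_block(samples, bands=4):
    """Split samples into equal-width groups standing in for subbands."""
    width = len(samples) // bands
    return [samples[i * width:(i + 1) * width] for i in range(bands)]

def psychoacoustic_block(subbands):
    """Toy masking threshold: a fixed fraction of each band's peak level."""
    return [max(abs(s) for s in band) * 0.1 for band in subbands]

def quantizer_block(subbands, thresholds):
    """Drop samples below the masking threshold (the 'inaudible' part)."""
    return [[s if abs(s) >= t else 0 for s in band]
            for band, t in zip(subbands, thresholds)]

def frame_packing_block(coded):
    """Assemble a 'frame': a header followed by the coded payload."""
    return {"header": {"bands": len(coded)}, "payload": coded}

samples = [100, 3, -80, 2, 50, -1, 0, 60]
subbands = mapping_block(samples)
thresholds = psychoacoustic_block(subbands)
coded = quantizer_block(subbands, thresholds)
frame = frame_packing_block(coded)
# Quiet samples masked by loud neighbors in the same band are zeroed out.
assert frame["payload"] == [[100, 0], [-80, 0], [50, 0], [0, 60]]
```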

Figure 1: Example of an Encoder

The MPEG committee also created three compression methods, named Audio Layer I, II, and III. Layer I is the simplest: it is a sub-band coder with a psychoacoustic model that requires little processing power and produces what is now considered obsolete, poor audio quality. Layer II has more advanced bit-allocation techniques and provides greater accuracy, but by today’s standards yields merely acceptable sound. Layer III, on the other hand, uses a massive amount of processing power but is able to produce very high-quality, CD-like sound. Layer III is better known as MP3, today’s most widely used audio compression technique. Important applications of these standards include consumer recording (DCC), disc-based storage (CD-i, CD-Video), DVD, disc-based editing, and audio broadcasting station automation (Lynezoo).

Part 4 of MPEG-1 verifies whether bitstreams and decoders meet the requirements established in Parts 1, 2, and 3 of the MPEG-1 standard. Specifically, these tests verify various characteristics of a bitstream and can be used to make sure that an encoder produces valid bitstreams. In turn, manufacturers of encoders, their customers, or nearly any individual with the appropriate knowledge can check and verify these parts.

Finally, Part 5, while technically not a standard, is a report of the full software implementation of the first three parts of the MPEG-1 standard (Chiariglione).


MPEG-2, from the Moving Picture Experts Group, is an industry standard for delivering digital video over a network. MPEG-2 substantially reduces the bandwidth required to transmit a high-quality digital video signal, and it optimizes the trade-off between resolution and the required transmission bandwidth. Any industry considering digital video service distribution has to make MPEG-2 part of the planning process; it is crucial in digital head-ends, broadband distribution, network access equipment, and the associated architectures and operations.

The picture quality through an MPEG-2 codec depends on the complexity and predictability of the source pictures. Real-time coders and decoders have demonstrated generally good quality standard-definition pictures at bit rates around 6 Mbits per second. As MPEG-2 coding technology matures, the same picture quality may be achievable at lower bit rates.
Parts of MPEG-2

MPEG-2 is a standard currently in ten parts. One part has been withdrawn because there was no industry demand for it.

Part 1 of MPEG-2 addresses the combining of one or more elementary streams of video and audio, as well as other data, into single or multiple streams suitable for storage or transmission.

Part 2 of MPEG-2 builds on the powerful video compression capabilities of the MPEG-1 standard to offer a wide range of coding tools. These have been grouped in profiles to offer different functionalities.

Part 3 of MPEG-2 is a backwards-compatible multi-channel extension of the MPEG-1 Audio standard.

Parts 4 and 5 of MPEG-2 correspond to Parts 4 and 5 of MPEG-1; they were finally approved in March 1996.

Part 6 of MPEG-2, Digital Storage Media Command and Control (DSM-CC), is the specification of a set of protocols providing the control functions and operations specific to managing MPEG-1 and MPEG-2 bitstreams. These protocols may be used to support applications in both stand-alone and heterogeneous network environments. In the DSM-CC model, a stream is sourced by a Server and delivered to a Client.

Part 7 of MPEG-2 is the specification of a multi-channel audio coding algorithm not constrained to be backwards-compatible with MPEG-1 Audio. This standard was approved in April 1997.

Part 8 of MPEG-2 was originally planned to be coding of video when input samples are 10 bits. Work on this part was discontinued when it became apparent that there was insufficient interest from industry for such a standard.

Part 9 of MPEG-2 is the specification of the Real-time Interface (RTI) to Transport Stream decoders, which may be utilized for adaptation to all appropriate networks carrying Transport Streams.

Part 10 of MPEG-2 is the conformance testing part of DSM-CC; it is still under development.

MPEG-2 Video Compression Overview

All video files must be made smaller to allow them to play back in normal multimedia environments. Uncompressed video files are huge: a two-hour program would require 90 gigabytes of space, while the storage capacity of a DVD ranges from 4.7 to 17 gigabytes. Without compression, it would be impossible to transmit or even play back such files. “The data-rate of uncompressed ‘studio quality’ digital video is upwards of 100 Megabits per second, which exceeds the speed at which a DVD player can retrieve video information.” Video compression is the breakthrough that allows us to work with motion picture files, and MPEG-2 is arguably the best standard available: “MPEG-2 is universally regarded as yielding higher image quality, and is the norm for most DVD-Video titles.”
An MPEG file consists of compressed video data, called the video stream. The video stream can be broken down into Groups of Pictures (GOPs), and each GOP is made up of frames. Frames can be further broken down into slices, slices consist of macroblocks, and macroblocks are broken down into 8×8-pixel blocks. Below is a picture representation of the video stream data hierarchy. Every macroblock contains 4 luminance blocks and 2 chrominance blocks, each with dimensions of 8×8 values. The luminance blocks contain information about the brightness of every pixel in the macroblock, and the chrominance blocks contain color information. Because of properties of the human eye, it isn't necessary to give color information for every pixel; instead, 4 pixels are related to one color value (figure 2).
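The bookkeeping implied by this hierarchy can be illustrated with a short calculation. The 720×576 standard-definition frame size is an assumption chosen for illustration; the 16×16 macroblock structure with four luminance and two chrominance blocks is as the text describes:

```python
# Counting the pieces of the video stream hierarchy for one frame,
# assuming a 720x576 frame and 16x16 macroblocks.

WIDTH, HEIGHT = 720, 576
MB = 16                               # a macroblock covers 16x16 pixels

mb_cols, mb_rows = WIDTH // MB, HEIGHT // MB
macroblocks = mb_cols * mb_rows       # 45 * 36 macroblocks per frame
luma_blocks = macroblocks * 4         # four 8x8 luminance blocks each
chroma_blocks = macroblocks * 2       # one 8x8 Cb + one 8x8 Cr block each
total_blocks = luma_blocks + chroma_blocks
assert macroblocks == 1620 and total_blocks == 9720
```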


The basic unit of the video stream is the "Group of Pictures" (GOP), made up of frames of three picture types: I, P, and B. I-frames can be reconstructed without any reference to other frames; they contain information only about themselves and occur about once every ten to fifteen frames of a motion picture. P-frames can only be recreated from references to a previous I-frame or P-frame; it is impossible to construct them without data from another frame. B-frames are referred to as bi-directional frames because they are recreated using forward and backward predictions from the information in the nearest preceding and following I- or P-frame (figure 3).
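Because a B-frame depends on a following reference frame, coded frames are transmitted in a different order than they are displayed: the references must reach the decoder first. The sketch below illustrates one plausible reordering and is a simplification of real encoder behavior:

```python
# Sketch of GOP reordering so every B-frame arrives after both of the
# I- or P-frames it is predicted from.

def decode_order(display_order):
    """Reorder a display-order GOP so every B-frame follows its references."""
    out, pending_b = [], []
    for frame in display_order:
        if frame[0] in "IP":          # a reference frame
            out.append(frame)
            out.extend(pending_b)     # release B-frames waiting on it
            pending_b = []
        else:                         # a B-frame: hold until next reference
            pending_b.append(frame)
    return out + pending_b

gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
assert decode_order(gop) == ["I0", "P3", "B1", "B2", "P6", "B4", "B5"]
```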

Figure 3 – Three types of picture frames

One of the assumptions of MPEG-2 compression is that motion pictures contain many redundancies. MPEG-2 removes these redundancies, making a file smaller without making its quality worse. There are two major redundancies common in motion pictures: temporal and spatial. Temporal redundancy arises when consecutive frames of video display images of the same scene; it is common for the content of the scene to remain fixed or to change only slightly between successive frames. Spatial redundancy occurs because parts of the picture are often replicated (with minor changes) within a single frame of video.

Figure 4 - Redundancies

In general, the human eye has a limited response to fine spatial detail and is less sensitive to detail near object edges or around shot changes. Therefore, the changes made to the video in the process of bit-rate reduction should not be visible to a human observer.

MPEG compression is accomplished by four basic techniques: pre-processing, temporal prediction, motion compensation, and quantization coding. Pre-processing filters out non-essential visual information from the video signal, information that is difficult to encode but not an important component of human visual perception. A mathematical algorithm called the Discrete Cosine Transform (DCT) is used to encode each frame: the frame is divided into blocks of 8×8 pixels, and each block is transformed with the DCT. To take advantage of temporal redundancy, the pixel values in a block may be predicted based on blocks in nearby frames. When such prediction is used, the block is represented not by the actual pixel values but by the differences from the matching pixel values in the frame used for prediction, and the DCT is applied to this residual. Figure 5 gives an example of such prediction.
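For reference, the 8×8 DCT can be written directly from its textbook definition. The direct form below is far slower than the fast factorizations real codecs use, but it computes the same transform:

```python
import math

# Direct (unoptimized) two-dimensional DCT-II of an 8x8 block.

N = 8

def c(k):
    """DCT-II normalization factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct2(block):
    """2-D DCT-II of an NxN block of pixel values."""
    return [[c(u) * c(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

flat = [[128] * N for _ in range(N)]   # a perfectly flat 8x8 block...
coeffs = dct2(flat)
# ...concentrates all of its energy in the single DC coefficient (8 * 128),
# while every AC coefficient is (numerically) zero.
assert round(coeffs[0][0]) == 1024
```

This energy compaction is why the DCT helps compression: for smooth image blocks, most coefficients end up near zero and can be coded very cheaply.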

Figure 5 – Break down of picture frame

To make the prediction better, motion compensation is used. A displacement vector is associated with a block, describing how the block has moved relative to the frame used for prediction. The vector should point to the block giving the optimal prediction. Figure 6 gives an example.

Figure 6 – Motion Compensation Technique

Motion-compensated prediction assumes that the current picture can be locally modeled as a translation of a picture from some previous time.
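A minimal sketch of this displacement search follows, using the sum of absolute differences (SAD) as the matching cost over a tiny window. Real encoders match 16×16 macroblocks over much larger search ranges; the 2×2 blocks and ±1 window here are for illustration only:

```python
# Toy block-matching motion estimation: exhaustively search a small
# window in the previous frame for the lowest-SAD match.

def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def extract(frame, top, left, size):
    """Cut a size x size block out of a frame (list of rows)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def best_vector(prev, cur, top, left, size=2, search=1):
    """Return the (dy, dx) displacement giving the best prediction."""
    target = extract(cur, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= len(prev) - size and 0 <= x <= len(prev[0]) - size:
                cost = sad(extract(prev, y, x, size), target)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1]

prev = [[0, 0, 0, 0],
        [0, 9, 8, 0],
        [0, 7, 6, 0],
        [0, 0, 0, 0]]
cur  = [[9, 8, 0, 0],          # the 2x2 pattern moved up-left by one pixel
        [7, 6, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
# The best match for cur's top-left block lies one pixel down-right in prev.
assert best_vector(prev, cur, 0, 0) == (1, 1)
```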

Figure 7 – Motion Compensation Technique
The quantization process is applied to the DCT coefficients and is performed in order both to remove subjective redundancy and to control the compression factor. Quantization coding converts these sets of coefficients into even more compact representative numbers. The encoder refers to an internal index, or codebook, of possible representative numbers, from which it selects the code word that best matches each set of coefficients. Quantization coding also rounds off all coefficient values, within certain limits, to the same value. Although this results in an approximation of the original signal, it is close enough to be acceptable for most viewing applications. MPEG-2 video compression is used in the following areas:

  • Multimedia Communications

  • Webcasting

  • Broadcasting

  • Video on Demand

  • Interactive Digital Media

  • Telecommunications

  • Mobile communications
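The quantization of DCT coefficients described above can be sketched as follows. The single flat step size is a simplification; MPEG-2 actually uses a frequency-dependent quantizer matrix combined with a scale factor:

```python
# Sketch of coefficient quantization: divide each DCT coefficient by a
# step size and round, which collapses small (perceptually unimportant)
# coefficients to zero. This is the lossy step of the codec.

def quantize(coeffs, step):
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step):
    return [[lvl * step for lvl in row] for row in levels]

coeffs = [[1024.0, 35.0, -12.0],
          [  28.0, -6.0,   3.0],
          [   9.0,  2.0,  -1.0]]
levels = quantize(coeffs, 16)       # many small coefficients become 0
restored = dequantize(levels, 16)   # an approximation of the original
assert levels == [[64, 2, -1], [2, 0, 0], [1, 0, 0]]
assert restored[0][0] == 1024       # the large DC value survives exactly
```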

An Overview of MPEG-2 Transmission

Transmission is the process by which data moves from one computer to another. There are two concepts in MPEG-2 transmission that we need to address: (1) building the MPEG bit stream, and (2) MPEG-2 multiplexing.
1. Building the MPEG bit stream. The most basic component of an MPEG bit stream is the Elementary Stream (ES), which carries a single type of coded content, such as video or audio.

A more structured, packetized version of the Elementary Stream is the Packetized Elementary Stream (PES). In general, the PES has the following characteristics:

  • Each ES is converted into a stream of PES packets

  • A PES packet can be a fixed- or variable-sized block

  • Each block carries up to 65,536 bytes, plus a 6-byte protocol header

  • Each block consists of a header and a payload

  • The header identifies the contents of the payload

  • The payload contains the program data

2. MPEG-2 multiplexing consists of Program and Transport Stream. Program stream has the following unique characteristics:

  • An MPEG program stream is a group of tightly multiplexed PES packets

  • The program stream is designed for error-free delivery and focuses on the quality of the data

  • It is widely used in video playback and network applications

This is how the program stream works: video and audio data are sent to the MPEG-2 compressor to be encoded; when encoding is finished, the data is passed to the MPEG-2 systems processor, which transfers it to the program stream decoder.

Conversely, the transport stream is focused on the size of the data. Each PES packet is broken into fixed-size transport packets of 188 bytes, and a transport stream can contain both video and audio data. In the transport stream, the video and audio data are sent to the MPEG-2 systems processors to be encoded and then passed to the transport multiplexer, which combines the data into one transport stream and sends it to the decoder. Please see figure 8 below for more detail.
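The packetization step can be sketched as follows. The 4-byte header here contains only the 0x47 sync byte, a PID, and a placeholder flags byte, not the full MPEG-2 transport packet header layout:

```python
# Sketch of transport-stream packetization: a PES packet is chopped into
# fixed 188-byte transport packets (4-byte header + 184-byte payload),
# with the final packet padded out to full size.

TS_SIZE, HEADER_SIZE, PAYLOAD_SIZE = 188, 4, 184

def packetize(pes, pid):
    """Split a PES packet (bytes) into fixed-size transport packets."""
    packets = []
    for i in range(0, len(pes), PAYLOAD_SIZE):
        chunk = pes[i:i + PAYLOAD_SIZE]
        header = bytes([0x47, pid >> 8, pid & 0xFF, 0x10])  # 0x47 = sync byte
        padding = bytes([0xFF]) * (PAYLOAD_SIZE - len(chunk))
        packets.append(header + chunk + padding)
    return packets

pes = bytes(400)                      # a 400-byte PES packet
packets = packetize(pes, pid=0x100)
assert len(packets) == 3              # 400 bytes of payload -> 3 packets
assert all(len(p) == TS_SIZE for p in packets)
```

Fixed-size packets are what make the transport stream robust over error-prone networks: a lost or corrupted 188-byte packet has a small, bounded effect on the stream.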

Figure 8 - Combining ES from Encoders into a Transport Stream.


Types of MPEG-2 decoder
Five types of MPEG-2 decoders are discussed below:
1. Software & PC-based MPEG-2 Decoders
Software and PC-based MPEG-2 decoders are very similar to each other. In both, the decoding is done by application software.

Figure 9 - Software & PC-based MPEG-2 Decoders


2. MPEG-2 Computer Decoder

The MPEG-2 computer decoder relies on the computer's CPU to do all of the decoding.

Figure 10 - MPEG-2 Computer Decoder


3. MPEG-2 Network Computers/Thin Clients

In the thin-client model, the network side performs the decoding. When the decoding process is complete, the data is sent back to the computer for display.

Figure 11 - MPEG-2 Network Computers/Thin Clients


4. MPEG-2 Set-Top Box

An external box does the decoding. After the decoding process is finished, the external box sends the data to the computer.

Figure 12 - MPEG-2 Set-Top Box


5. MPEG-2 Consumer Equipment

Extra hardware attached to the computer helps it perform the decoding process faster.

Figure 13 - MPEG-2 Consumer Equipment


MPEG-4 Background

In order to handle specific requirements from rapidly developing multimedia applications, the new MPEG-4 standard emerged. MPEG-4 is a graphics and video compression algorithm standard based on MPEG-1, MPEG-2, and Apple QuickTime technology. It is an international standard for the coding of audiovisual objects that provides technologies for the manipulation, storage, and communication of multimedia objects. MPEG-4 defines a varied set of compression technologies and formats that addresses a wide range of applications and products. MPEG-4 files can be designed to transmit video and images over a narrower bandwidth and can mix video with text, graphics, and 2-D and 3-D animation layers.

MPEG-4 Standards Overview
The MPEG-4 standard consists of 6 parts: Systems, Visual, Audio, Conformance Testing, Reference Software, and Delivery Multimedia Integration Framework (DMIF). Part 1, Systems, specifies scene description, multiplexing, synchronization, buffer management, and the management and protection of intellectual property. Part 2, Visual, specifies the coded representation of natural and synthetic visual objects. Part 3, Audio, specifies the coded representation of natural and synthetic audio objects. Part 4, Conformance Testing, defines conformance conditions for bit streams and devices; this part is used to test MPEG-4 implementations. Part 5, Reference Software, includes software corresponding to most parts of MPEG-4; it can be used to implement compliant products, as ISO waives the copyright of the code. Lastly, Part 6, Delivery Multimedia Integration Framework (DMIF), defines a session protocol for the management of multimedia streaming.

Features of MPEG-4

MPEG-4 covers media types including video, speech, audio, texture, graphics, text, and animation. It standardizes storage file formats and the carriage of media over a broad range of narrowband and broadband transport networks, and it supports playback on a variety of target devices. The multimedia information in MPEG-4 files can be scaled for quality and content metrics, within the available network bandwidth and the processing power of the host terminal. MPEG-4 is well suited to application requirements such as complexity, image quality, bandwidth, and scalability, and to video content such as natural images, video conferencing, medical images, and synthetic images.

Functionalities of MPEG-4

MPEG-4 provides standardization of production, distribution, and content access for digital television, interactive graphics applications, and interactive multimedia. The MPEG-4 standard provides a set of technologies spanning those three fields to satisfy the needs of authors, service providers, and end users. For authors, MPEG-4 enables the production of reusable content and helps better manage and protect content owners' rights; it also offers greater flexibility with individual technologies such as digital television, animated graphics, World Wide Web pages, and their extensions. For network providers, MPEG-4 offers transparent information, which allows them to set exact translations for each media type, enabling the best means of transport over various networks. For end users, MPEG-4 allows higher levels of interaction with content, and it brings multimedia to new networks, including mobile ones, that employ relatively low bit rates. The coded representation of media objects according to these functionalities is as efficient as possible in terms of error robustness, easy extraction and editing of an object, and the availability of an object in scalable form.

Difference of MPEG-4 from Previous MPEG Groups

The main difference of the new MPEG-4 standard with respect to MPEG-1 and MPEG-2, in terms of requirements and functionalities, is that it goes beyond the goal of making the storage and transmission of digital audiovisual material more efficient by compressing data. MPEG-4 specifies a description of digital audiovisual scenes in the form of objects that have certain relations in space and time (see Figure 14). MPEG-4 offers a new kind of interactivity with each audiovisual object at the levels of coding, decoding, or object compression. It also integrates objects of different natures, such as natural video, graphics, and text. Moreover, MPEG-4 allows universal access to multimedia information by taking into account the specifications of a wide variety of networks, which neither MPEG-1 nor MPEG-2 offers.

Figure 14 – MPEG-4 Scene

Targeted Applications

As mentioned, MPEG-4 emerged from the need for a new standard that could address the new demands arising in a world in which more and more audiovisual material is exchanged in digital form. The targeted applications with specific multimedia requirements are: digital TV, mobile multimedia, TV production, games, and streaming video.

In regard to digital TV, MPEG-4 allows added text, pictures, audio, or graphics to be controlled by the user, so that entertainment value can be added to certain programs, or valuable information unrelated to the current program can be offered to interested viewers. Examples of such added functionality include TV station logos, customized advertising, and multi-window screen formats allowing the display of sports statistics or stock quotes via data-casting.

Mobile multimedia is targeted because of the enormous popularity of cell phones and palm computers. MPEG-4 handles mobile devices by coping with their narrow bandwidth and limited computational capacity (through improved error resilience, coding efficiency, and flexibility of resources).

Pertaining to TV production, since MPEG-4 focuses on coding audiovisual objects instead of rectangular video frames, it allows higher-quality and more flexible scenes. For example, local TV stations could inject regional advertisement video objects better suited to the targeted viewers when international programs are broadcast.

In games, the main focus is user interaction. MPEG-4 allows video objects in games to be even more realistic; for example, a creator can personalize games by linking personal video databases into the games in real time.

Streaming video over the Internet is becoming very popular, so it is also one of the targets; examples of MPEG-4 in streaming video are news updates and live music shows. In this case bandwidth is limited by the use of modems, and transmission reliability is an issue when packet loss occurs. MPEG-4 has improved the scalability of the bit stream in terms of temporal and spatial resolution.


MPEG-7 is another ISO/IEC standard developed by MPEG (the Moving Picture Experts Group). It is a content representation standard for information search: MPEG-7 attaches metadata to audio and video files, allowing the searching and indexing of A/V data based on information about the content instead of searching the actual content bit stream.


MPEG-7 makes searching the Web for multimedia content as easy as searching for text-only files. MPEG-7 addresses both retrieval from digital archives (pull applications) and the filtering of streamed audiovisual broadcasts on the Internet (push applications). It operates in both real-time and non-real-time environments. A "real-time environment" in this context means that the description is generated at the same time as the content is captured (e.g., by smart cameras and scanners). Currently the MPEG committee is working on another standard: MPEG-21. The goal of MPEG-21 is to define the technology needed to support Users in exchanging, accessing, consuming, trading, and otherwise manipulating Digital Items in an efficient, transparent, and interoperable way; so, basically, it is a peer-to-peer connection. At its most basic level, MPEG-21 provides a framework in which one User interacts with another User, and the object of that interaction is a Digital Item, commonly called content. In a nutshell, MPEG-21 will provide several tools to manage how digital objects, such as audio, video, or multimedia files, are encoded, secured, transmitted, and viewed. The tools offer means to identify content, to manage how it is searched, cached, archived, and retrieved, and to manage how to adjust the display to fit numerous end-user devices. MPEG-21 is designed to work with its predecessors, and it provides a more universal framework for digital content protection. MPEG-21 has been tabbed as the standard for the 21st century. It is still a work in progress: most of MPEG-21's elements are set for completion in 2003 and 2004.


  1. Leonardo Chiariglione. (June 1996) Retrieved October 20, 2002.

  2. Retrieved October 20, 2002.

  3. Retrieved October 28, 2002.

  4. Retrieved November 9, 2002.

  5. Retrieved November 9, 2002.

  6. Retrieved November 9, 2002.




