MPEG File Format Summary

Also Known As: MPG, MPEG-1, MPEG-2

Type	Audio/video data storage
Colors	Up to 24 bits (4:2:0 YCbCr color space)
Compression	DCT and block-based scheme with motion compensation
Maximum Image Size	4095x4095x30 frames/second
Multiple Images Per File	Yes (multiple program multiplexing)
Numerical Format	NA
Originator	Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO)
Platform	All
Supporting Applications	Xing Technologies MPEG player, others
See Also	JPEG File Interchange Format, Intel DVI

Usage
Stores an MPEG-encoded data stream on a digital storage medium. MPEG is used to encode audio, video, text, and graphical data within a single, synchronized data stream.

Comments
MPEG-1 is a finalized standard in wide use. MPEG-2 is still in the development phase and continues to be revised for a wider base of applications. Currently, there are few stable products available for making practical use of the MPEG standard, but this is changing.

Vendor specifications are available for this format.

Code fragments are available for this format.

Sample images are available for this format.

MPEG (pronounced "em-peg") is an acronym for the Moving Picture Experts Group, a working group of the International Organization for Standardization (ISO) that is responsible for creating standards for digital video and audio compression.

Contents:
File Organization
File Details
For Further Information

The MPEG specification defines an encoded data stream that contains compressed audio and video information. MPEG was designed specifically to store sound and motion-video data on standard audio Compact Discs (CD) and Digital Audio Tapes (DAT).

The main application for MPEG is the storage of audio and video data on CD-ROMs for use in multimedia systems, such as those found on the Apple Macintosh platform and in the Microsoft Windows environment. Such systems require the ability to store and play back high-quality audio and video material for commercial, educational, and recreational applications. The new MPEG-2 standard allows the transmission of MPEG data across television and cable network systems.

On most systems, you use special hardware to capture MPEG data from a live video source at a real-time sampling rate of 30 frames per second. Each frame of captured video data is then compressed and stored as an MPEG data stream. If an audio source is also being sampled, it too is encoded and multiplexed in with the video stream, with some extra information to synchronize the two streams together for playback.
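The idea of multiplexing two timed streams into one can be pictured with a toy sketch. The packet layout here, a (timestamp, kind, payload) tuple, and the function name are hypothetical and far simpler than a real MPEG system stream; the audio chunk timing is an arbitrary choice made only for illustration.

```python
import heapq

def multiplex(video_packets, audio_packets):
    """Interleave two timestamp-sorted packet lists into one stream.

    Each packet is a hypothetical (timestamp_seconds, kind, payload) tuple.
    """
    return list(heapq.merge(video_packets, audio_packets, key=lambda p: p[0]))

# Video captured at 30 frames/second.
video = [(n / 30, "video", f"frame{n}") for n in range(3)]
# Audio chunks every 0.05 s, offset by 0.01 s (arbitrary illustrative timing).
audio = [(0.01 + n * 0.05, "audio", f"chunk{n}") for n in range(2)]

stream = multiplex(video, audio)
for timestamp, kind, payload in stream:
    print(f"{timestamp:.3f}s {kind} {payload}")
```

A player reading such a stream can use the timestamps to hand each packet to the video or audio decoder at the right moment, which is the role the synchronization information plays in the real format.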

To play back MPEG data, you use either a hardware/software or software-only player. The player reads in the MPEG data stream, decompresses the information, and sends it to the display and audio systems of the computer. Speed of the playback depends upon how quickly the resources of the computer allow the MPEG data to be read, decompressed, and played. Available memory, CPU speed, and disk I/O throughput are all contributing factors. The quality of the MPEG stream is determined during encoding, and there are typically no adjustments available to allow an application to "tweak" the apparent quality of the MPEG output produced during playback.

MPEG is based on the digital television standard specified in CCIR-601. In its initial form, however, MPEG is not actually capable of storing CCIR-601 images. The typical resolution of 720x480 used in the United States (720x576 in Europe) requires more bandwidth than the maximum MPEG data rate of 1.86 Mbits/second allows. Standard television images must therefore be decimated by 2:1 into lower-resolution SIF format data (352x240 for U.S. video) to be stored.

European (PAL and SECAM) and Japanese standards are different in many respects, including the display rate (30 frames/second U.S., 25 frames/second European) and the number of lines per field (240 U.S., 288 European). Therefore, an MPEG player must be able to recognize a wide range of variations in the encoded video signal itself.
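Although the U.S. and European variants differ in frame rate and line count, the amount of picture data per second works out the same. The sketch below assumes a 352x288 European SIF picture (inferred from the 288 lines per field quoted above) and the 16x16-pixel macroblock used by MPEG video; the function name is illustrative only.

```python
def sif_stats(width, height, frames_per_second):
    """Return (macroblocks per picture, luminance pixels per second)."""
    macroblocks = (width // 16) * (height // 16)  # 16x16-pixel macroblocks
    pixel_rate = width * height * frames_per_second
    return macroblocks, pixel_rate

us_sif = sif_stats(352, 240, 30)        # U.S. SIF, 30 frames/second
european_sif = sif_stats(352, 288, 25)  # assumed European SIF, 25 frames/second

print("U.S. SIF:     ", us_sif)
print("European SIF: ", european_sif)
```

Both variants deliver 2,534,400 luminance pixels per second, so a decoder sized for one can, in terms of raw throughput, handle the other.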

Constrained Parameters Bitstreams (CPB) are a complex aspect of MPEG. CPBs are those bitstreams that are limited in terms of picture size, frame rate, and coded bit-rate parameters. These limitations normalize the computational complexity required of both hardware and software, thus guaranteeing a reasonable, nominal subset of MPEG that can be decoded by the widest possible range of applications while still remaining cost-effective. Video bitstreams that meet the constrained parameters are limited to 1.86 Mbits/second. If it were not for the constrained parameters, the MPEG syntax could specify a data rate of more than 100 Mbits/second.
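A constrained-parameters check might be sketched as follows. Only the 1.86 Mbits/second ceiling comes from the text above; the other limits are the values commonly quoted for MPEG-1 constrained parameters and are assumptions here that should be verified against the video part of the standard before being relied on. The function and dictionary names are hypothetical.

```python
# Bit-rate ceiling from the text above; every other limit is an assumption.
CPB_LIMITS = {
    "max_width": 768,
    "max_height": 576,
    "max_macroblocks_per_picture": 396,
    "max_macroblocks_per_second": 396 * 25,
    "max_frames_per_second": 30,
    "max_bits_per_second": 1_860_000,
}

def meets_constrained_parameters(width, height, frames_per_second, bits_per_second):
    """Return True if the stream parameters fit within CPB_LIMITS."""
    # Round partial macroblocks up to whole 16x16 blocks.
    macroblocks = ((width + 15) // 16) * ((height + 15) // 16)
    return (
        width <= CPB_LIMITS["max_width"]
        and height <= CPB_LIMITS["max_height"]
        and macroblocks <= CPB_LIMITS["max_macroblocks_per_picture"]
        and macroblocks * frames_per_second <= CPB_LIMITS["max_macroblocks_per_second"]
        and frames_per_second <= CPB_LIMITS["max_frames_per_second"]
        and bits_per_second <= CPB_LIMITS["max_bits_per_second"]
    )

print(meets_constrained_parameters(352, 240, 30, 1_150_000))  # SIF picture
print(meets_constrained_parameters(720, 480, 30, 1_150_000))  # full-size picture
```

Under these assumed limits, a SIF-sized picture passes while a full-size television picture fails on macroblock count alone, regardless of its bit rate.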

File Organization

No actual structured MPEG file format has been defined. Everything required to play back MPEG data is encoded directly in the data stream. Therefore, no header or other type of wrapper is necessary. It is likely that when needed, a multimedia standards committee--perhaps MHEG or the DSM (Digital Storage Medium) MPEG subgroup--will one day define an MPEG file format.

File Details

This section describes the relationship between MPEG, JPEG, and MJPEG, the type of compression used for MPEG files, and the MPEG-2 standard.

Relationship Between MPEG, JPEG, and MJPEG

Some people are confused about the relationship between MPEG and JPEG. The MPEG and JPEG (Joint Photographic Experts Group) committees of the ISO originally started as the same group, but with two different purposes. JPEG focused exclusively on still-image compression, while MPEG focused on the encoding/synchronization of audio and video signals within a single data stream. Although MPEG employs a method of spatial data compression similar to that used for JPEG, they are not the same standard nor were they designed for the same purpose.

Another acronym you may hear is MJPEG (Motion JPEG). Several companies have come out with an alternative to MPEG--a simpler solution (but not yet a standard) for how to store motion video. This solution, called Motion JPEG, simply uses a digital video capture device to sample a video signal, to capture frames, and to compress each frame in its entirety using the JPEG compression method. A Motion JPEG data stream is then played back by decompressing and displaying each individual frame. A standard audio compression method is usually included in the Motion JPEG data stream.

There are several advantages to using Motion JPEG:

Fast, real-time compression rate
No frame-to-frame interpolation (motion compensation) of data is required

But there are also disadvantages:

Motion JPEG files are considerably larger than MPEG files
They are somewhat slower to play back (more information per frame than MPEG)
They exhibit poor video quality if a higher JPEG compression ratio (quality factor) is used

On average, the temporal compression method used by MPEG provides a compression ratio three times that of JPEG for the same perceived picture quality.

MPEG Compression

MPEG uses an asymmetric compression method. Compression under MPEG is far more complicated than decompression, making MPEG a good choice for applications that need to write data only once, but need to read it many times. An example of such an application is an archiving system. Systems that require audio and video data to be written many times, such as an editing system, are not good choices for MPEG; they will run more slowly when using the MPEG compression scheme.

MPEG uses two types of compression methods to encode video data: interframe and intraframe encoding. Interframe encoding is based upon both predictive coding and interpolative coding techniques, as described below.

When capturing frames at a rapid rate (typically 30 frames/second for real-time video) there will be a lot of identical data contained in any two or more adjacent frames. If a motion compression method is aware of this "temporal redundancy," as many audio and video compression methods are, then it need not encode the entire frame of data, as is done via intraframe encoding. Instead, only the differences (deltas) in information between the frames are encoded. This results in greater compression ratios, with far less data needing to be encoded. This type of interframe encoding is called predictive encoding.

A further reduction in data size may be achieved by the use of bi-directional prediction. Differential predictive encoding encodes only the differences between the current frame and the previous frame. Bi-directional prediction encodes the current frame based on the differences between the current, previous, and next frame of the video data. This type of interframe encoding is called motion-compensated interpolative encoding.
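The two interframe techniques can be illustrated on toy one-dimensional "frames." This is a deliberately simplified sketch: real MPEG encoders work on blocks of pixels and use motion vectors, which are omitted here, and all function names are hypothetical.

```python
def encode_delta(reference, current):
    """Predictive coding: keep only the per-sample differences."""
    return [c - r for r, c in zip(reference, current)]

def decode_delta(reference, delta):
    """Rebuild a frame from its reference and the stored differences."""
    return [r + d for r, d in zip(reference, delta)]

def interpolate(previous, following):
    """Bi-directional prediction: average the two surrounding frames."""
    return [(p + f) // 2 for p, f in zip(previous, following)]

previous_frame = [10, 10, 10, 10, 50, 50]
current_frame = [10, 10, 10, 20, 50, 50]
next_frame = [10, 10, 10, 30, 50, 50]

# Predicting from the previous frame alone leaves one nonzero difference.
p_delta = encode_delta(previous_frame, current_frame)

# Predicting from both neighbors happens to match the current frame exactly.
prediction = interpolate(previous_frame, next_frame)
b_delta = encode_delta(prediction, current_frame)

print("forward-only deltas:  ", p_delta)
print("bi-directional deltas:", b_delta)
```

In this contrived case the bi-directional residual is all zeros, which is why bi-directionally predicted frames can be so much smaller than frames predicted from one direction only.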

To support both interframe and intraframe encoding, an MPEG data stream contains three types of coded frames:

I-frames (intraframe encoded)
P-frames (predictive encoded)
B-frames (bi-directional encoded)

An I-frame contains a single frame of video data that does not rely on the information in any other frame to be encoded or decoded. Each MPEG data stream starts with an I-frame.

A P-frame is constructed by predicting the difference between the current frame and the closest preceding I- or P-frame. A B-frame is constructed from the two closest I- or P-frames, one preceding it and one following it. The B-frame must be positioned between these I- or P-frames.
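The reference rules just described can be sketched as a small lookup over a frame-type pattern given in display order. The function name is hypothetical, and the sketch assumes the pattern ends on an I- or P-frame so that every B-frame has a following reference.

```python
def reference_frames(pattern):
    """Map each frame index to the indices of the frames it is predicted from."""
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(pattern):
        if kind == "I":
            refs[i] = ()  # intraframe: no references
        elif kind == "P":
            refs[i] = (max(a for a in anchors if a < i),)
        else:  # "B": closest anchor on each side
            before = max(a for a in anchors if a < i)
            after = min(a for a in anchors if a > i)
            refs[i] = (before, after)
    return refs

refs = reference_frames("IBBPBBP")
for index, sources in refs.items():
    print(index, "IBBPBBP"[index], sources)
```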

A typical sequence of frames in an MPEG stream might look like this:

IBBPBBPBBPBBIBBPBBPBBPBBI

In theory, the number of B-frames that may occur between any two I- and P-frames is unlimited. In practice, however, an I-frame typically occurs once every twelve frames, as in the sequence above, with P- and B-frames filling the positions in between. At 30 frames/second, one I-frame will occur approximately every 0.4 seconds of video runtime.
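The 0.4-second figure can be checked against the example sequence shown earlier, in which I-frames sit twelve frame positions apart. The sketch assumes the 30 frames/second rate used throughout this article.

```python
SEQUENCE = "IBBPBBPBBPBBIBBPBBPBBPBBI"
FRAMES_PER_SECOND = 30

# Positions of the I-frames within the example sequence.
i_positions = [i for i, kind in enumerate(SEQUENCE) if kind == "I"]
spacing = i_positions[1] - i_positions[0]
seconds_between_i_frames = spacing / FRAMES_PER_SECOND

print("I-frame positions:", i_positions)
print("frames from one I-frame to the next:", spacing)
print("seconds between I-frames:", seconds_between_i_frames)
```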

Remember that the MPEG data is not decoded and displayed in the order that the frames appear within the stream. Because B-frames rely on two reference frames for prediction, both reference frames need to be decoded first from the bitstream, even though the display order may have a B-frame in between the two reference frames.

In the previous example, the I-frame is decoded first. But, before the two B-frames can be decoded, the P-frame must be decoded, and stored in memory with the I-frame. Only then may the two B-frames be decoded from the information found in the decoded I- and P-frames. Assume, in this example, that you are at the start of the MPEG data stream. The first ten frames are stored in the sequence IBBPBBPBBP (0123456789), but are decoded in the sequence:

IPBBPBBPBB (0312645978)

and finally are displayed in the sequence:

IBBPBBPBBP (0123456789)
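The reordering in this example can be reproduced with a short sketch: each B-frame is held back until the next I- or P-frame has been decoded. The function name is hypothetical, and placing trailing B-frames at the end is an assumption made only so that the function handles patterns that do not end on a reference frame.

```python
def decode_order(display_pattern):
    """Return frame indices in the order a decoder must process them."""
    order, pending_b = [], []
    for index, kind in enumerate(display_pattern):
        if kind == "B":
            pending_b.append(index)  # wait for the following reference
        else:
            order.append(index)      # decode the I- or P-frame first
            order.extend(pending_b)  # then the B-frames that depend on it
            pending_b = []
    order.extend(pending_b)  # trailing B-frames with no later reference
    return order

pattern = "IBBPBBPBBP"
order = decode_order(pattern)
print("".join(pattern[i] for i in order), "".join(str(i) for i in order))
```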

Once an I-, P-, or B-frame is constructed, it is compressed using a DCT compression method similar to JPEG. Where interframe encoding reduces temporal redundancy (data identical over time), the DCT-encoding reduces spatial redundancy (data correlated within a given space). Both the temporal and the spatial encoding information are stored within the MPEG data stream.
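To give a feel for how the DCT removes spatial redundancy, the following sketch applies a textbook two-dimensional DCT to an 8x8 block of identical samples. It is a plain, unoptimized formulation with orthonormal scaling, chosen for readability; it is not taken from any MPEG reference implementation, and quantization and entropy coding are left out.

```python
import math

N = 8  # block size used by JPEG-style DCT coding

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

# A perfectly flat block: every sample has the same value.
flat_block = [[128] * N for _ in range(N)]
coeffs = dct_2d(flat_block)
nonzero = sum(1 for row in coeffs for c in row if abs(c) > 1e-6)
print("DC coefficient:", round(coeffs[0][0], 3), "nonzero coefficients:", nonzero)
```

All 64 samples of the flat block collapse into a single significant coefficient, which is the kind of redundancy the DCT stage exploits in smooth image areas.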

By combining spatial and temporal subsampling, the overall bandwidth reduction achieved by MPEG can be considered to be upwards of 200:1. However, with respect to the final input source format, the useful compression ratio tends to be between 16:1 and 40:1. The ratio depends upon what the encoding application deems to be "acceptable" image quality (higher quality video results in poorer compression ratios). Beyond these figures, the MPEG method becomes inappropriate for an application.
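As a rough illustration of what the 16:1 to 40:1 range means in bits, the sketch below applies both ratios to an uncompressed SIF source. It assumes a 352x240 picture at 30 frames/second with 4:2:0 sampling at 8 bits per sample, which averages 12 bits per pixel; these source parameters are assumptions chosen for the example.

```python
WIDTH, HEIGHT, FRAMES_PER_SECOND = 352, 240, 30
BITS_PER_PIXEL = 12  # 8-bit Y plus Cb and Cr at quarter resolution (4:2:0)

raw_bits_per_second = WIDTH * HEIGHT * FRAMES_PER_SECOND * BITS_PER_PIXEL

# Compressed rates at the two ends of the useful range quoted above.
rate_at_16_to_1 = raw_bits_per_second / 16
rate_at_40_to_1 = raw_bits_per_second / 40

print(f"uncompressed: {raw_bits_per_second / 1e6:.2f} Mbits/second")
print(f"at 16:1:      {rate_at_16_to_1 / 1e6:.2f} Mbits/second")
print(f"at 40:1:      {rate_at_40_to_1 / 1e6:.2f} Mbits/second")
```

Under these assumptions the useful range corresponds to roughly 0.76 to 1.9 Mbits/second of compressed video.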

In practice, the sizes of the frames tend to be 150 Kbits for I-frames, around 50 Kbits for P-frames, and 20 Kbits for B-frames. The video data rate is typically constrained to 1.15 Mbits/second, the standard for DATs and CD-ROMs.
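These frame sizes and the 1.15 Mbits/second rate fit together, as a quick calculation shows. The sketch assumes a repeating twelve-frame pattern of IBBPBBPBBPBB at 30 frames/second, with every frame exactly at the typical size quoted above; real streams vary from frame to frame.

```python
FRAME_KBITS = {"I": 150, "P": 50, "B": 20}  # typical sizes quoted above
PATTERN = "IBBPBBPBBPBB"                    # assumed repeating frame pattern
FRAMES_PER_SECOND = 30

# Total size of one pass through the pattern, then scaled to one second.
kbits_per_pattern = sum(FRAME_KBITS[kind] for kind in PATTERN)
kbits_per_second = kbits_per_pattern * FRAMES_PER_SECOND / len(PATTERN)

print("Kbits per twelve-frame pattern:", kbits_per_pattern)
print("average video rate:", kbits_per_second / 1000, "Mbits/second")
```

One I-frame, three P-frames, and eight B-frames add up to 460 Kbits, and 2.5 such patterns per second give 1150 Kbits/second, matching the 1.15 Mbits/second figure.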

The MPEG standard does not mandate the use of P- and B-frames. Many MPEG encoders avoid the extra overhead of B- and P-frames by encoding only I-frames. Each video frame is captured, compressed, and stored in its entirety, in a similar way to Motion JPEG. I-frames are very similar to JPEG-encoded frames. In fact, the JPEG Committee has plans to add MPEG I-frame methods to an enhanced version of JPEG, possibly to be known as JPEG-II.

With no delta comparisons to be made, encoding may be performed quickly; with a little hardware assistance, encoding can occur in real time (30 frames/second). Also, random access of the encoded data stream is very fast because I-frames are not as complex and time-consuming to decode as P- and B-frames. Any reference frame needs to be decoded before it can be used as a reference by another frame.

There are also some disadvantages to this scheme. The compression ratio of an I-frame-only MPEG file will be lower than the same MPEG file using motion compensation. A one-minute file consisting of 1800 frames would be approximately 2.5Mb in size. The same file encoded using B- and P-frames would be considerably smaller, depending upon the content of the video data. Also, this scheme of MPEG encoding might decompress more slowly on applications that allocate an insufficient amount of buffer space to handle a constant stream of I-frame data.

MPEG-2

The original MPEG standard is now referred to as MPEG-1. The MPEG-1 Video Standard is aimed at small-scale systems using CD-ROM storage and small, lower-resolution displays. Its 1.5-Mbits/second data rate, however, rules MPEG-1 out of many more demanding applications. The next phase in MPEG technology development is MPEG-2.

The new MPEG-2 standard is a form of digital audio and video designed for the television industry. It will be used primarily as a way to consolidate and unify the needs of cable, satellite, and television broadcasts, as well as computing, optical storage, Ethernet, VCR, CD-I, HDTV, and blue-laser CD-ROM systems.

MPEG-2 is an extension of the MPEG-1 specification and therefore shares many of the same design features. The baseline part of MPEG-2 is called the Video Main Profile and provides a minimum definition of data quality. This definition fills the needs of high-quality television program distribution over a wide variety of data networks. Video Main Profile service over cable and satellite systems could possibly start in 1994. Consumers who need such features as interactive television and vision phones will benefit greatly from this service.

Features added by MPEG-2 include:

Interlaced video formats
Multiple picture aspect ratios (such as 4:3 and 16:9, as required by HDTV)
Conservation of memory usage (by lowering the picture quality below the Video Main Profile definition)
Increased video quality over MPEG-1 (when coding for the same target bit rates)
Ability to decode MPEG-1 data streams

MPEG-2 can also multiplex audio, video, and other information into a single data stream and provides 2- to 15-Mbits/second data rates while maintaining full CCIR-601 image quality. MPEG-2 achieves this by the use of two types of data streams: the Program stream and the Transport stream.

The Program stream is similar to the MPEG-1 System stream, with extensions for encoding program-specific information, such as multiple language audio channels. The Transport stream was newly added to MPEG-2 and is used in broadcasting to multiplex multiple programs composed of audio, video, and private data, for example combining standard-definition TV and HDTV signals on the same channel. MPEG-2 supports multi-program broadcasts, storage of programs on VCRs, error detection and correction, and synchronization of data streams over complex networks.

Just as MPEG-1 encoding and decoding hardware has appeared, so will the same hardware for MPEG-2. With its broad range of applications and its toolkit approach, MPEG-2 encoding and decoding is very difficult to implement fully in a single chip. A "do everything" MPEG-2 chipset is not only difficult to design, but also expensive to sell. It is more likely that MPEG-2 hardware designed for specific applications will appear in the near future, with much more extensible chipsets to come in the more distant future.

The compression used on the MPEG audio stream data is based on the European MUSICAM standard, with additional pieces taken from other algorithms. It is similar in conception to the method used to compress MPEG video data. It is a lossy compression scheme, which throws away (or at least assigns fewer bits of resolution to) audio data that humans cannot hear. It is also a temporal-based compression method, compressing the differences between audio samples rather than the samples themselves. At this writing, a publicly available version of the audio code was due to be released by the MPEG audio group.

The typical bandwidth of a CD audio stream is 1.5 Mbits/second. MPEG audio compression can reduce this data down to approximately 256 Kbits/second for a 6:1 compression ratio with no discernible loss in quality (lower reductions are also possible). The remaining 1.25 Mbits/second of the bandwidth carries the MPEG-1 video and system streams. Using basically the same MPEG-1 audio algorithm, MPEG-2 audio will add discrete surround sound channels.
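The rounded figures in this paragraph can be reproduced with simple arithmetic. The sketch takes the 1.5 Mbits/second and 256 Kbits/second values from the text and assumes 1 Mbit equals 1000 Kbits.

```python
TOTAL_KBITS_PER_SECOND = 1500  # 1.5 Mbits/second, assuming 1 Mbit = 1000 Kbits
AUDIO_KBITS_PER_SECOND = 256   # compressed MPEG audio rate quoted above

audio_compression_ratio = TOTAL_KBITS_PER_SECOND / AUDIO_KBITS_PER_SECOND
remaining_kbits_per_second = TOTAL_KBITS_PER_SECOND - AUDIO_KBITS_PER_SECOND

print(f"audio compression ratio: {audio_compression_ratio:.2f}:1")
print(f"left for video and system data: {remaining_kbits_per_second / 1000:.3f} Mbits/second")
```

The exact values, about 5.86:1 and 1.244 Mbits/second, round to the 6:1 and 1.25 Mbits/second quoted in the text.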

For Further Information

For further information about MPEG, see the MPEG Frequently Asked Questions (FAQ) document included on the CD-ROM. Note, however, that this FAQ is included for background only; because it is constantly updated, you should obtain a more recent version. The MPEG FAQ on USENET is posted monthly to the newsgroups comp.graphics, comp.compression, and comp.multimedia. The FAQ is available by using FTP from rtfm.mit.edu and is located in the directories that are called /pub/usenet/comp.graphics and /pub/usenet/comp.compression.

To obtain the full MPEG draft standard, you will have to purchase it from ANSI. The MPEG draft ISO standard is ISO CD 11172. This draft contains four parts:

11172.1	Synchronization and multiplexing of audio-visual information
11172.2	Video compression
11172.3	Audio compression
11172.4	Conformance testing

Contact ANSI at:

American National Standards Institute
Sales Department
1430 Broadway
New York, NY, 10018
Voice: 212-642-4900

Drafts of the MPEG-2 standard are expected to be available soon. For more information about MPEG, see the following article:

Le Gall, Didier, "MPEG: A Video Compression Standard for Multimedia Applications," Communications of the ACM, vol. 34, no. 4, April 1991, pp. 46-58.

On the CD-ROM you will find several pieces of MPEG software. The ISO MPEG-2 Codec software, which converts uncompressed video frames into MPEG-1 and MPEG-2 video-coded bitstream sequences, and vice versa, is included in source code form and as a precompiled MS-DOS binary. The Sparkle MPEG player is also included for Macintosh platforms.

This page is taken from the Encyclopedia of Graphics File Formats and is licensed by O'Reilly under the Creative Commons Attribution license.
