1 Scope

This Recommendation | International Standard specifies the coded representation of picture information for digital storage media and digital video communication and specifies the decoding process. The representation supports constant bitrate transmission, variable bitrate transmission, random access, channel hopping, scalable decoding, bitstream editing, as well as special functions such as fast forward playback, fast reverse playback, slow motion, pause and still pictures. This Recommendation | International Standard is forward compatible with ISO/IEC 11172-2 and upward or downward compatible with EDTV, HDTV, SDTV formats.

This Recommendation | International Standard is primarily applicable to digital storage media, video broadcast and communication. The storage media may be connected to the decoder directly or via communication means such as buses, LANs or telecommunications links.

2 Normative references

The following ITU-T Recommendations and International Standards contain provisions which, through reference in this text, constitute provisions of this Recommendation | International Standard. At the time of publication, the editions indicated were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent editions of the standards indicated below. Members of IEC and ISO maintain registers of currently valid International Standards. The Telecommunication Standardisation Bureau maintains a list of currently valid ITU-T Recommendations.

• Recommendations and reports of the CCIR, 1990, XVIIth Plenary Assembly, Düsseldorf, 1990, Volume XI, Part 1: Broadcasting Service (Television), Recommendation ITU-R BT.601-3, “Encoding parameters of digital television for studios”.

• CCIR Volume X and XI, Part 3, Recommendation ITU-R BR.648, “Recording of audio signals”.

• CCIR Volume X and XI, Part 3, Report ITU-R 955-2, “Satellite sound broadcasting to vehicular, portable and fixed receivers in the range 500-3000 MHz”.

• ISO/IEC 11172-1 1993, Information technology — Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s — Part 1: Systems.

• ISO/IEC 11172-2 1993, Information technology — Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s — Part 2: Video.

• ISO/IEC 11172-3 1993, Information technology — Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s — Part 3: Audio.

• IEEE Standard Specifications for the Implementations of 8 by 8 Inverse Discrete Cosine Transform, IEEE Std 1180-1990, December 6, 1990.

• IEC Publication 908:1987, CD Digital Audio System.

• IEC Publication 461:1986, Time and control code for video tape recorder.

• ITU-T Recommendation H.261 (formerly CCITT Recommendation H.261), Video codec for audiovisual services at p × 64 kbit/s, Geneva, 1990.

• ISO/IEC 10918-1:1994 | Recommendation ITU-T T.81 (JPEG), Information technology — Digital compression and coding of continuous-tone still images: Requirements and guidelines.

3 Definitions

For the purposes of this Recommendation | International Standard, the following definitions apply.

3.1 AC coefficient: Any DCT coefficient for which the frequency in one or both dimensions is non-zero.

3.2 big picture: A coded picture that would cause VBV buffer underflow as defined in C.7 of Annex C. Big pictures can only occur in sequences where low_delay is equal to 1. “Skipped picture” is a term that is sometimes used to describe the same concept.

3.3 B-field picture: A field structure B-Picture.

3.4 B-frame picture: A frame structure B-Picture.

3.5 B-picture; bidirectionally predictive-coded picture: A picture that is coded using motion compensated prediction from past and/or future reference fields or frames.

3.6 backward compatibility: A newer coding standard is backward compatible with an older coding standard if decoders designed to operate with the older coding standard are able to continue to operate by decoding all or part of a bitstream produced according to the newer coding standard.

3.7 backward motion vector: A motion vector that is used for motion compensation from a reference frame or reference field at a later time in display order.

3.8 backward prediction: Prediction from the future reference frame (field).

3.9 base layer: The first, independently decodable layer of a scalable hierarchy.

3.10 bitstream; stream: An ordered series of bits that forms the coded representation of the data.

3.11 bitrate: The rate at which the coded bitstream is delivered from the storage medium to the input of a decoder.

3.12 block: An 8-row by 8-column matrix of samples, or 64 DCT coefficients (source, quantised or dequantised).

3.13 bottom field: One of two fields that comprise a frame. Each line of a bottom field is spatially located immediately below the corresponding line of the top field.

3.14 byte aligned: A bit in a coded bitstream is byte-aligned if its position is a multiple of 8 bits from the first bit in the stream.
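The byte-alignment test reduces to a modulo check on the bit position counted from the first bit of the stream. A minimal sketch, assuming positions are zero-based bit offsets; the function names are hypothetical:

```python
def is_byte_aligned(bit_position: int) -> bool:
    """True if a zero-based bit offset from the start of the stream is byte-aligned."""
    return bit_position % 8 == 0


def bits_to_next_alignment(bit_position: int) -> int:
    """Number of bits to skip (e.g. stuffing) to reach the next byte-aligned position."""
    return (-bit_position) % 8


print(is_byte_aligned(16), bits_to_next_alignment(13))
```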

3.15 byte: Sequence of 8 bits.

3.16 channel: A digital medium that stores or transports a bitstream constructed according to this specification.

3.17 chrominance format: Defines the number of chrominance blocks in a macroblock.

3.18 chroma simulcast: A type of scalability (which is a subset of SNR scalability) where the enhancement layer(s) contain only coded refinement data for the DC coefficients, and all the data for the AC coefficients, of the chrominance components.

3.19 chrominance component: A matrix, block or single sample representing one of the two colour difference signals related to the primary colours in the manner defined in the bitstream. The symbols used for the chrominance signals are Cr and Cb.

3.20 coded B-frame: A B-frame picture or a pair of B-field pictures.

3.21 coded frame: A coded frame is a coded I-frame, a coded P-frame or a coded B-frame.

3.22 coded I-frame: An I-frame picture or a pair of field pictures, where the first field picture is an I-picture and the second field picture is an I-picture or a P-picture.

3.23 coded P-frame: A P-frame picture or a pair of P-field pictures.

3.24 coded picture: A coded picture is made of a picture header, the optional extensions immediately following it, and the following picture data. A coded picture may be a coded frame or a coded field.

3.25 coded video bitstream: A coded representation of a series of one or more pictures as defined in this specification.

3.26 coded order: The order in which the pictures are transmitted and decoded. This order is not necessarily the same as the display order.

3.27 coded representation: A data element as represented in its encoded form.

3.28 coding parameters: The set of user-definable parameters that characterise a coded video bitstream. Bitstreams are characterised by coding parameters. Decoders are characterised by the bitstreams that they are capable of decoding.

3.29 component: A matrix, block or single sample from one of the three matrices (luminance and two chrominance) that make up a picture.

3.30 compression: Reduction in the number of bits used to represent an item of data.

3.31 constant bitrate coded video: A coded video bitstream with a constant bitrate.

3.32 constant bitrate: Operation where the bitrate is constant from start to finish of the coded bitstream.

3.33 data element: An item of data as represented before encoding and after decoding.

3.34 data partitioning: A method for dividing a bitstream into two separate bitstreams for error resilience purposes. The two bitstreams have to be recombined before decoding.

3.35 D-Picture: A type of picture that shall not be used except in ISO/IEC 11172-2.

3.36 DC coefficient: The DCT coefficient for which the frequency is zero in both dimensions.

3.37 DCT coefficient: The amplitude of a specific cosine basis function.

3.38 decoder input buffer: The first-in first-out (FIFO) buffer specified in the video buffering verifier.

3.39 decoder: An embodiment of a decoding process.

3.40 decoding (process): The process defined in this specification that reads an input coded bitstream and produces decoded pictures or audio samples.

3.41 dequantisation: The process of rescaling the quantised DCT coefficients after their representation in the bitstream has been decoded and before they are presented to the inverse DCT.

3.42 digital storage media; DSM: A digital storage or transmission device or system.

3.43 discrete cosine transform; DCT: Either the forward discrete cosine transform or the inverse discrete cosine transform. The DCT is an invertible, discrete orthogonal transformation. The inverse DCT is defined in Annex A of this specification.
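To make the DC/AC terminology concrete, the sketch below is a direct floating-point evaluation of the separable 8×8 DCT pair with C(0) = 1/√2 and C(k) = 1 otherwise; F[0][0] is the DC coefficient and every other entry is an AC coefficient. It is an illustrative reference only, written under the assumption of that normalisation: it does no rounding, saturation or accuracy conformance checking, and the function names are hypothetical.

```python
import math

N = 8  # block size: 8 x 8 samples or coefficients


def _c(k: int) -> float:
    """Normalisation factor: 1/sqrt(2) for the zero-frequency term, 1 otherwise."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def _cos(i: int, k: int) -> float:
    """Cosine basis value for sample index i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def fdct_8x8(f):
    """Forward 2-D DCT: samples f[y][x] -> coefficients F[v][u]."""
    F = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            s = sum(f[y][x] * _cos(x, u) * _cos(y, v) for y in range(N) for x in range(N))
            F[v][u] = 2 / N * _c(u) * _c(v) * s
    return F


def idct_8x8(F):
    """Inverse 2-D DCT: coefficients F[v][u] -> samples f[y][x]."""
    f = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = sum(_c(u) * _c(v) * F[v][u] * _cos(x, u) * _cos(y, v)
                    for v in range(N) for u in range(N))
            f[y][x] = 2 / N * s
    return f
```

With only the DC coefficient set to 8, `idct_8x8` returns a flat block of 1.0 values, because 2/8 × (1/√2)² × 8 = 1.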

3.44 display aspect ratio: The ratio height/width (in SI units) of the intended display.

3.45 display order: The order in which the decoded pictures are displayed. Normally this is the same order in which they were presented at the input of the encoder.

3.46 display process: The (non-normative) process by which reconstructed frames are displayed.

3.47 dual-prime prediction: A prediction mode in which two forward field-based predictions are averaged. The predicted block size is 16x16 luminance samples. Dual-prime prediction is only used in interlaced P-pictures.

3.48 editing: The process by which one or more coded bitstreams are manipulated to produce a new coded bitstream. Conforming edited bitstreams must meet the requirements defined in this specification.

3.49 encoder: An embodiment of an encoding process.

3.50 encoding (process): A process, not specified in this specification, that reads a stream of input pictures or audio samples and produces a valid coded bitstream as defined in this specification.

3.51 enhancement layer: A relative reference to a layer (above the base layer) in a scalable hierarchy. For all forms of scalability, its decoding process can be described by reference to the lower layer decoding process and the appropriate additional decoding process for the enhancement layer itself.

3.52 fast forward playback: The process of displaying a sequence, or parts of a sequence, of pictures in display-order faster than real-time.

3.53 fast reverse playback: The process of displaying the picture sequence in the reverse of display order faster than real-time.

3.54 field: For an interlaced video signal, a “field” is the assembly of alternate lines of a frame. Therefore an interlaced frame is composed of two fields, a top field and a bottom field.
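Following the top field and bottom field definitions, where each top-field line lies immediately above the corresponding bottom-field line, a frame stored top line first splits into fields by taking alternate lines. A minimal sketch, assuming a frame is a list of lines with an even line count; the helper names are hypothetical:

```python
def split_fields(frame):
    """Split a frame (list of lines, top line first) into (top_field, bottom_field)."""
    return frame[0::2], frame[1::2]


def weave_fields(top, bottom):
    """Interleave a top field and a bottom field back into a frame."""
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame


top, bottom = split_fields(["line0", "line1", "line2", "line3"])
print(top, bottom)
```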

3.55 field-based prediction: A prediction mode using only one field of the reference frame. The predicted block size is 16x16 luminance samples. Field-based prediction is not used in progressive frames.

3.56 field period: The reciprocal of twice the frame rate.

3.57 field picture; field structure picture: A field structure picture is a coded picture with picture_structure equal to “Top field” or “Bottom field”.

3.58 flag: A one bit integer variable which may take one of only two values (zero and one).

3.59 forbidden: The term “forbidden” when used in the clauses defining the coded bitstream indicates that the value shall never be used. This is usually to avoid emulation of start codes.

3.60 forced updating: The process by which macroblocks are intra-coded from time-to-time to ensure that mismatch errors between the inverse DCT processes in encoders and decoders cannot build up excessively.

3.61 forward compatibility: A newer coding standard is forward compatible with an older coding standard if decoders designed to operate with the newer coding standard are able to decode bitstreams of the older coding standard.

3.62 forward motion vector: A motion vector that is used for motion compensation from a reference frame or reference field at an earlier time in display order.

3.63 forward prediction: Prediction from the past reference frame (field).

3.64 frame: A frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. For interlaced video a frame consists of two fields, a top field and a bottom field. One of these fields will commence one field period later than the other.

3.65 frame-based prediction: A prediction mode using both fields of the reference frame.

3.66 frame period: The reciprocal of the frame rate.

3.67 frame picture; frame structure picture: A frame structure picture is a coded picture with picture_structure equal to “Frame”.

3.68 frame rate: The rate at which frames are output from the decoding process.
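The frame period and field period definitions are simple reciprocals of the frame rate. The sketch below uses exact fractions so that non-integer rates such as 30000/1001 frames per second keep their exact value; the function names are hypothetical:

```python
from fractions import Fraction


def frame_period(frame_rate):
    """Frame period in seconds: the reciprocal of the frame rate."""
    return 1 / Fraction(frame_rate)


def field_period(frame_rate):
    """Field period in seconds: the reciprocal of twice the frame rate."""
    return 1 / (2 * Fraction(frame_rate))


print(frame_period(25), field_period(25))  # 1/25 s and 1/50 s
```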

3.69 future reference frame (field): A future reference frame (field) is a reference frame (field) that occurs at a later time than the current picture in display order.

3.70 frame reordering: The process of reordering the reconstructed frames when the coded order is different from the display order. Frame reordering occurs when B-frames are present in a bitstream. There is no frame reordering when decoding low delay bitstreams.
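One common way to express frame reordering in a decoder is to hold back the most recently decoded reference frame (I or P) and emit it only when the next reference frame arrives, while B-frames are emitted as soon as they are decoded. The sketch below assumes frames are labelled with their picture type followed by their display index (a hypothetical notation) and ignores field pictures:

```python
def coded_to_display(coded):
    """Reorder decoded frames (labels such as 'I0', 'P3', 'B1') into display order."""
    display = []
    held = None  # most recent reference (I or P) frame not yet output
    for frame in coded:
        if frame[0] in "IP":
            if held is not None:
                display.append(held)  # a new reference releases the previous one
            held = frame
        else:
            display.append(frame)  # B-frames are output immediately
    if held is not None:
        display.append(held)  # flush the last reference at end of sequence
    return display


print(coded_to_display(["I0", "P3", "B1", "B2", "P6", "B4", "B5"]))
```

Without B-frames the output order equals the coded order, which matches the statement that low delay bitstreams involve no frame reordering.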

3.71 group of pictures: A notion defined only in ISO/IEC 11172-2 (MPEG-1 Video). In this specification, a similar functionality can be achieved by means of inserting group of pictures headers.

3.72 header: A block of data in the coded bitstream containing the coded representation of a number of data elements pertaining to the coded data that follow the header in the bitstream.

3.73 hybrid scalability: Hybrid scalability is the combination of two (or more) types of scalability.

3.74 interlace: The property of conventional television frames where alternating lines of the frame represent different instances in time. In an interlaced frame, one of the fields is meant to be displayed first. This field is called the first field. The first field can be the top field or the bottom field of the frame.

3.75 I-field picture: A field structure I-Picture.

3.76 I-frame picture: A frame structure I-Picture.

3.77 I-picture; intra-coded picture: A picture coded using information only from itself.

3.78 intra coding: Coding of a macroblock or picture that uses information only from that macroblock or picture.

3.79 level: A defined set of constraints on the values which may be taken by the parameters of this specification within a particular profile. A profile may contain one or more levels. In a different context, level is the absolute value of a non-zero coefficient (see “run”).

3.80 layer: In a scalable hierarchy, one of the ordered set of bitstreams together with (the result of) its associated decoding process (implicitly including decoding of all layers below this layer).

3.81 layer bitstream: A single bitstream associated with a specific layer (always used in conjunction with layer qualifiers, e.g. “enhancement layer bitstream”).

3.82 lower layer: A relative reference to the layer immediately below a given enhancement layer (implicitly including decoding of all layers below this enhancement layer).

3.83 luminance component: A matrix, block or single sample representing a monochrome representation of the signal and related to the primary colours in the manner defined in the bitstream. The symbol used for luminance is Y.

3.84 Mbit: 1 000 000 bits

3.85 macroblock: The four 8 by 8 blocks of luminance data and the two (for 4:2:0 chrominance format), four (for 4:2:2 chrominance format) or eight (for 4:4:4 chrominance format) corresponding 8 by 8 blocks of chrominance data coming from a 16 by 16 section of the luminance component of the picture. Macroblock is sometimes used to refer to the sample data and sometimes to the coded representation of the sample values and other data elements defined in the macroblock header of the syntax defined in this part of this specification. The usage is clear from the context.
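The macroblock definition fixes the block count per chrominance format: four luminance blocks plus two, four or eight chrominance blocks. A small lookup sketch; the table and function names are hypothetical:

```python
# Chrominance 8x8 blocks per macroblock, as listed in the macroblock definition.
CHROMA_BLOCKS = {"4:2:0": 2, "4:2:2": 4, "4:4:4": 8}
LUMA_BLOCKS = 4  # four 8x8 blocks covering the 16x16 luminance area


def blocks_per_macroblock(chroma_format: str) -> int:
    """Total number of 8x8 blocks in one macroblock for the given chrominance format."""
    return LUMA_BLOCKS + CHROMA_BLOCKS[chroma_format]


print({fmt: blocks_per_macroblock(fmt) for fmt in CHROMA_BLOCKS})
```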

3.86 motion compensation: The use of motion vectors to improve the efficiency of the prediction of sample values. The prediction uses motion vectors to provide offsets into the past and/or future reference frames or reference fields containing previously decoded sample values that are used to form the prediction error.

3.87 motion estimation: The process of estimating motion vectors during the encoding process.

3.88 motion vector: A two-dimensional vector used for motion compensation that provides an offset from the coordinate position in the current picture or field to the coordinates in a reference frame or reference field.
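The motion compensation and motion vector definitions can be illustrated by fetching a predicted block from a reference frame at the position offset by the vector, then subtracting it from the current block to form the prediction error. This sketch assumes full-sample vectors and a reference frame stored as a list of rows; the vector precision, half-sample interpolation and edge handling of the actual decoding process are deliberately omitted, and the function names are hypothetical:

```python
def predict_block(ref, x, y, mv_x, mv_y, size=16):
    """Full-sample prediction of a size x size block whose top-left corner is (x, y)."""
    return [[ref[y + mv_y + r][x + mv_x + c] for c in range(size)] for r in range(size)]


def prediction_error(current, prediction):
    """Sample-wise difference between the current block and its prediction."""
    return [[a - b for a, b in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]


ref = [[r * 32 + c for c in range(32)] for r in range(32)]
print(predict_block(ref, 0, 0, 2, 3, size=2))
```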

3.89 non-intra coding: Coding of a macroblock or picture that uses information both from itself and from macroblocks and pictures occurring at other times.

3.90 opposite parity: The opposite parity of top is bottom, and vice versa.

3.91 P-field picture: A field structure P-Picture.

3.92 P-frame picture: A frame structure P-Picture.

3.93 P-picture; predictive-coded picture: A picture that is coded using motion compensated prediction from past reference fields or frames.

3.94 parameter: A variable within the syntax of this specification which may take one of a range of values. A variable which can take one of only two values is called a flag.

3.95 parity (of field): The parity of a field can be top or bottom.

3.96 past reference frame (field): A past reference frame (field) is a reference frame (field) that occurs at an earlier time than the current picture in display order.

3.97 picture: Source, coded or reconstructed image data. A source or reconstructed picture consists of three rectangular matrices of 8-bit numbers representing the luminance and two chrominance signals. A “coded picture” is defined in 3.24. For progressive video, a picture is identical to a frame, while for interlaced video, a picture can refer to a frame, or the top field or the bottom field of the frame depending on the context.

3.98 picture data: In the VBV operations, picture data is defined as all the bits of the coded picture, all the header(s) and user data immediately preceding it if any (including any stuffing between them) and all the stuffing following it, up to (but not including) the next start code, except in the case where the next start code is an end of sequence code, in which case it is included in the picture data.

3.99 prediction: The use of a predictor to provide an estimate of the sample value or data element currently being decoded.

3.100 prediction error: The difference between the actual value of a sample or data element and its predictor.

3.101 predictor: A linear combination of previously decoded sample values or data elements.

3.102 profile: A defined subset of the syntax of this specification.

NOTE - In this specification the word “profile” is used as defined above. It should not be confused with other definitions of “profile” and in particular it does not have the meaning that is defined by JTC1/SGFS.

3.103 progressive: The property of film frames where all the samples of the frame represent the same instant in time.

3.104 quantisation matrix: A set of sixty-four 8-bit values used by the dequantiser.

3.105 quantised DCT coefficients: DCT coefficients before dequantisation. A variable length coded representation of quantised DCT coefficients is transmitted as part of the coded video bitstream.

3.106 quantiser scale: A scale factor coded in the bitstream and used by the decoding process to scale the dequantisation.
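The quantisation matrix, quantiser scale and saturation definitions combine in dequantisation. The sketch below assumes the MPEG-2 style reconstruction ((2 × QF + k) × W × quantiser_scale) / 32, with k = 0 for intra blocks and k = Sign(QF) for non-intra blocks, integer division truncating toward zero, followed by saturation to [-2048, 2047]. Intra DC reconstruction and mismatch control are omitted, so treat this as an illustration to be checked against the normative inverse quantisation clause rather than a conforming implementation; the function names are hypothetical:

```python
def _div_trunc(a: int, b: int) -> int:
    """Integer division truncating toward zero (b > 0 assumed)."""
    q = abs(a) // b
    return q if a >= 0 else -q


def dequantise(qf: int, w: int, quantiser_scale: int, intra: bool) -> int:
    """Reconstruct one non-DC coefficient from quantised value qf and matrix weight w."""
    k = 0 if intra else (qf > 0) - (qf < 0)  # Sign(qf) for non-intra blocks
    value = _div_trunc((2 * qf + k) * w * quantiser_scale, 32)
    return max(-2048, min(2047, value))  # saturation to the permitted range


print(dequantise(3, 16, 2, intra=True), dequantise(-3, 16, 2, intra=False))
```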

3.107 random access: The process of beginning to read and decode the coded bitstream at an arbitrary point.

3.108 reconstructed frame: A reconstructed frame consists of three rectangular matrices of 8-bit numbers representing the luminance and two chrominance signals. A reconstructed frame is obtained by decoding a coded frame.

3.109 reconstructed picture: A reconstructed picture is obtained by decoding a coded picture. A reconstructed picture is either a reconstructed frame (when decoding a frame picture), or one field of a reconstructed frame (when decoding a field picture). If the coded picture is a field picture, then the reconstructed picture is the top field or the bottom field of the reconstructed frame.

3.110 reference field: A reference field is one field of a reconstructed frame. Reference fields are used for forward and backward prediction when P-pictures and B-pictures are decoded. Note that when field P-pictures are decoded, prediction of the second field P-picture of a coded frame uses the first reconstructed field of the same coded frame as a reference field.

3.111 reference frame: A reference frame is a reconstructed frame that was coded in the form of a coded I-frame or a coded P-frame. Reference frames are used for forward and backward prediction when P-pictures and B-pictures are decoded.

3.112 reordering delay: A delay in the decoding process that is caused by frame reordering.

3.113 reserved: The term “reserved” when used in the clauses defining the coded bitstream indicates that the value may be used in the future for ISO/IEC defined extensions.

3.114 sample aspect ratio: (abbreviated to SAR). This specifies the relative distance between samples. It is defined (for the purposes of this specification) as the vertical displacement of the lines of luminance samples in a frame divided by the horizontal displacement of the luminance samples. Thus its units are (metres per line) ÷ (metres per sample).

3.115 scalable hierarchy: Coded video data consisting of an ordered set of more than one video bitstream.

3.116 scalability: Scalability is the ability of a decoder to decode an ordered set of bitstreams to produce a reconstructed sequence. Moreover, useful video is output when subsets are decoded. The minimum subset that can thus be decoded is the first bitstream in the set, which is called the base layer. Each of the other bitstreams in the set is called an enhancement layer. When addressing a specific enhancement layer, “lower layer” refers to the bitstream that precedes the enhancement layer.

3.117 side information: Information in the bitstream necessary for controlling the decoder.

3.118 16x8 prediction: A prediction mode similar to field-based prediction but where the predicted block size is 16x8 luminance samples.

3.119 run: The number of zero coefficients preceding a non-zero coefficient, in the scan order. The absolute value of the non-zero coefficient is called “level”.
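The run and level definitions describe how non-zero coefficients are paired with the count of zeros preceding them in scan order. A minimal sketch, assuming the input is already in scan order; it returns (run, level, negative) triples because level is defined as an absolute value, and trailing zeros produce no pair (the end of the block is assumed to be signalled separately). The function name is hypothetical:

```python
def run_level_pairs(scanned):
    """Convert coefficients in scan order into (run, level, negative) triples."""
    pairs = []
    run = 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, abs(coeff), coeff < 0))
            run = 0
    return pairs


print(run_level_pairs([5, 0, 0, -2, 0, 1, 0, 0]))
```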

3.120 saturation: Limiting a value that exceeds a defined range by setting its value to the maximum or minimum of the range as appropriate.

3.121 skipped macroblock: A macroblock for which no data is encoded.

3.122 slice: A consecutive series of macroblocks which are all located in the same horizontal row of macroblocks.

3.123 SNR scalability: A type of scalability where the enhancement layer(s) contain only coded refinement data for the DCT coefficients of the lower layer.

3.124 source; input: Term used to describe the video material or some of its attributes before encoding.

3.125 spatial prediction: Prediction derived from a decoded frame of the lower layer decoder used in spatial scalability.

3.126 spatial scalability: A type of scalability where an enhancement layer also uses predictions from sample data derived from a lower layer without using motion vectors. The layers can have different frame sizes, frame rates or chrominance formats.

3.127 start codes [system and video]: 32-bit codes, embedded in the coded bitstream, that are unique. They are used for several purposes, including identifying some of the structures in the coding syntax.
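In MPEG-2 video, each start code is the byte-aligned 24-bit prefix 0x000001 followed by one byte that identifies it (for example 0xB3 for sequence_header_code and 0x00 for picture_start_code). A scanner can therefore locate start codes by searching for the prefix. The sketch below assumes a bytes object holding a byte-aligned stream and that start-code emulation has been avoided, as the "forbidden" values are meant to ensure; the function name is hypothetical:

```python
def find_start_codes(data: bytes):
    """Return (byte_offset, code_value) for each 0x000001 prefix followed by a value byte."""
    found = []
    i = data.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(data):
        found.append((i, data[i + 3]))
        i = data.find(b"\x00\x00\x01", i + 4)  # resume after the value byte
    return found


stream = b"\x00\x00\x01\xb3\x12\x34\x00\x00\x01\x00\xff"
print([(off, hex(code)) for off, code in find_start_codes(stream)])
```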

3.128 stuffing (bits); stuffing (bytes): Code-words that may be inserted into the coded bitstream that are discarded in the decoding process. Their purpose is to increase the bitrate of the stream which would otherwise be lower than the desired bitrate.

3.129 temporal prediction: Prediction derived from reference frames or fields other than those defined as spatial prediction.

3.130 temporal scalability: A type of scalability where an enhancement layer also uses predictions from sample data derived from a lower layer using motion vectors. The layers have identical frame sizes and chrominance formats, but can have different frame rates.

3.131 top field: One of two fields that comprise a frame. Each line of a top field is spatially located immediately above the corresponding line of the bottom field.

3.132 top layer: The topmost layer (with the highest layer_id) of a scalable hierarchy.

3.133 variable bitrate: Operation where the bitrate varies with time during the decoding of a coded bitstream.

3.134 variable length coding; VLC: A reversible procedure for coding that assigns shorter code-words to frequent events and longer code-words to less frequent events.
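Because VLC code-words are prefix-free, a decoder can read bits one at a time and emit a symbol as soon as the accumulated bits match a code-word. The table below is a hypothetical example chosen only to illustrate the principle (shorter code-words for more frequent events); it is not one of this specification's VLC tables:

```python
# Hypothetical prefix-free table, NOT taken from this specification.
TABLE = {"1": "A", "01": "B", "001": "C", "0001": "D"}


def vlc_decode(bits: str, table=TABLE):
    """Decode a string of '0'/'1' characters with a prefix-free code table."""
    out, code = [], ""
    for b in bits:
        code += b
        if code in table:
            out.append(table[code])
            code = ""
    if code:
        raise ValueError("trailing bits do not form a complete code-word")
    return out


print(vlc_decode("1010010001"))
```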

3.135 video buffering verifier; VBV: A hypothetical decoder that is conceptually connected to the output of the encoder. Its purpose is to provide a constraint on the variability of the data rate that an encoder or editing process may produce.
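A heavily simplified constant-bitrate model helps show what the VBV constrains: bits arrive at a steady rate, and at each decode instant a whole picture is removed at once. Underflow (compare the big picture definition) means a picture is not fully in the buffer when it is due; overflow means arrivals exceed the buffer size. This sketch assumes decode instants one picture period apart and an assumed initial fullness; it ignores vbv_delay, variable bitrate operation and low_delay, so it is not the model of Annex C. The function name and parameters are hypothetical:

```python
def vbv_check(picture_bits, bitrate, picture_rate, buffer_size, initial_fullness):
    """Return 'ok', 'underflow' or 'overflow' for a simplified constant-bitrate buffer."""
    per_period = bitrate / picture_rate  # bits arriving between decode instants
    fullness = initial_fullness
    for bits in picture_bits:
        if bits > fullness:
            return "underflow"  # picture not completely in the buffer when due
        fullness -= bits  # instantaneous removal of the whole picture
        fullness += per_period  # arrivals until the next decode instant
        if fullness > buffer_size:
            return "overflow"
    return "ok"


print(vbv_check([100, 100], bitrate=1000, picture_rate=10, buffer_size=500, initial_fullness=200))
```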

3.136 video sequence: The highest syntactic structure of coded video bitstreams. It contains a series of one or more coded frames.

3.137 xxx profile decoder: A decoder able to decode either a single bitstream or a scalable hierarchy of bitstreams of which the top layer conforms to the specifications of the xxx profile (with xxx being any of the defined profile names).

3.138 xxx profile scalable hierarchy: A set of bitstreams of which the top layer conforms to the specifications of the xxx profile.

3.139 xxx profile bitstream: A bitstream of a scalable hierarchy with a profile indication corresponding to xxx. Note that this bitstream is decodable only together with all its lower layer bitstreams (unless it is a base layer bitstream).

3.140 zigzag scanning order: A specific sequential ordering of the DCT coefficients from (approximately) the lowest spatial frequency to the highest.
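The zigzag order can be generated by walking the anti-diagonals of the 8×8 coefficient block (positions where row + column is constant), alternating direction on each diagonal, so that it runs from the DC coefficient toward the highest frequencies. This sketch assumes coefficients are addressed by raster index row × 8 + column; the function name is hypothetical:

```python
def zigzag_order(n: int = 8):
    """Raster indices (row * n + col) of an n x n block in zigzag scanning order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies an anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Odd diagonals run down-left (row increasing); even ones run up-right.
        rows = range(lo, hi + 1) if s % 2 == 1 else range(hi, lo - 1, -1)
        for r in rows:
            order.append(r * n + (s - r))
    return order


print(zigzag_order()[:10])  # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```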
