
7.7 Spatial scalability


This clause specifies the additional decoding process required for the spatial scalable extensions.

Both the lower layer and the enhancement layer shall use the “restricted slice structure” (no gaps between slices).

Figure 7-13 is a diagram of the video decoding process with spatial scalability. The diagram is simplified for clarity.



Figure 7-13. Simplified motion compensation process for spatial scalability

7.7.1 Higher syntactic structures


In general the base layer of a spatial scalable hierarchy can conform to any coding standard, including Recommendation ITU-T H.261, ISO/IEC 11172-2 or this specification. Note however, that within this specification the decodability of a spatial scalable hierarchy is only considered in the case that the base layer conforms to this specification or to ISO/IEC 11172-2.

Due to the “loose coupling” of layers, only one syntactic restriction is needed in the enhancement layer, and only when both the lower and the enhancement layer are interlaced. In that case picture_structure has to take the same value as in the reference frame used for prediction from the lower layer. See 7.7.3.1 for how to identify this reference frame.


7.7.2 Prediction in the enhancement layer


A motion compensated temporal prediction is made from reference frames in the enhancement layer as described in 7.6. In addition, a spatial prediction is formed from the lower layer decoded frame (dlower[y][x]), as described in 7.7.3. These predictions are selected individually or combined to form the actual prediction.

In general, up to four separate predictions are formed for each macroblock; these are combined to form the final prediction macroblock p[y][x].

In the case that a macroblock is not coded, either because the entire macroblock is skipped or because that specific macroblock is not coded, there is no coefficient data. In this case f[y][x] is zero and the decoded samples are simply the prediction, p[y][x].
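As a non-normative illustration of the relationship just described, the short C sketch below adds the prediction p[y][x] and the coefficient data f[y][x] for a 16x16 block; the saturation to [0:255], the array types and the parameter names are assumptions modelled on 7.6.8 rather than text taken from this clause.

/* Illustrative sketch only: form the decoded samples from the prediction
 * and the (possibly absent) coefficient data, as described above. */
static int clip_0_255(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

void reconstruct_block(unsigned char d[16][16],
                       const int p[16][16],   /* prediction p[y][x] */
                       const int f[16][16],   /* coefficient data f[y][x] */
                       int macroblock_coded)  /* 0 if skipped or not coded */
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            d[y][x] = (unsigned char)clip_0_255(p[y][x] +
                          (macroblock_coded ? f[y][x] : 0));
}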

7.7.3 Formation of spatial prediction


Forming the spatial prediction requires identification of the correct reference frame and definition of the spatial resampling process, which is done in the following clauses.

The resampling process is defined for a whole frame; however, for decoding of a macroblock, only the 16x16 region of the upsampled frame that corresponds to the position of that macroblock is needed.


7.7.3.1 Selection of reference frame


The spatial prediction is made from the reconstructed frame of the lower layer referenced by lower_layer_temporal_reference. However, if the lower and enhancement layer bitstreams are embedded in a Recommendation ITU-T H.222.0 | ISO/IEC 13818-1 (Systems) multiplex, this information is overridden by the timing information given by the decoding time stamps (DTS) in the PES headers.

NOTE - If group_of_pictures_header() occurs often in the lower layer bitstream then the temporal reference in the lower layer may be ambiguous (because temporal_reference is reset after a group_of_pictures_header()).

The reconstructed picture from which the spatial prediction is made shall be one of the following:

• The coincident or most recently decoded lower layer picture

• The coincident or most recently decoded lower layer I-picture or P-picture

• The second most recently decoded lower layer I-picture or P-picture, provided that the lower layer does not have low_delay set to ‘1’.

NOTE - Spatial scalability will only work efficiently when predictions are formed from frames in the lower layer which are also coincident (or very close) in display time with the predicted frame in the enhancement layer.


7.7.3.2 Resampling process


The spatial prediction is made by resampling the lower layer reconstructed frame to the same sample grid as the enhancement layer. This grid is defined in terms of frame coordinates, even if a lower-layer interlaced frame was actually coded with a pair of field pictures.

This resampling process is illustrated in Figure 7-14.





Figure 7-14. Formation of the “spatial” prediction by interpolation of the lower layer picture

Spatial predictions shall only be made for macroblocks in the enhancement layer that lie wholly within the upsampled lower layer reconstructed frame.

The upsampling process depends on whether the lower layer reconstructed frame is interlaced or progressive, as indicated by lower_layer_progressive_frame, and on whether the enhancement layer frame is interlaced or progressive, as indicated by progressive_frame.

When lower_layer_progressive_frame is ‘1’, the lower layer reconstructed frame (renamed to prog_pic) is resampled vertically as described in 7.7.3.5. The resulting frame is considered to be progressive if progressive_frame is ‘1’ and interlaced if progressive_frame is ‘0’. The resulting frame is resampled horizontally as described in 7.7.3.6. lower_layer_deinterlaced_field_select shall have the value ‘1’.

When lower_layer_progressive_frame is ‘0’ and progressive_frame is ‘0’, each lower layer reconstructed field is deinterlaced as described in 7.7.3.4, to produce a progressive field (prog_pic). This field is resampled vertically as described in 7.7.3.5. The resulting field is resampled horizontally as described in 7.7.3.6. Finally the resulting field is subsampled to produce an interlaced field. lower_layer_deinterlaced_field_select shall have the value ‘1’.

When lower_layer_progressive_frame is ‘0’ and progressive_frame is ‘1’, each lower layer reconstructed field is deinterlaced as described in 7.7.3.4, to produce a progressive field (prog_pic). Only one of these fields is required. When lower_layer_deinterlaced_field_select is ‘0’ the top field is used, otherwise the bottom field is used. The one that is used is resampled vertically as described in 7.7.3.5. The resulting frame is resampled horizontally as described in 7.7.3.6.

For interlaced frames, if the current (and implicitly the lower layer) frames are encoded as field pictures, the deinterlacing process described in 7.7.3.4 is done within the field.

lower_layer_vertical_offset and lower_layer_horizontal_offset, defining the position of the lower layer frame within the current frame, shall be taken into account in the resampling definitions in 7.7.3.5 and 7.7.3.6 respectively. The lower layer offsets are limited to even values when the chrominance in the enhancement layer is subsampled in that dimension in order to align the chrominance samples between the two layers.

The upsampling process is summarised in Table 7-15.

Table 7-15 Upsampling process

lower_layer_deinterlaced_field_select | lower_layer_progressive_frame | progressive_frame | Apply deinterlace process | Entity used for prediction
0 | 0 | 1 | yes | top field
1 | 0 | 1 | yes | bottom field
1 | 0 | 0 | yes | both fields
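The decision logic of 7.7.3.2 and Table 7-15 can be summarised by the following non-normative C sketch; the enum names and the handling of the lower_layer_progressive_frame == ‘1’ case (taken from the paragraph above rather than from the table) are illustrative assumptions.

/* Illustrative summary of the upsampling dispatch described above. */
typedef enum {
    USE_TOP_FIELD,      /* deinterlace, keep the top field    */
    USE_BOTTOM_FIELD,   /* deinterlace, keep the bottom field */
    USE_BOTH_FIELDS,    /* deinterlace both fields            */
    USE_FRAME           /* progressive lower layer: no deinterlacing */
} upsample_entity;

static upsample_entity upsampling_entity(int lower_layer_deinterlaced_field_select,
                                         int lower_layer_progressive_frame,
                                         int progressive_frame,
                                         int *apply_deinterlace)
{
    if (lower_layer_progressive_frame) {
        *apply_deinterlace = 0;      /* vertical (7.7.3.5) then horizontal (7.7.3.6) resampling only */
        return USE_FRAME;
    }
    *apply_deinterlace = 1;          /* interlaced lower layer: deinterlace per 7.7.3.4 first */
    if (!progressive_frame)
        return USE_BOTH_FIELDS;      /* interlaced enhancement layer */
    return lower_layer_deinterlaced_field_select ? USE_BOTTOM_FIELD
                                                 : USE_TOP_FIELD;
}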


7.7.3.3 Colour component processing


Due to the different sampling grids of luminance and chrominance components, some variables used in 7.7.3.4 to 7.7.3.6 take different values for luminance and chrominance resampling. Furthermore it is permissible for the chrominance formats in the lower layer and the enhancement layer to be different from one another.

Table 7-16 defines the values of the variables used in 7.7.3.4 to 7.7.3.6.



Table 7-16 Local variables used in 7.7.3.4 to 7.7.3.6

variable  | value for luminance processing          | value for chrominance processing
ll_h_size | lower_layer_prediction_horizontal_size  | lower_layer_prediction_horizontal_size / chroma_ratio_horizontal[lower]
ll_v_size | lower_layer_prediction_vertical_size    | lower_layer_prediction_vertical_size / chroma_ratio_vertical[lower]
v_subs_n  | vertical_subsampling_factor_n           | vertical_subsampling_factor_n * format_ratio_vertical



Tables 7-17 and 7-18 give additional definitions.

Table 7-17 chrominance subsampling ratios for layer = {lower, enhance}

chrominance format of layer | chroma_ratio_horizontal[layer] | chroma_ratio_vertical[layer]
4:2:0 | 2 | 2
4:2:2 | 2 | 1
4:4:4 | 1 | 1

Table 7-18 chrominance format ratios

chrominance format lower layer | chrominance format enhancement layer | format_ratio_horizontal | format_ratio_vertical
4:2:0 | 4:2:0 | 1 | 1
4:2:0 | 4:2:2 | 1 | 2
4:4:4 | 4:4:4 | 1 | 1


7.7.3.4 Deinterlacing


If deinterlacing does not need to be done (according to Table 7-15), the lower layer reconstructed frame (dlower[y][x]) is renamed to prog_pic.

First, each lower layer field is padded with zeros to form a progressive grid at a frame rate equal to the field rate of the lower layer, and with the same number of lines and samples per line as the lower layer frame. Table 7-19 specifies the filters to be applied next. The luminance component is filtered using the relevant two field aperture filter if picture_structure == “Frame-Picture”, or else using the one field aperture filter. The chrominance component is filtered using the one field aperture filter.

The temporal and vertical columns of the table indicate the relative temporal and spatial coordinates of the samples to which the filter taps defined in the other columns apply. An intermediate sum is formed by multiplying each sample by the corresponding filter tap and adding the products together.

Table 7-19. Deinterlacing Filter

                      |          two field aperture                       | one field aperture
Temporal | Vertical   | Filter for first field | Filter for second field  | Filter (both fields)
  -1     |   -2       |          0             |          -1              |          0
  -1     |    0       |          0             |           2              |          0
   1     |   +2       |         -1             |           0              |          0

The output of the filter (sum) is then scaled according to the following formula:

prog_pic[y][x] = sum // 16

and saturated to lie in the range [0:255].

The filter aperture can extend outside the coded picture size. In this case the samples of the lines outside the active picture shall take the value of the closest neighbouring existing sample (below or above) of the same field as defined below.

For all samples [y][x]:

if (y < 0 && (y & 1) == 1)
	y = 1
if (y < 0 && (y & 1) == 0)
	y = 0
if (y >= ll_v_size && ((y - ll_v_size) & 1) == 1)
	y = ll_v_size - 1
if (y >= ll_v_size && ((y - ll_v_size) & 1) == 0)
	y = ll_v_size - 2
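Equivalently, a decoder might express the rule above as a small helper function. The following C fragment is a non-normative sketch; passing ll_v_size as an explicit parameter is an assumption about how the Table 7-16 variables would be carried.

/* Clamp a line index that falls outside the active picture to the
 * nearest existing line of the same field, as specified above. */
static int clamp_line_same_field(int y, int ll_v_size)
{
    if (y < 0)
        return (y & 1) ? 1 : 0;
    if (y >= ll_v_size)
        return ((y - ll_v_size) & 1) ? ll_v_size - 1 : ll_v_size - 2;
    return y;   /* inside the active picture: unchanged */
}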

7.7.3.5 Vertical resampling


The frame subject to vertical resampling, prog_pic, is resampled to the enhancement layer vertical sampling grid using linear interpolation between the sample sites according to the following formula, where vert_pic is the resulting field:

vert_pic[yh + ll_v_offset][x] = (16 - phase) * prog_pic[y1][x] + phase * prog_pic[y2][x]

where	yh + ll_v_offset = output sample coordinate in vert_pic

	y1 = (yh * v_subs_m) / v_subs_n

	y2 = y1 + 1	if y1 < ll_v_size - 1
	     y1	otherwise

	phase = (16 * ((yh * v_subs_m) % v_subs_n)) // v_subs_n

Samples required for upsampling which lie outside the lower layer reconstructed frame are obtained by border extension of the lower layer reconstructed frame.

NOTE - The calculation of phase assumes that the sample position in the enhancement layer at yh = 0 is spatially coincident with the first sample position of the lower layer. It is recognised that this is an approximation for the chrominance component if the chroma_format == 4:2:0.
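A non-normative C sketch of the vertical resampling of one column is given below. div_round stands in for the “//” operator (integer division with rounding to the nearest integer) for the non-negative operands used here; the array layout and parameter names are assumptions, not part of this specification.

/* Rounding integer division, standing in for the "//" operator
 * (non-negative operands assumed). */
static int div_round(int a, int b)
{
    return (2 * a + b) / (2 * b);
}

/* Vertically resample one column of prog_pic onto the enhancement layer
 * grid.  The result is left at 16 times full scale; the division by 256
 * takes place in the horizontal stage (7.7.3.6). */
void vertical_resample_column(int *vert_pic_col,                 /* indexed by yh + ll_v_offset */
                              const unsigned char *prog_pic_col, /* one column of prog_pic */
                              int out_lines, int ll_v_size, int ll_v_offset,
                              int v_subs_m, int v_subs_n)
{
    for (int yh = 0; yh < out_lines; yh++) {
        int y1 = (yh * v_subs_m) / v_subs_n;            /* truncating "/" of this specification */
        int y2 = (y1 < ll_v_size - 1) ? y1 + 1 : y1;    /* border extension at the bottom edge */
        int phase = div_round(16 * ((yh * v_subs_m) % v_subs_n), v_subs_n);
        vert_pic_col[yh + ll_v_offset] =
            (16 - phase) * prog_pic_col[y1] + phase * prog_pic_col[y2];
    }
}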

7.7.3.6 Horizontal resampling


The frame subject to horizontal resampling, vert_pic, is resampled to the enhancement layer horizontal sampling grid using linear interpolation between the sample sites according to the following formula, where hor_pic is the resulting field:

hor_pic[y][xh + ll_h_offset] = ((16 - phase) * vert_pic[y][x1] + phase * vert_pic[y][x2]) // 256

where	xh + ll_h_offset = output sample coordinate in hor_pic

	x1 = (xh * h_subs_m) / h_subs_n

	x2 = x1 + 1	if x1 < ll_h_size - 1
	     x1	otherwise

	phase = (16 * ((xh * h_subs_m) % h_subs_n)) // h_subs_n

Samples required for upsampling which lie outside the lower layer reconstructed frame are obtained by border extension of the lower layer reconstructed frame.
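The corresponding non-normative sketch for one line of the horizontal stage is shown below; it consumes the 16x-scaled output of the vertical stage and applies the final division by 256. As before, div_round stands in for the “//” operator for non-negative operands, and the parameter names are assumptions.

/* Rounding integer division, standing in for "//" (non-negative operands). */
static int div_round(int a, int b)
{
    return (2 * a + b) / (2 * b);
}

/* Horizontally resample one line of vert_pic onto the enhancement layer
 * grid, returning ordinary 8-bit samples in hor_pic. */
void horizontal_resample_line(unsigned char *hor_pic_line,   /* indexed by xh + ll_h_offset */
                              const int *vert_pic_line,      /* one line of vert_pic (16x scale) */
                              int out_samples, int ll_h_size, int ll_h_offset,
                              int h_subs_m, int h_subs_n)
{
    for (int xh = 0; xh < out_samples; xh++) {
        int x1 = (xh * h_subs_m) / h_subs_n;
        int x2 = (x1 < ll_h_size - 1) ? x1 + 1 : x1;         /* border extension at the right edge */
        int phase = div_round(16 * ((xh * h_subs_m) % h_subs_n), h_subs_n);
        hor_pic_line[xh + ll_h_offset] = (unsigned char)
            div_round((16 - phase) * vert_pic_line[x1] + phase * vert_pic_line[x2], 256);
    }
}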


7.7.3.7 Reinterlacing


If reinterlacing does not need to be done, the result of the resampling process, hor_pic, is renamed to spat_pred_pic.

If hor_pic was derived from the top field of a lower layer interlaced frame, the even lines of hor_pic are copied to the even lines of spat_pred_pic.

If hor_pic was derived from the bottom field of a lower layer interlaced frame the odd lines of hor_pic are copied to the odd lines of spat_pred_pic.

If hor_pic was derived from a lower layer progressive frame, hor_pic is copied to spat_pred_pic.
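The line copying described above can be sketched, non-normatively, as follows; the flat row-major buffers and the ‘parity’ convention (0 for a hor_pic derived from the top field, 1 for the bottom field) are assumptions made for illustration.

#include <string.h>

/* Copy the appropriate lines of hor_pic into spat_pred_pic, as described
 * above.  For a progressive lower layer every line is copied. */
void reinterlace(unsigned char *spat_pred_pic, const unsigned char *hor_pic,
                 int lines, int samples_per_line,
                 int lower_layer_progressive, int parity)
{
    for (int y = 0; y < lines; y++)
        if (lower_layer_progressive || (y & 1) == parity)
            memcpy(spat_pred_pic + (size_t)y * samples_per_line,
                   hor_pic + (size_t)y * samples_per_line,
                   (size_t)samples_per_line);
}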


7.7.4 Selection and combination of spatial and temporal predictions


The spatial and temporal predictions can be selected or combined to form the actual prediction. The macroblock_type (Tables B-5, B-6 and B-7) and the additional spatial_temporal_weight_code (Table 7-21) indicate, by use of the spatial_temporal_weight_class, whether the prediction is temporal-only, spatial-only or a weighted combination of temporal and spatial predictions. Classes are defined in the following way:

Class 0 indicates temporal-only prediction

Class 1 indicates that neither field has spatial-only prediction

Class 2 indicates that the top field is spatial-only prediction

Class 3 indicates that the bottom field is spatial-only prediction

Class 4 indicates spatial-only prediction

In intra pictures, if spatial_temporal_weight_class is 0, normal intra coding is performed; otherwise the prediction is spatial-only. In predicted and interpolated pictures, if spatial_temporal_weight_class is 0, prediction is temporal-only; if spatial_temporal_weight_class is 4, prediction is spatial-only; otherwise one or a pair of prediction weights is used to combine the spatial and temporal predictions.

The possible spatial_temporal_weights are given in a weight table which is selected in the picture spatial scalable extension. Up to four different weight tables are available for use depending on whether the current and lower layers are interlaced or progressive, as indicated in Table 7-20 (allowed, yet not recommended values given in brackets).



Table 7-20. Intended (allowed) spatial_temporal_weight_code_table_index values

Lower layer format | Enhancement layer format | spatial_temporal_weight_code_table_index
Progressive or interlaced | Progressive | 00
Progressive coincident with enhancement layer top fields | Interlaced | 10 (00; 01; 11)
Interlaced (picture_structure != Frame-Picture) | Interlaced | 00

In macroblock_modes(), a two bit code, spatial_temporal_weight_code, is used to describe the prediction for each field (or frame), as shown in Table 7-21. In this table spatial_temporal_integer_weight identifies those spatial_temporal_weight_codes that can also be used with dual prime prediction (see Tables 7-22 and 7-23).

Table 7-21 spatial_temporal_weights and spatial_temporal_weight_classes for the spatial_temporal_weight_code_table_index and spatial_temporal_weight_codes

spatial_temporal_weight_code_table_index | spatial_temporal_weight_code | spatial_temporal_weight(s) | spatial_temporal_weight_class | spatial_temporal_integer_weight
00* | -  | (0,5)      | 1 | 0
01  | 00 | (0; 1)     | 3 | 1
01  | 01 | (0; 0,5)   | 1 | 0
01  | 11 | (0,5; 0,5) | 1 | 0
10  | 00 | (1; 0)     | 2 | 1
10  | 01 | (0,5; 0)   | 1 | 0
10  | 11 | (0,5; 0,5) | 1 | 0
11  | 00 | (1; 0)     | 2 | 1
11  | 01 | (1; 0,5)   | 2 | 0
11  | 11 | (0,5; 0,5) | 1 | 0

* For spatial_temporal_weight_code_table_index == 00 no spatial_temporal_weight_code is transmitted.













NOTE - Spatial-only prediction (weight_class == 4) is signalled by different values of macroblock_type (see tables B-5 to B-7).

When the spatial_temporal_weight combination is given in the form (a; b), “a” gives the proportion of the prediction for the top field which is derived from the spatial prediction and “b” gives the proportion of the prediction for the bottom field which is derived from the spatial prediction for that field.

When the spatial_temporal_weight is given in the form (a), “a” gives the proportion of the prediction for the picture which is derived from the spatial prediction for that picture.

The precise method for predictor calculation is as follows:

pel_pred_temp[y][x] is used to denote the temporal prediction (formed within the enhancement layer) as defined for pel_pred[y][x] in 7.6. pel_pred_spat[y][x] is used to denote the prediction formed from the lower layer by extracting the appropriate samples, co-located with the current macroblock position, from spat_pred_pic.

If the spatial_temporal_weight is zero then no prediction is made from the lower layer. Therefore;

pel_pred[y][x] = pel_pred_temp[y][x];

If the spatial_temporal_weight is one then no prediction is made from the enhancement layer. Therefore;

pel_pred[y][x] = pel_pred_spat[y][x];

If the weight is one half then the prediction is the average of the temporal and spatial predictions. Therefore;

pel_pred[y][x] = (pel_pred_temp[y][x] + pel_pred_spat[y][x])//2;

When progressive_frame == 0 chrominance is treated as interlaced, that is, the first weight is used for the top field chrominance lines and the second weight is used for the bottom field chrominance lines.

Addition of prediction and coefficient data is then done as in 7.6.8.
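A non-normative C sketch of the selection and combination rules above is given below. The (a; b) weights are carried in halves (0, 1 or 2 standing for 0, 0,5 and 1); the rounding of the averaged case follows the “//2” above, and the per-line weight selection follows the progressive_frame == 0 rule. Array sizes and parameter names are assumptions.

/* Combine the temporal and spatial predictions for one 16x16 block.
 * w_top and w_bottom are the (a; b) weights expressed in halves:
 * 0 -> temporal only, 2 -> spatial only, 1 -> average of the two. */
void combine_predictions(int pel_pred[16][16],
                         const int pel_pred_temp[16][16],
                         const int pel_pred_spat[16][16],
                         int w_top, int w_bottom, int progressive_frame)
{
    for (int y = 0; y < 16; y++) {
        /* top-field lines take the first weight, bottom-field lines the second */
        int w = (progressive_frame || (y & 1) == 0) ? w_top : w_bottom;
        for (int x = 0; x < 16; x++) {
            if (w == 0)
                pel_pred[y][x] = pel_pred_temp[y][x];
            else if (w == 2)
                pel_pred[y][x] = pel_pred_spat[y][x];
            else  /* w == 1: the "//2" of this specification rounds to the nearest integer */
                pel_pred[y][x] = (pel_pred_temp[y][x] + pel_pred_spat[y][x] + 1) / 2;
        }
    }
}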

7.7.5 Updating motion vector predictors and motion vector selection


In frame pictures where field prediction is used the possibility exists that one of the fields is predicted using spatial-only prediction. In this case no motion vector is present in the bitstream for the field which has spatial-only prediction. For the case where both fields of a frame have spatial-only prediction, the macroblock_type is such that no motion vectors are present in the bitstream for that macroblock.

The spatial_temporal_weight_class also indicates the number of motion vectors which are present in the coded bitstream and how the motion vector predictors are updated, as defined in Table 7-22 and Table 7-23.



Table 7-22. Updating of motion vector predictors in Field Pictures

field_motion_type | macroblock_motion_forward | macroblock_motion_backward | macroblock_intra | spatial_temporal_weight_class | Predictors to update
Field-based | - | - | 1 | 0   | PMV[1][0][1:0] = PMV[0][0][1:0]
Field-based | 1 | 1 | 0 | 0   | PMV[1][0][1:0] = PMV[0][0][1:0] and PMV[1][1][1:0] = PMV[0][1][1:0]
Field-based | 1 | 0 | 0 | 0,1 | PMV[1][0][1:0] = PMV[0][0][1:0]
Dual prime  | 1 | 0 | 0 | 0   | PMV[1][0][1:0] = PMV[0][0][1:0]

NOTE - PMV[r][s][1:0] = PMV[u][v][1:0] means that PMV[r][s][1] = PMV[u][v][1] and PMV[r][s][0] = PMV[u][v][0].

If concealment_motion_vectors is zero then PMV[r][s][t] is set to zero (for all r, s and t).
field_motion_type is not present in the bitstream but is assumed to be Field-based.
§ PMV[r][s][t] is set to zero (for all r, s and t). See 7.6.3.4.

Table 7-23. Updating of motion vector predictors in Frame Pictures

frame_motion_type | macroblock_motion_forward | macroblock_motion_backward | macroblock_intra | spatial_temporal_weight_class | Predictors to update
Frame-based | - | - | 1 | 0       | PMV[1][0][1:0] = PMV[0][0][1:0]
Frame-based | 1 | 1 | 0 | 0       | PMV[1][0][1:0] = PMV[0][0][1:0] and PMV[1][1][1:0] = PMV[0][1][1:0]
Frame-based | 1 | 0 | 0 | 0,1,2,3 | PMV[1][0][1:0] = PMV[0][0][1:0]
Dual prime@ | 1 | 0 | 0 | 0,2,3   | PMV[1][0][1:0] = PMV[0][0][1:0]

NOTE - PMV[r][s][1:0] = PMV[u][v][1:0] means that PMV[r][s][1] = PMV[u][v][1] and PMV[r][s][0] = PMV[u][v][0].

If concealment_motion_vectors is zero then PMV[r][s][t] is set to zero (for all r, s and t).
frame_motion_type is not present in the bitstream but is assumed to be Frame-based.
§ PMV[r][s][t] is set to zero (for all r, s and t). See 7.6.3.4.
@ Dual prime cannot be used when spatial_temporal_integer_weight = ‘0’.

















7.7.5.1 Resetting motion vector predictors


In addition to the cases identified in 7.6.3.4 the motion vector predictors shall be reset in the following cases;

• In a P-picture when a macroblock is purely spatially predicted (spatial_temporal_weight_class == 4)

• In a B-picture when a macroblock is purely spatially predicted (spatial_temporal_weight_class == 4)

NOTE - In case of spatial_temporal_weight_class == 2 in a frame picture when field-based prediction is used, the transmitted vector is applied to the bottom field (see Table 7-25). However this vector[0][s][1:0] is predicted from PMV[0][s][1:0]. PMV[1][s][1:0] is then updated as shown in Table 7-23.



Table 7-24. Predictions and motion vectors in field pictures

field_motion_type | macroblock_motion_forward | macroblock_motion_backward | macroblock_intra | spatial_temporal_weight_class | Motion vector | Prediction formed for
Field-based | - | - | 1 | 0   | vector'[0][0][1:0] | None (motion vector is for concealment)
Field-based | 1 | 1 | 0 | 0   | vector'[0][0][1:0] | whole field, forward
            |   |   |   |     | vector'[0][1][1:0] | whole field, backward
Field-based | 1 | 0 | 0 | 0,1 | vector'[0][0][1:0] | whole field, forward
16x8 MC     | 1 | 1 | 0 | 0   | vector'[0][0][1:0] | upper 16x8 field, forward
            |   |   |   |     | vector'[1][0][1:0] | lower 16x8 field, forward
            |   |   |   |     | vector'[0][1][1:0] | upper 16x8 field, backward
            |   |   |   |     | vector'[1][1][1:0] | lower 16x8 field, backward
16x8 MC     | 1 | 0 | 0 | 0,1 | vector'[0][0][1:0] | upper 16x8 field, forward
            |   |   |   |     | vector'[1][0][1:0] | lower 16x8 field, forward
16x8 MC     | 0 | 1 | 0 | 0,1 | vector'[0][1][1:0] | upper 16x8 field, backward
            |   |   |   |     | vector'[1][1][1:0] | lower 16x8 field, backward
Dual prime  | 1 | 0 | 0 | 0   | vector'[0][0][1:0] | whole field, same parity, forward
            |   |   |   |     | vector'[2][0][1:0]*† | whole field, opposite parity, forward

NOTE - Motion vectors are listed in the order they appear in the bitstream.

The motion vector is only present if concealment_motion_vectors is one.
field_motion_type is not present in the bitstream but is assumed to be Field-based.
* These motion vectors are not present in the bitstream.
† These motion vectors are derived from vector'[0][0][1:0] as described in 7.6.3.6.
§ The motion vector is taken to be (0; 0) as explained in 7.6.3.5.



















Table 7-25. Predictions and motion vectors in frame pictures

frame_motion_type | macroblock_motion_forward | macroblock_motion_backward | macroblock_intra | spatial_temporal_weight_class | Motion vector | Prediction formed for
Frame-based | - | - | 1 | 0       | vector'[0][0][1:0] | None (motion vector is for concealment)
Frame-based | 1 | 1 | 0 | 0       | vector'[0][0][1:0] | frame, forward
            |   |   |   |         | vector'[0][1][1:0] | frame, backward
Frame-based | 1 | 0 | 0 | 0,1,2,3 | vector'[0][0][1:0] | frame, forward
Field-based | 1 | 1 | 0 | 0       | vector'[0][0][1:0] | top field, forward
            |   |   |   |         | vector'[1][0][1:0] | bottom field, forward
            |   |   |   |         | vector'[0][1][1:0] | top field, backward
            |   |   |   |         | vector'[1][1][1:0] | bottom field, backward
Field-based | 1 | 0 | 0 | 0,1     | vector'[0][0][1:0] | top field, forward
            |   |   |   |         | vector'[1][0][1:0] | bottom field, forward
Field-based | 1 | 0 | 0 | 2       | -                  | top field, spatial
            |   |   |   |         | vector'[0][0][1:0] | bottom field, forward
Field-based | 1 | 0 | 0 | 3       | vector'[0][0][1:0] | top field, forward
            |   |   |   |         | -                  | bottom field, spatial
Field-based | 0 | 1 | 0 | 0,1     | vector'[0][1][1:0] | top field, backward
            |   |   |   |         | vector'[1][1][1:0] | bottom field, backward
Field-based | 0 | 1 | 0 | 2       | -                  | top field, spatial
            |   |   |   |         | vector'[0][1][1:0] | bottom field, backward
Field-based | 0 | 1 | 0 | 3       | vector'[0][1][1:0] | top field, backward
            |   |   |   |         | -                  | bottom field, spatial
Dual prime@ | 1 | 0 | 0 | 0,2,3   | vector'[0][0][1:0] | top field, same parity, forward
            |   |   |   |         | vector'[0][0][1:0]* | bottom field, same parity, forward
            |   |   |   |         | vector'[2][0][1:0]*† | top field, opposite parity, forward
            |   |   |   |         | vector'[3][0][1:0]*† | bottom field, opposite parity, forward

NOTE - Motion vectors are listed in the order they appear in the bitstream.

The motion vector is only present if concealment_motion_vectors is one.
frame_motion_type is not present in the bitstream but is assumed to be Frame-based.
* These motion vectors are not present in the bitstream.
† These motion vectors are derived from vector'[0][0][1:0] as described in 7.6.3.6.
§ The motion vector is taken to be (0; 0) as explained in 7.6.3.5.
@ Dual prime cannot be used when spatial_temporal_integer_weight = ‘0’.




















7.7.6 Skipped macroblocks


In all cases, a skipped macroblock is the result of a prediction only, and all the DCT coefficients are considered to be zero.

If sequence_scalable_extension is present and scalable_mode = “spatial scalability”, the following rules apply in addition to those given in 7.6.6.

In I-pictures, skipped macroblocks are allowed. These are defined as spatial-only predicted.

In P-pictures and B-pictures, the skipped macroblock is temporal-only predicted.

In B-pictures a skipped macroblock shall not follow a spatial-only predicted macroblock.
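As a small non-normative illustration, the rules above reduce to a choice of prediction source by picture type; the enum and the character-coded picture type below are assumptions made for the sketch.

/* Prediction source implied for a skipped macroblock when
 * scalable_mode == "spatial scalability". */
typedef enum { SKIPPED_SPATIAL_ONLY, SKIPPED_TEMPORAL_ONLY } skipped_prediction;

static skipped_prediction skipped_macroblock_prediction(char picture_coding_type)
{
    /* 'I' for intra pictures; 'P' and 'B' for predicted and
     * bidirectionally predicted pictures. */
    return (picture_coding_type == 'I') ? SKIPPED_SPATIAL_ONLY
                                        : SKIPPED_TEMPORAL_ONLY;
}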

7.7.7 VBV buffer underflow in the lower layer


In the case of spatial scalability, VBV buffer underflow in the lower layer may cause problems. This is because of possible uncertainty in precisely which frames will be repeated by a particular decoder.
