
Mode-Adaptive Fine Granularity Scalability

Wen-Shiaw Peng and Yen-Kuang Chen

Intel Corporation

ABSTRACT

In this paper, we propose a new algorithm that utilizes enhancement-layer prediction to further improve the coding efficiency of the Fine Granularity Scalability (FGS) scheme currently defined in MPEG-4. The proposed algorithm adaptively uses (1) the previously reconstructed enhancement-layer macroblock after motion compensation, (2) the current reconstructed base-layer macroblock, or (3) the combination of both to form the predicted macroblock for the current enhancement layer. The new algorithm is designed so that other error-drift reduction methods can be applied to avoid the drift caused by prediction from the enhancement layer. In addition, the proposed algorithm can re-use existing B-frame hardware to form the enhancement-layer predicted frame. Simulation results show a gain of about 1 dB in PSNR over the current FGS algorithm at moderate to high bit rates. The proposed algorithm is thus a cost-efficient solution for improving the coding efficiency of fine granularity scalability.

1. INTRODUCTION

Unlike in the past, video compression today is required to provide not only good coding efficiency but also scalability. Scalability here means that the compressed bitstream can still be decoded with reasonable quality after it has been extracted or truncated. In general, scalability can take the form of SNR (signal-to-noise ratio) scalability, spatial scalability, or temporal scalability, and these forms can be merged into hybrid schemes. For SNR scalability of video, the fine granularity scalability (FGS) scheme [1] defined in MPEG-4 provides a means to achieve this. The current FGS codes the video into two layers, a base layer and an enhancement layer. The enhancement layer simply takes the quantization noise produced by the base-layer encoder as its input, so the correlation within the enhancement layer is not exploited. The coding efficiency of the current FGS algorithm can therefore still be improved, especially when the base layer is coded at a low bit rate [7]. Since FGS forms all predictions from the base layer, the error-drift problem is avoided; however, this decreases the coding efficiency, since all predictions come from the less correlated base-layer pictures. In this paper, we propose a new algorithm that can use state-of-the-art error-drift reduction methods while further improving the coding efficiency.

For SNR scalability of video compression, simulcast, which directly compresses the video into multiple bitstreams at distinct bit rates, is one of the most intuitive approaches. However, simulcast cannot provide smooth quality variation, and it is also inefficient from a compression point of view, since the correlation among bitstreams of the same content at different bit rates is not utilized. To overcome this, progressive fine granularity scalability was proposed [4]. That proposal also discusses an effective error-drift reduction method, advance prediction bitplane coding (APBIC), whose key idea is that the reference frame used for prediction can differ from the frame used for display. APBIC can also be combined with the algorithm we propose here to reduce the error-drift problem. In [4], however, the input to the current enhancement layer is the difference between two prediction-residue frames, one predicted from the base-layer frame and one from the previous enhancement-layer frame, whereas our algorithm codes the prediction-residue frame itself as the enhancement layer. Both [4] and the proposed algorithm exploit the correlation among enhancement layers. The innovative and unique features of the new algorithm are as follows:

(1) We use the reconstructed enhancement layer in the spatial domain for prediction.

(2) We adaptively form the predicted picture for the enhancement layer from the previous enhancement layer and the base layer.

(3) We have a unified implementation for enhancement-layer prediction of P-frames, B-frames, and macroblocks in intra mode.

Other algorithms have recently been proposed to utilize the correlation among enhancement layers [5][6]. Although the algorithm published in [6] is similar, the proposed algorithm differs significantly in the following important characteristics:

a. Our algorithm uses a post-clipping structure, which means that all prediction operations are performed in the spatial domain [3].

b. Our algorithm uses three prediction modes, one of which is a combination of the current reconstructed base layer and the previously reconstructed enhancement layer after motion compensation.

As mentioned, the proposed algorithm uniquely uses the features listed above to significantly improve the coding efficiency of the current FGS. Moreover, the hardware already used for B-frames in the base layer can be reconfigured to facilitate the implementation of our algorithm. In Section 2, we describe the proposed encoder and decoder architectures in detail. Section 3 gives analysis results for the mode decision at the macroblock level. Simulation results and a comparison to the original FGS are given in Section 4, and conclusions are drawn in Section 5.

2. ENCODER AND DECODER ARCHITECTURES

In this section, we describe the proposed encoder and decoder architectures. First, let us look at the prediction scheme in the temporal domain, shown in Fig 1, so that the encoder and decoder structures can be more easily understood. The previously reconstructed enhancement layer is obtained by keeping 3 bitplanes of the enhancement layer. As mentioned, the current enhancement-layer frame is predicted from the current reconstructed base-layer frame and from the motion-compensated version of the previously reconstructed enhancement-layer frame, as shown in Fig 1.
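To make the bitplane-limited reference concrete, the following minimal Python sketch shows one way to keep only the most significant bitplanes of the integer enhancement-layer residual when building the reference frame. The function name and the per-block MSB computation are illustrative assumptions; in MPEG-4 FGS the maximum bitplane position is determined over a larger unit than a single block.

```python
import numpy as np

def keep_bitplanes(coeffs: np.ndarray, num_bitplanes: int) -> np.ndarray:
    """Keep only the `num_bitplanes` most significant bitplanes of the
    integer enhancement-layer coefficients used to build the reference."""
    mag = np.abs(coeffs).astype(np.int64)
    msb = int(mag.max()).bit_length()      # position of the top bitplane
    shift = max(msb - num_bitplanes, 0)    # bitplanes below this are dropped
    truncated = (mag >> shift) << shift    # zero out the lower bitplanes
    return np.sign(coeffs).astype(np.int64) * truncated

# Example: keep 3 bitplanes of a block of quantization-noise coefficients.
residual = np.array([[37, -5, 12], [0, 9, -21], [3, -2, 6]])
print(keep_bitplanes(residual, 3))    # -> [[32 0 8] [0 8 -16] [0 0 0]]
```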

In our simulations, we force the prediction to come from the base layer every certain number of frames in order to reduce the error drift that occurs when only the first enhancement layer is received. This reset operation comes at no extra cost, since the original FGS is a subset of the proposed algorithm: each reset frame is itself coded with the original FGS structure, i.e., the information carried by its enhancement layer is the base-layer quantization noise. Of course, techniques such as APBIC [4] can be applied on top of the proposed encoder and decoder to further reduce the error-drift problem. Having described the relation among base- and enhancement-layer frames in the temporal domain, we discuss the encoder and decoder structures in the following.

a. Encoder

The proposed encoder structure is depicted in Fig 2. Like the original FGS, we have a base-layer encoder and an enhancement-layer encoder. The base-layer encoder is the same as in FGS. The enhancement-layer encoder differs from the one in FGS in that its input is the prediction residue against a predicted frame composed of three different types of macroblocks:

Type I: Macroblocks from the current reconstructed base layer, $\hat{I}_B(t)$. This is the same as the current FGS. This mode is useful when a region cannot be motion-predicted from the previous frame, e.g., in the presence of occlusions. To stop drift, we can periodically force all of the macroblocks within the same frame to use this mode, as shown in Fig 1.

Type II: Macroblocks from the reconstructed frame of the previous enhancement layer after motion compensation, i.e., $MC(\hat{I}_E(t-1), \{v\})$, where $\{v\}$ denotes the motion vectors. When the base-layer pictures are compressed at a low bit rate, the higher-quality reference comes from the motion-compensated enhancement-layer frame. We therefore use this mode to reduce the magnitude of the prediction residue and improve the coding efficiency.

Type III: Prediction from the average of the previous enhancement-layer frame and the current reconstructed base layer, $[MC(\hat{I}_E(t-1), \{v\}) + \hat{I}_B(t)]/2$.

In intra mode, macroblocks are always predicted from the base layer. In P- or B-frames, the criterion for choosing the prediction mode can be determined by the user; in our simulations, the minimum sum of absolute differences (SAD) is used. The motion vectors used by the base and enhancement layers are the same and are obtained by using the original frames for motion estimation, so no extra bits are needed to record additional motion vectors for the enhancement layer. Compared to the original FGS, the additional resources required are one more IDCT, one more frame buffer, and the extra operations needed to determine the prediction mode at the macroblock level. As described before, all predictions are performed in the spatial domain, since the post-clipping operation is used. To apply APBIC at the frame level, one can add a switch on the path in the enhancement-layer encoder that is added to the IDCT residue to form the reconstructed enhancement-layer frame; the additional input to this switch comes from the base layer.
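As an illustration of the macroblock-level mode decision just described, the following Python sketch selects among the three prediction types by minimum SAD and honors the periodic reset. All names are hypothetical, and details such as chroma handling are omitted.

```python
import numpy as np

# Prediction types for the enhancement-layer predicted frame.
TYPE_I, TYPE_II, TYPE_III = 0, 1, 2   # base layer, MC enhancement, average

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences between two macroblocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def choose_mode(orig_mb, base_mb, mc_enh_mb, is_reset_frame):
    """Pick the prediction for one enhancement-layer macroblock.

    orig_mb    -- original pixels of the current macroblock
    base_mb    -- co-located block of the current reconstructed base layer
    mc_enh_mb  -- motion-compensated block from the previously
                  reconstructed enhancement layer (base-layer MVs re-used)
    """
    if is_reset_frame:                 # periodic reset: plain FGS frame
        return TYPE_I, base_mb
    avg_mb = ((base_mb.astype(np.int32) + mc_enh_mb.astype(np.int32) + 1) // 2
              ).astype(base_mb.dtype)
    candidates = [(TYPE_I, base_mb), (TYPE_II, mc_enh_mb), (TYPE_III, avg_mb)]
    return min(candidates, key=lambda c: sad(orig_mb, c[1]))

# A frame is treated as a reset frame when its index is a multiple of the
# reset distance (9 frames in the tests of Table 1):
# is_reset = (frame_index % 9 == 0)
```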

Fig 1. The proposed prediction scheme in the temporal domain.

Fig 2. The encoder architecture of the proposed algorithm, which adaptively uses the enhancement layer for prediction.

b. Decoder

Fig 3 sketches the decoder architecture. The main difference between the original FGS decoder and the proposed one is the enhancement-layer decoder. To reconstruct the enhancement-layer frame, the predicted frame composed of the three types of macroblocks described for the encoder is first formed. As one can observe, forming the predicted frame for the enhancement layer is similar to forming the predicted frame of a B-frame in the base-layer decoder. Thus, the B-frame hardware in the base-layer decoder can be re-used to construct the prediction frame for the enhancement layers. As with a B-frame, we have two reference frames for reconstruction, one from the enhancement layer and the other from the base layer; to re-use the hardware, we simply always set the motion vectors of the base-layer reference to zero. To apply APBIC, the predicted frame used to reconstruct the lower enhancement-layer frame should sometimes be reset to the reconstructed base-layer frame, and this reset behavior must be consistent with that at the encoder side.
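The following sketch illustrates how the enhancement-layer predictor maps onto a B-frame-style bi-predictor with two references, where the base-layer reference always uses zero motion vectors. It is a simplified model under stated assumptions: integer motion vectors, luma only, and no border padding; the function name is hypothetical.

```python
import numpy as np

def predict_enh_frame(modes, mvs, enh_ref, base_rec, mb=16):
    """B-frame-style bi-predictor for the enhancement layer: reference 0 is
    the previous reconstructed enhancement frame (with motion vectors),
    reference 1 is the current reconstructed base-layer frame whose motion
    vectors are forced to zero so the B-frame datapath can be re-used."""
    h, w = base_rec.shape
    pred = np.empty_like(base_rec)
    mbs_per_row = w // mb
    for y in range(0, h, mb):
        for x in range(0, w, mb):
            i = (y // mb) * mbs_per_row + (x // mb)
            dy, dx = mvs[i]                                        # integer MVs
            fwd = enh_ref[y + dy:y + dy + mb, x + dx:x + dx + mb]  # ref 0
            bwd = base_rec[y:y + mb, x:x + mb]                     # ref 1, zero MV
            if modes[i] == 0:                     # Type I: base layer only
                pred[y:y + mb, x:x + mb] = bwd
            elif modes[i] == 1:                   # Type II: MC enhancement
                pred[y:y + mb, x:x + mb] = fwd
            else:                                 # Type III: bi-average
                pred[y:y + mb, x:x + mb] = (
                    (fwd.astype(np.int32) + bwd.astype(np.int32) + 1) // 2)
    return pred
```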

c. The B-frame

The basic principle of a B-frame in the proposed algorithm is the same as that of a P-frame: the enhancement layer of the current frame is again formed from the prediction residue against two different reconstructed frames, the previously reconstructed enhancement-layer frame and the current reconstructed base-layer frame. In the case of a B-frame, the previously reconstructed enhancement frame is the frame bi-predicted from the two reconstructed enhancement-layer reference frames, and the reconstructed base-layer B-frame is used as the base-layer prediction.
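As a sketch of the B-frame case under the same assumptions (integer motion vectors, no border padding, hypothetical names), the macroblock's enhancement-layer reference is itself the average of motion-compensated blocks from the two reconstructed enhancement-layer reference frames:

```python
import numpy as np

def enh_reference_for_b_mb(fwd_enh, bwd_enh, fwd_mv, bwd_mv, y, x, mb=16):
    """For a B-frame macroblock, the enhancement-layer reference is itself
    bi-predicted from the two reconstructed enhancement-layer reference
    frames, mirroring the base-layer B-frame prediction."""
    fy, fx = fwd_mv
    by, bx = bwd_mv
    f = fwd_enh[y + fy:y + fy + mb, x + fx:x + fx + mb].astype(np.int32)
    b = bwd_enh[y + by:y + by + mb, x + bx:x + bx + mb].astype(np.int32)
    return ((f + b + 1) // 2).astype(fwd_enh.dtype)
```

This bi-predicted block then plays the role of the motion-compensated enhancement block in the Type I/II/III selection above, with the reconstructed base-layer B-frame as the base-layer candidate.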

As can be seen from the encoder and decoder architectures, compatibility with the current FGS is easily obtained by setting the macroblock mode decision to always be Type I. Also, as mentioned before, error-drift reduction methods such as APBIC can easily be added, and the B-frame hardware implemented for the base layer can be re-used to produce the predicted frame for the enhancement layer. In the following section, we give analysis results showing why the mode decision is needed at the macroblock level.

3. MODE DECISION ANALYSIS

Table 1: Test conditions

# of P frames between two I frames: 59
# of B frames between two P frames: none
Frame rate: 30 frames/s
ME mode: 4 MVs enabled; frame prediction enabled; field prediction disabled; half-pixel prediction enabled
Use original frame as reference frame: yes
Number of bitplanes (BPs) used for enhancement prediction: Y component: 3 BPs; U, V components: 3 BPs
Reset distance: 9 frames

*The other test conditions are the same as defined in m3096 [2].

Table 2: Distribution of enhancement-layer macroblock prediction modes (reset function enabled; 3 enhancement-layer bitplanes used for prediction)

                                  Akiyo          Foreman
Base-layer bit rate (Kbits/s)    128    256     170    256
From base layer                  18%    24%      5%     7%
From enhancement layer           68%    51%     74%    60%
From combination of both         14%    25%     21%    33%

Table 2 shows the mode-decision distribution for two sequences and two different base-layer bit rates. We can see that the percentage of blocks predicted from the base layer increases as the base-layer bit rate increases; in other words, our coding gain over FGS grows as the base-layer bit rate decreases. This is expected, since prediction from the enhancement layer is not needed at all if the base-layer bit rate is high enough. As expected, there is significant correlation between the previously reconstructed enhancement-layer frame and the current frame, since prediction from the enhancement layer is used for about 50%~75% of the blocks. Note also that prediction from the combination of both layers is used in 10%~35% of the cases, which shows that this prediction mode further increases the coding gain.

Fig 3. The decoder architecture of the proposed algorithm, which adaptively uses the enhancement layer for reconstruction.

4. SIMULATION RESULTS

In the simulations reported in Fig 4, we simply truncate the bitstream bitplane by bitplane to test the performance of the proposed algorithm. The same truncation scheme is applied to the current FGS algorithm to obtain the rate-distortion curves for comparison. Note that rate control and an optimized entropy coder for the proposed algorithm have not yet been implemented. The simulation results show that an average gain of 1 dB in PSNR is obtained at higher bit rates when the base layer is encoded at 128 kbits/s. The gain is smaller when the base layer is encoded at 256 kbits/s; this is most noticeable for the Akiyo sequence, for which the quality at that base-layer bit rate is already very high, leaving a gain of only 0.3~0.4 dB.
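The truncation experiment can be summarized by the following sketch, which cuts the enhancement-layer stream bitplane by bitplane and measures PSNR at each truncation point. Here `decode` is a stand-in for a hypothetical decoder of the proposed scheme and is not implemented; the runnable helpers illustrate the measurement itself.

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray) -> float:
    """PSNR in dB for 8-bit pictures."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def truncate_stream(frame_bitplanes, k):
    """Keep only the first k coded bitplanes of every frame's enhancement
    layer; frame_bitplanes is a list of per-frame lists of bitplane data."""
    return [planes[:k] for planes in frame_bitplanes]

# Rate-distortion sweep, one point per number of retained bitplanes:
#
# for k in range(max_bitplanes + 1):
#     rec = decode(base_stream, truncate_stream(enh_stream, k))
#     quality = np.mean([psnr(o, r) for o, r in zip(originals, rec)])
```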

The decrease in PSNR at lower bit rates is caused by the error-drift problem, which arises when only the first bitplane is received. In this scenario, the decoder can simply discard this bitplane to increase the picture quality, in which case the rate-distortion curves would stay flat instead of decreasing as currently shown in Fig 4. Additionally, as mentioned, the performance degradation can be reduced by implementing an error-drift reduction method such as APBIC.

5. CONCLUSIONS

A new algorithm, which combines prediction from the base-layer and previous enhancement-layer frames to further improve the coding efficiency of FGS, has been proposed. The simulation results show that a gain of about 1 dB in PSNR can be obtained with the proposed algorithm. In addition, the B-frame hardware at the base layer can be re-used to facilitate the computation of the enhancement layer, and the software complexity is estimated to increase by only about 20%~40%. The new algorithm is thus suitable for real-time streaming applications. Optimizations such as rate control, an improved entropy coder, and error-drift reduction have not yet been implemented; we expect the performance of the proposed algorithm to improve further once they are.

REFERENCES:

[1] "Proposed Draft Amendment 4," ISO/IEC JTC1/SC29/WG11, MPEG00/N3315, Mar. 2000.

[2] "FGS Core Experiments," ISO/IEC JTC1/SC29/WG11, MPEG99/M3096, Dec. 1999.

[3] H. Jiang, "Experiments on Using Post-Clip Addition in MPEG-4 FGS Video Coding," ISO/IEC JTC1/SC29/WG11, MPEG00/M5742, Mar. 2000.

[4] S. Li, F. Wu, and Y.Q. Zhang, "Experimental Results with Progressive Fine Granularity Scalable (PFGS) Coding," ISO/IEC JTC1/SC29/WG11, MPEG00/M5742, Mar. 2000.

[5] S. Li, F. Wu, and Y.Q. Zhang, "Study of a new approach to improve FGS video coding efficiency," ISO/IEC JTC1/SC29/WG11, MPEG99/M5583, Dec. 1999.

[6] F. Wu, S. Li, X. Sun, R. Yan, and Y.Q. Zhang, "Macroblock-based progressive fine granularity scalable coding," ISO/IEC JTC1/SC29/WG11, MPEG01/M6779, Jan. 2001.

[7] W. Li, et al., "Advanced Fine Granularity Scalability for High Quality Video Distribution," ISO/IEC JTC1/SC29/WG11, MPEG01/M6766, Jan. 2001.

Acknowledgement: The authors would like to express their greatest appreciation for the valuable help of Dr. Andre Zaccarin.

Fig 4. Simulation results and comparison to the current FGS: (a) Foreman with base layer at 128 kbits/s; (b) Foreman with base layer at 256 kbits/s; (d) Akiyo with base layer at 256 kbits/s.
