Mind map of "Video Coding Processing Technology and Applications Based on H.264" (edited by Jia Kebin et al.)
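A Detailed Explanation of the H.264 Video Compression Coding Standard: Guo Xinjun, Master of Engineering class of 2006, Xinjiang University
JVT (Joint Video Team) was founded in Pattaya, Thailand, in December 2001.
It is made up of video coding experts from two international standardization organizations, ITU-T and ISO.
JVT's goal is to develop a new video coding standard that achieves a high compression ratio, high image quality, and good network adaptability.
JVT's work has since been adopted by ITU-T, where the new video compression coding standard is called H.264. It has also been adopted by ISO, where it is called AVC (Advanced Video Coding) and forms Part 10 of MPEG-4.
H.264 defines three profiles: the Baseline Profile (the simple version, with a wide range of applications); the Main Profile (which adopts several techniques that improve image quality and increase the compression ratio, and can be used for SDTV, HDTV, DVD, and so on); and the Extended Profile (which can be used for video streaming over various networks).
H.264 not only saves 50% of the bit rate compared with H.263 and MPEG-4, but also provides better support for network transmission.
It introduces an IP-packet-oriented coding mechanism, which favors packet transmission over networks and supports streaming of video.
H.264 has strong error resilience and can cope with video transmission over wireless channels with high packet loss and severe interference.
H.264 supports scalable coding and transmission under different network resources, which yields stable image quality.
H.264 adapts to video transmission over different networks and is network friendly.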
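I. The H.264 Video Compression System
The H.264 compression system consists of two parts: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL).
The VCL comprises the VCL encoder and the VCL decoder. Its main function is compression encoding and decoding of video data, and it includes compression units such as motion compensation, transform coding, and entropy coding.
The NAL provides the VCL with a uniform, network-independent interface. It encapsulates and packetizes the video data for transmission over the network, using a uniform data format that includes a single-byte header, multiple bytes of video data, and framing, logical channel signaling, timing information, end-of-sequence signals, and so on.
The header contains a storage flag and a type flag.
The storage flag is used to indicate that the current data does not belong to a frame used for reference.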
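The single-byte header described above can be unpacked with a few bit operations. The sketch below assumes the layout of the published H.264 NAL unit header (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type); mapping the "storage flag" to nal_ref_idc and the "type flag" to nal_unit_type is an interpretation of the text, and the function and class names are made up for illustration.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int  # 1 bit, expected to be 0 in a valid stream
    nal_ref_idc: int         # 2 bits, 0 means "not used for reference"
    nal_unit_type: int       # 5 bits, identifies the kind of payload


def parse_nal_header(first_byte: int) -> NalHeader:
    """Split the one-byte NAL unit header into its three fields."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("NAL header must be a single byte (0..255)")
    return NalHeader(
        forbidden_zero_bit=(first_byte >> 7) & 0x01,
        nal_ref_idc=(first_byte >> 5) & 0x03,
        nal_unit_type=first_byte & 0x1F,
    )


if __name__ == "__main__":
    # 0x67 = 0b0_11_00111: reference data, unit type 7
    print(parse_nal_header(0x67))
    # 0x01 = 0b0_00_00001: not used for reference, unit type 1
    print(parse_nal_header(0x01))
```

A header whose nal_ref_idc field is 0 corresponds to the case in the text where the data does not belong to a referenced frame.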
8th International Conference on Management, Education and Information (MEICI 2018)
HD Video Coding based on Fractal and H.264
Yun Chen
Beihang University, Beijing, China
Keywords: High-definition video; Video coding; Fractal coding; H.264
Abstract. This paper studies an HD video and remote sensing image compression algorithm based on fractal coding and H.264, with the aim of finding image storage and transmission techniques suitable for HD-resolution video. The system further optimizes the intra-frame coding mode currently adopted by H.264 and proposes a new fast intra block partitioning algorithm that reduces the number of prediction modes to be evaluated. For inter prediction, a new fractal-based motion estimation and motion compensation algorithm is proposed. To further reduce the number of coded bits, the fractal residuals are transformed, quantized and entropy encoded. Since the motion vector MV and the fractal coefficients cannot be quantized, their differences are entropy encoded directly. The high-definition video coding system based on fractal coding and H.264 not only reduces coding complexity but also improves coding efficiency.
Introduction
With the continuous advancement of video technology, the demand for high-quality video is increasing [1-3]. However, improving compression efficiency when transmission bandwidth and storage space are limited has become a key problem in video technology [4-9]. To better meet the requirements of high-definition video compression, H.264 makes full use of the spatial correlation within video frames by providing a tree-structured block partitioning scheme and multiple prediction modes.
However, because multiple prediction modes must be tried under each partitioning mode, the coding time is relatively long. We therefore first gather statistics on the block partitions chosen in H.264 intra prediction, as a basis for the subsequent improvement of intra prediction.
In current fractal video coding systems, I frames are intra coded as follows: the I frame is divided into fixed-size blocks, and each block then undergoes a series of operations such as transformation, quantization, and entropy coding. When the code stream produced this way is decoded, inverse quantized, and inverse transformed at the decoder, the quality of the reconstructed image is poor compared with the original. Because H.264 gives better coding results, this paper borrows the intra prediction method of the video coding standard H.264/AVC and improves it to some extent in order to raise the quality of intra-frame coding [10-16].
Block Division Structure
Intra block partition. In this paper, the video sequence to be encoded is divided into consecutive frames, each frame is divided into macroblocks of uniform size 16×16, and each macroblock is then encoded [18]. First, the macroblock is treated as a single 16×16 block, prediction is performed in each of the four 16×16 prediction modes, and the rate distortion cost of each mode is calculated. Then the current macroblock is divided into sixteen 4×4 blocks, each block is predicted in each of the nine 4×4 prediction modes with the rate distortion cost calculated for each mode, and the rate distortion costs are summed over the entire 16×16 block. The combination with the lowest rate distortion cost among the 16×16 and 4×4 partitions is selected as the final partition mode and prediction mode [19].
This paper proposes a fast intra block partitioning algorithm.
(1) If the left macroblock A uses the 4×4 partition and the upper macroblock B also uses the 4×4 partition, the rate distortion cost J of every prediction mode is calculated for the current block under both partition modes.
(2) If the left macroblock A uses the 4×4 partition and the upper macroblock B uses the 16×16 partition, or A uses the 16×16 partition and B uses the 4×4 partition, both partitions are tried for the current block and the rate distortion cost J is calculated for each mode.
(3) If the left macroblock A and the upper macroblock B both use the 16×16 partition, the current block is assigned the 16×16 partition, which saves 144 (16×9) rate distortion cost calculations per macroblock.
The calculation process of the rate distortion cost is as follows:
(1) Calculate the prediction residual of the current block.
$$Diff(x,y) = Orig(x,y) - Pred(x,y) \tag{1}$$
Here $Orig(x,y)$ and $Pred(x,y)$ are the pixel values of the original block and the prediction block respectively.
(2) The Hadamard transform is applied to the prediction residual, and the sum of the absolute values of the transformed residual, the SATD, is calculated, that is,
$$SATD = \sum_{x,y} \left| HAD(Diff(x,y)) \right| \tag{2}$$
(3) The cost is calculated using a simplified SATD cost function.
$$J_{mode} = SATD + \lambda_{mode} \cdot R_{mode} \tag{3}$$
Here $\lambda_{mode}$ is a Lagrange multiplier and $R_{mode}$ is the number of bits needed to encode the current intra mode.
Fractal coding
Fast motion estimation algorithm. Inter-frame prediction generally exploits the time-domain correlation of video frames, using motion estimation and motion compensation to find the best matching block [20]. Since a full search over the search range requires too much computation to allow efficient coding, we propose a new fast motion estimation algorithm.
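The rate distortion cost of equations (1) to (3) can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: it assumes a 4×4 block, an unnormalized 4×4 Hadamard matrix (reference encoders often scale the sum, which is omitted here), and caller-supplied values for the Lagrange multiplier and the mode bit count. All function names are hypothetical.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix (symmetric, so it equals its own transpose).
H4: Block = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a: Block, b: Block) -> Block:
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def satd4x4(orig: Block, pred: Block) -> int:
    """Equations (1) and (2): residual, Hadamard transform, sum of |.|."""
    diff = [[orig[y][x] - pred[y][x] for x in range(4)] for y in range(4)]
    transformed = matmul4(matmul4(H4, diff), H4)  # H * Diff * H^T
    return sum(abs(v) for row in transformed for v in row)


def rd_cost(orig: Block, pred: Block, lam: float, rate_bits: int) -> float:
    """Equation (3): J_mode = SATD + lambda_mode * R_mode."""
    return satd4x4(orig, pred) + lam * rate_bits


if __name__ == "__main__":
    original = [[10] * 4 for _ in range(4)]
    prediction = [[9] * 4 for _ in range(4)]
    # lam and rate_bits are arbitrary illustrative values.
    print(rd_cost(original, prediction, lam=2.0, rate_bits=3))
```

A mode decision would evaluate rd_cost for each candidate prediction and keep the mode with the smallest J.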
The motion vectors of adjacent blocks are used to compute the initial search point of the current block, the fast motion estimation algorithm is then run from that initial search point, the best matching block is determined by fractal coding, and finally the residual is transformed by an integer DCT, quantized, and entropy coded [21, 22].
In the inter-frame predictive coding mode, each block has one MV that needs to be encoded. When a small block size is used, for example when a macroblock is completely divided into 4×4 blocks, 16 MVs must be encoded. An MV contains four variables, x, y, s, and o, where x and y are the motion vector components and s and o are the fractal parameters. If the MV is not compressed, the number of bits spent on the MV may exceed that spent on the prediction residual. The MV of the current block is therefore predicted from the MVs of adjacent blocks, exploiting their correlation, and the difference between the actual MV and the predicted value $MV_p$ is encoded:
$$MVD = MV - MV_p \tag{4}$$
As shown in Figure 1, the prediction uses the block A to the left, the block B above, and the block C to the upper right of the current block (each of arbitrary size). Let E be the current block. A is the block containing the topmost 4×4 neighboring block to the left of E, B is the block containing the leftmost 4×4 neighboring block above E, and C is the block containing the 4×4 neighboring block diagonally above the upper right corner of E. The prediction rules are as follows:
(1) Except for 16×8 and 8×16 blocks, $MV_p$ is the median of the MVs of blocks A, B, and C.
(2) For 16×8 blocks, $MV_p$ of the upper partition is predicted from B, and $MV_p$ of the lower partition is predicted from A.
(3) For 8×16 blocks, $MV_p$ of the left partition is predicted from A, and $MV_p$ of the right partition is predicted from B.
(4) When A, B, and C are all unavailable, the MV is encoded directly.
(5) When C is not available but the upper-left block D is available, C is replaced with D.
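Rules (1), (4), and (5), together with equation (4), can be sketched as below. This is an illustration under stated assumptions, not the paper's code: only the x and y components are handled (the fractal parameters s and o are left out), the median is taken per component, and a neighbor that is missing while others are present is treated as (0, 0), a case the text does not specify. The 16×8 and 8×16 cases of rules (2) and (3) are not covered, and all names are hypothetical.

```python
from typing import Optional, Tuple

MV = Tuple[int, int]  # (x, y) components only


def median3(a: int, b: int, c: int) -> int:
    """Middle value of three integers."""
    return sorted((a, b, c))[1]


def predict_mv(a: Optional[MV], b: Optional[MV], c: Optional[MV],
               d: Optional[MV] = None) -> Optional[MV]:
    """Median prediction MV_p from neighbors A, B, C (D replaces a missing C).

    Returns None when no neighbor is available, meaning the MV would be
    encoded directly (rule 4).
    """
    if c is None:
        c = d  # rule 5: fall back to the upper-left block D
    if a is None and b is None and c is None:
        return None  # rule 4
    # Assumption: a missing neighbor contributes (0, 0) to the median.
    a, b, c = (mv if mv is not None else (0, 0) for mv in (a, b, c))
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))


def mv_difference(mv: MV, mv_p: MV) -> MV:
    """Equation (4): MVD = MV - MV_p, per component."""
    return (mv[0] - mv_p[0], mv[1] - mv_p[1])


if __name__ == "__main__":
    mv_p = predict_mv((1, 2), (3, -4), (5, 0))
    print(mv_p, mv_difference((4, 1), mv_p))
```

Because neighboring blocks tend to move together, the difference is usually small and costs fewer bits than the raw vector.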
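Analysis of the H.264 Bitstream Structure
1. Introduction to H.264
MPEG (Moving Picture Experts Group) and VCEG (Video Coding Experts Group) jointly developed a video compression coding standard that outperforms the earlier MPEG standards and H.263. It was named AVC (Advanced Video Coding), is also known as ITU-T Recommendation H.264 and as MPEG-4 Part 10, and is abbreviated H.264/AVC or H.264.
This international standard was formally approved by ITU-T in March 2003 and officially published internationally.
To meet the needs of high-definition video compression, the FRExt extensions were added in 2004; to meet the needs of different bit rates and quality levels, scalable video coding (SVC) was added in 2006.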
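2. H.264 Coding Format
The bitstream structure defined by H.263 is hierarchical, with four layers.
From top to bottom these are the picture layer, the group of blocks (GOB) layer, the macroblock layer, and the block layer.
The H.264 bitstream structure differs substantially from that of H.263: it no longer uses a strictly hierarchical structure.
H.264 supports encoding and decoding of 4:2:0 progressive or interlaced video.
Compared with H.263 and MPEG-4, H.264 doubles the video compression ratio.
The functionality of H.264 is split into two layers: the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL).
VCL data is the output of the encoding process and represents the compressed, coded video data sequence.
Before VCL data is transmitted or stored, it is first mapped or encapsulated into NAL units.
Each NAL unit consists of a Raw Byte Sequence Payload (RBSP) and a set of NAL header information corresponding to the coded video.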
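The basic structure of an RBSP is the raw coded data followed by trailing bits: one "1" bit and then a number of "0" bits, so that the data is byte aligned.
Figure 1. NAL unit sequence
3. H.264 Transmission
A coded H.264 video sequence consists of a series of NAL units, each of which contains one RBSP; see Table 1.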
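The RBSP trailing bits described in section 2 (a single "1" bit followed by "0" bits up to the next byte boundary) can be illustrated as follows. This is a minimal sketch that represents the coded data as a string of "0" and "1" characters for readability; the function names are hypothetical, and other parts of NAL unit construction, such as the header byte, are not modeled.

```python
def add_rbsp_trailing_bits(payload_bits: str) -> bytes:
    """Append a '1' stop bit, then '0' bits until the length is a multiple of 8."""
    if any(bit not in "01" for bit in payload_bits):
        raise ValueError("payload_bits must contain only '0' and '1'")
    bits = payload_bits + "1"
    bits += "0" * (-len(bits) % 8)  # pad to the next byte boundary
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def strip_rbsp_trailing_bits(rbsp: bytes) -> str:
    """Recover the payload bits by dropping the last '1' and the zeros after it."""
    bits = "".join(f"{byte:08b}" for byte in rbsp)
    stop = bits.rfind("1")
    if stop < 0:
        raise ValueError("no stop bit found")
    return bits[:stop]


if __name__ == "__main__":
    rbsp = add_rbsp_trailing_bits("10110")
    print(rbsp.hex())                      # five payload bits become one byte
    print(strip_rbsp_trailing_bits(rbsp))  # original payload bits
```

Because the stop bit is always present, a decoder can find the end of the payload even when the payload itself ends in zeros.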
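H.264 Explained
Why is it called H.264?
H.264 is a high-compression video technology. Its full name is MPEG-4 AVC, that is, Moving Picture Experts Group-4 Advanced Video Coding, and it is also known as MPEG-4 Part 10.
It is an international standard format for moving picture coding, jointly developed by ITU-T, the telecommunication standardization sector, and by ISO/IEC (the International Organization for Standardization and the International Electrotechnical Commission), which oversee MPEG. It belongs to the familiar MPEG family, so why is it called H.264? The reason is that since 1998 the telecommunication standardization sector had had two groups, H.26L and H.26S: the former developed high-compression coding technology on a long-term schedule, while the latter was the group responsible for short-term standards.
The technology standardized by H.26S was named H.263. The name may sound unfamiliar, but in practice it had long been in use, and had been fiercely criticized.
This is because H.263 came first and had been used all along under the broader MPEG-4 name.
The full name given for H.263 is MPEG-4 Visual, or MPEG-4 Part 2, that is, the basic coding method of the MPEG-4 video Simple Profile.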
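After 2001, ITU-T and ISO/IEC, the parent organizations of MPEG, set up the Joint Video Team (JVT) and standardized H.264 on the basis of H.26L.
From December 9 to 13, 2002, the technical specifications were settled at the MPEG meeting held on Awaji Island, Hyogo Prefecture, Japan.
Once the specification had been finalized, the Final Draft International Standard (FDIS) of the H.264 technical format was established on March 17, 2003.
Software and LSI chips, as well as services and equipment, have now all entered practical use.
The specification lays out the bitstream definition, the formats required for decoding, and informative descriptions of encoding.
To avoid confusion, ITU-T recommends H.264 as the official name of the standard.
In fact, MPEG-4 also includes other specifications, such as MPEG-4 Audio and MPEG-4 Systems.
MPEG-4 drew criticism because of the exorbitant licensing fees for MPEG-4 Visual.
Holding a patent does not mean one can demand money at will. The ultimate purpose of patents is to let society's intellectual resources be used more rationally and to prevent duplicated effort, not to reward the first inventor.
According to the materialist view of history, once the technology of a society has developed to a certain stage, new technology will inevitably appear.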