H.265 Video Codec: What It Means for FPGA Designs

By Jong Kim | Mar 7, 2014

In early 2013, the next-generation video codec standard was approved by the Joint Collaborative Team on Video Coding (JCT-VC). The standard is called High Efficiency Video Coding (HEVC), more commonly referred to as H.265.

H.265 boasts many improvements over the previous standard, H.264, chief among them the ability to maintain the same image quality at roughly half the bitrate. A more flexible architecture and improved compression tools allow service providers to stream video at much higher resolution over the same bandwidth.

The new standard also requires far more processing power, which will have major implications for FPGA-based video designs.

Differences between H.264 and H.265:

 

[Table image: H265_Video_Codec_Table.png, a side-by-side comparison of H.264 and H.265]

Architecturally, H.265 provides much greater flexibility for parallel encode/decode processing. The new standard adds a “tile” layer (an independently decodable region) on top of the “slice” layer, a modified CABAC context-switching mechanism, and support for Wavefront Parallel Processing (WPP). CABAC has been a challenging function to parallelize because it operates bit-serially; in H.265, parallelizing CABAC processing is considered at the architectural level. On top of the parallelized CABAC, all other encoding/decoding functions can be parallelized by the wavefront strategy: multiple rows of coding tree units (CTUs, the H.265 counterpart of H.264’s macroblocks) are processed concurrently, with each CTU starting as soon as its neighboring CTUs are ready.
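
As a rough sketch of the wavefront dependency pattern, the toy C program below prints which CTUs can be processed concurrently at each step. The picture size is illustrative, and the two-CTU lag between rows reflects WPP’s rule that a row may begin once the first two CTUs of the row above are finished (each CTU depends on its left neighbor and its above-right neighbor):

    #include <stdio.h>

    #define ROWS 4  /* CTU rows in the picture (illustrative) */
    #define COLS 8  /* CTUs per row (illustrative) */

    /* Wavefront Parallel Processing order: CTU (r, c) may start once its
     * left neighbor (r, c-1) and above-right neighbor (r-1, c+1) are done,
     * so each CTU row trails the row above it by two CTUs. */
    int main(void) {
        for (int step = 0; step < 2 * (ROWS - 1) + COLS; step++) {
            printf("step %2d:", step);
            for (int r = 0; r < ROWS; r++) {
                int c = step - 2 * r;  /* row r lags row r-1 by 2 columns */
                if (c >= 0 && c < COLS)
                    printf("  CTU(%d,%d)", r, c);
            }
            printf("\n");
        }
        return 0;
    }

At steady state all rows are active simultaneously, which is exactly the parallelism a hardware implementation can exploit with one processing pipeline per CTU row.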

This enhanced parallel-processing capability is what allows Ultra High Definition (UHD) 4K and 8K video to be supported: with H.265, UHD encoding/decoding can be processed by multiple cores of HD-class encoding/decoding IP.
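
The arithmetic behind that claim is simple: a 3840x2160 UHD frame holds exactly four 1920x1080 HD frames’ worth of pixels, so four HD-class cores, one per quadrant, together cover the 4K pixel rate. A minimal sketch of such a split (the 2x2 quadrant assignment is an illustrative assumption, not a mandated partition):

    #include <stdio.h>

    /* Sketch: split a 3840x2160 UHD frame into four 1920x1080 tiles,
     * one per HD-class encoder core (2x2 quadrant split assumed). */
    int main(void) {
        const int W = 3840, H = 2160;      /* UHD 4K frame */
        const int tw = W / 2, th = H / 2;  /* each tile is HD-sized */

        for (int ty = 0; ty < 2; ty++)
            for (int tx = 0; tx < 2; tx++)
                printf("core %d: tile origin (%4d,%4d), size %dx%d\n",
                       2 * ty + tx, tx * tw, ty * th, tw, th);
        return 0;
    }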

So what does this mean for FPGA Designs?

To manage the complexity of H.265, FPGA devices can be used as co-processors or accelerators to achieve a real-time encoder/decoder system. Major FPGA vendors such as Altera and Xilinx currently offer very powerful SoC devices (Altera’s Arria V and Cyclone V series, and Xilinx’s Zynq-7000). These SoC platforms could be the best approach for development, prototyping, and production, as they provide both flexibility and performance. As a simple calculation, the 4K encoding process can be split into four parallel processing pipelines, with two SoC chips performing the processing in parallel.
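
Putting rough numbers on that split (the 30 fps frame rate and four-way division are illustrative assumptions): 4Kp30 is about 249 Mpixels/s, so each of the four pipelines sees roughly 62 Mpixels/s, the same order as a single 1080p30 stream, and each of the two SoCs carries two such pipelines.

    #include <stdio.h>

    /* Back-of-the-envelope pixel rates for a four-pipeline 4K split
     * (30 fps and a four-way split are illustrative assumptions). */
    int main(void) {
        const long total    = 3840L * 2160 * 30;  /* 4Kp30 pixel rate  */
        const long per_pipe = total / 4;          /* one of 4 pipelines */
        const long hd_rate  = 1920L * 1080 * 30;  /* 1080p30 reference  */

        printf("4Kp30 total     : %ld Mpixel/s\n", total / 1000000);
        printf("per pipeline /4 : %ld Mpixel/s\n", per_pipe / 1000000);
        printf("1080p30 for ref : %ld Mpixel/s\n", hd_rate / 1000000);
        return 0;
    }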

The newly introduced SoC devices combine dual ARM processors with FPGA fabric, so that motion estimation, motion compensation, and inter prediction blocks can be implemented in the FPGA fabric, the integer DCT can be implemented in the FPGA’s DSP blocks, and syntax assembly and entropy coding can be handled by the ARM cores. As the complexity of video codec algorithms increases, these types of SoC FPGAs will be the best solution.
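
To make the DSP-block mapping concrete, here is a behavioral C model of a 4-point forward integer transform, using the coefficient matrix from the HEVC core transform (an integer approximation of the DCT). Each output is a four-tap multiply-accumulate, exactly the operation FPGA DSP slices are built for; the normalization shifts the standard applies after the transform are omitted for brevity. This is a sketch, not RTL:

    #include <stdio.h>

    /* 4-point forward integer transform, coefficients per the HEVC core
     * transform (integer approximation of the DCT). Each output row is a
     * 4-tap multiply-accumulate, a natural fit for one FPGA DSP slice. */
    static const int T4[4][4] = {
        { 64,  64,  64,  64 },
        { 83,  36, -36, -83 },
        { 64, -64, -64,  64 },
        { 36, -83,  83, -36 },
    };

    static void dct4(const int src[4], int dst[4]) {
        for (int k = 0; k < 4; k++) {
            int acc = 0;
            for (int n = 0; n < 4; n++)
                acc += T4[k][n] * src[n];  /* one MAC per coefficient */
            dst[k] = acc;                  /* scaling/rounding omitted */
        }
    }

    int main(void) {
        const int residual[4] = { 10, 12, 9, 7 };  /* example residual row */
        int coeff[4];
        dct4(residual, coeff);
        for (int k = 0; k < 4; k++)
            printf("coeff[%d] = %d\n", k, coeff[k]);
        return 0;
    }

In RTL, the four multiply-accumulates would typically be unrolled across four DSP slices so that one transform completes per clock cycle.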