December 18, 2020

Power efficient real time VVC decoder

Jeeva Raj A (Principal Engineer, Advanced Video)
Sagar Kotecha (Lead Engineer, Advanced Video)

Video is a powerful medium, and its popularity as a conduit for information and entertainment is on the rise. According to market research, video accounts for a significant share of internet traffic. The current pandemic has further accelerated the growth in online video traffic: applications such as virtual meetings at work, online education for children, virtual conferences and, importantly, OTT services have seen a massive increase in consumer demand.

Research and development in video compression and related technologies leads to new compression standards that address these demands. The recent VVC/H.266 video standard from JVET/MPEG provides significant compression gains and is versatile enough for use cases such as adaptive bitrate streaming, 3D video, screen content coding and HDR. Ittiam has been actively participating in and contributing to VVC standardization.

This blog serves as a primer on the key tools of VVC and their applications. It also provides insights into Ittiam’s efficient implementation of a VVC decoder on ARM-based mobile client platforms.

Versatile Video Coding (Rec. ITU-T H.266 | ISO/IEC 23090-3) [4] is the latest video coding standard, finalized in July 2020 by JVET/MPEG. VVC achieves a BD-rate gain of ~40% over its predecessor HEVC (Rec. ITU-T H.265 | ISO/IEC 23008-2) at the cost of ~60% additional decoding complexity.

A Glimpse of Key VVC Tools

What are the new tools that have been introduced or enhanced in VVC? How do these help in getting better compression efficiency?

The tools are grouped by category below; each entry gives the tool followed by its high-level details.

Coding Structure

  • Max CTU 128x128: Maximum Coding Tree Unit size is 128, to get better compression efficiency in homogeneous regions, especially at HD and UHD resolutions.
  • Multi-type Tree: Binary and ternary splits are allowed, to get further compression gains by coding optimal block sizes.
  • Dual Tree: Separate coding structure for chroma. Efficient coding of chroma blocks independent of the luma coding structure.
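
As a rough illustration of the quad, binary and ternary splits mentioned above, the sketch below computes the child block sizes produced by one split of a block. The function name and mode strings are hypothetical, and the 1:2:1 ratio for ternary splits is how the multi-type tree is commonly described; the many size and depth restrictions of a real decoder are not modeled.

```python
def split_block(width, height, mode):
    """Return the (width, height) of each child block for one split mode."""
    if mode == "quad":
        return [(width // 2, height // 2)] * 4
    if mode == "binary_vertical":
        return [(width // 2, height)] * 2
    if mode == "binary_horizontal":
        return [(width, height // 2)] * 2
    if mode == "ternary_vertical":
        # Ternary split uses a 1:2:1 ratio along the split direction.
        return [(width // 4, height), (width // 2, height), (width // 4, height)]
    if mode == "ternary_horizontal":
        return [(width, height // 4), (width, height // 2), (width, height // 4)]
    raise ValueError(f"unknown split mode: {mode}")


# A 128x128 CTU split into four 64x64 blocks, one of which is split again.
print(split_block(128, 128, "quad"))
print(split_block(64, 64, "ternary_vertical"))
```

Because binary and ternary splits can be applied in either direction, the resulting coding blocks can be rectangular, which is what makes tools such as wide angle intra prediction relevant.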

Intra Prediction

  • Directional Intra Modes: Higher number of prediction modes (double as compared to HEVC).
  • Wide Angle Prediction: Suitable for intra prediction of rectangular blocks.
  • CCLM: Cross Component Linear Model. Prediction of chroma samples from reconstructed luma samples.
  • MRL: Multiple Reference Lines used for intra prediction.
  • ISP: Intra Sub-Partitioning, for better prediction and transform to preserve fine details and edges.
  • MIP: Matrix Weighted Intra Prediction. Tables are derived through offline modeling/learning, depending on block sizes.
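
To make the cross-component linear model concrete, here is a minimal floating-point sketch, assuming the luma samples have already been downsampled to chroma resolution. It fits a line through the neighboring sample pairs with the smallest and largest luma values and applies it to the block. The function names are hypothetical, and a real decoder uses integer arithmetic and a more careful choice of neighboring samples.

```python
def cclm_parameters(neigh_luma, neigh_chroma):
    """Fit chroma = alpha * luma + beta through the min-luma and max-luma neighbors."""
    i_min = min(range(len(neigh_luma)), key=neigh_luma.__getitem__)
    i_max = max(range(len(neigh_luma)), key=neigh_luma.__getitem__)
    luma_span = neigh_luma[i_max] - neigh_luma[i_min]
    if luma_span == 0:
        # Flat neighborhood: fall back to a constant prediction.
        return 0.0, float(neigh_chroma[i_min])
    alpha = (neigh_chroma[i_max] - neigh_chroma[i_min]) / luma_span
    beta = neigh_chroma[i_min] - alpha * neigh_luma[i_min]
    return alpha, beta


def cclm_predict(luma_block, alpha, beta, bit_depth=8):
    """Predict a chroma block from the co-located (downsampled) luma block."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(int(round(alpha * sample + beta)), 0), max_val) for sample in row]
        for row in luma_block
    ]


alpha, beta = cclm_parameters([60, 100, 140, 180], [110, 120, 130, 140])
print(alpha, beta)
print(cclm_predict([[80, 120], [160, 200]], alpha, beta))
```

No model parameters are transmitted in this scheme: the decoder derives them from already reconstructed neighbors, so the only signaling cost is the mode itself.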

Inter Prediction

  • SMVD: Symmetric MVD. In bidirectionally predicted blocks, the MVD for one direction is signaled and the MVD for the other direction is derived as its inverse.
  • Extended Merge Prediction: Merge list with additional entries from MV history and pairwise MVs.
  • MMVD: Merge MVD. An additional MVD is applied to the merge MV, with signaling of an index to discrete distance and direction for bi
  • HMVP: History-Based MVP. Additional MVs are added to the Merge/MVP list based on the unique MVs from recently coded units.
  • Affine: Affine inter prediction for motion models such as rotation, shear and zoom.
  • SbTMVP: Sub-block Temporal MVP, to derive a local motion field at 8x8 level within the CU.
  • AMVR: Adaptive Motion Vector Resolution. Allows the MVD precision to be signaled, for a better trade-off between MV bits and prediction accuracy.
  • BCW: Bi-Prediction with weights. Allows a weighted average of the list 0 and list 1 prediction signals instead of only the default 0.5 weighting in HEVC.
  • BDOF: Bi-directional Optical Flow, to get fine motion at 4x4 level based on optical flow for merge CUs.
  • DMVR: Decoder Side Motion Vector Refinement for merge CUs. An additional MVD is obtained through decoder-side refinement without any signaling overhead.
  • GEO: Non-rectangular partition for better prediction.
  • CIIP: Combined Intra and Inter Prediction.
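
Two of the simpler ideas above, SMVD and BCW, can be sketched in a few lines. The helper names are hypothetical. The weight set is the one commonly cited for BCW (the weight applies to list 1, in units of 1/8), and the blend works directly on sample values, whereas a real decoder blends at a higher intermediate precision and clips the result.

```python
# Candidate weights for the list 1 prediction, in units of 1/8 (assumed set).
BCW_WEIGHTS = (-2, 3, 4, 5, 10)


def smvd_derive_mvd_l1(mvd_l0):
    """SMVD: only the list 0 MVD is signaled; list 1 uses its mirror image."""
    return (-mvd_l0[0], -mvd_l0[1])


def bcw_blend(pred_l0, pred_l1, w):
    """Weighted bi-prediction: ((8 - w) * P0 + w * P1 + 4) >> 3 per sample."""
    return [((8 - w) * p0 + w * p1 + 4) >> 3 for p0, p1 in zip(pred_l0, pred_l1)]


print(smvd_derive_mvd_l1((3, -5)))
for w in BCW_WEIGHTS:
    print(w, bcw_blend([100, 200], [120, 100], w))
```

With `w = 4` the blend reduces to the equal-weight average that HEVC always uses; the other weights let the encoder favor whichever reference predicts the block better.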


Transform & Quantization

  • DST7, DCT8: For better energy compaction.
  • MTS: Different transform type selection for the vertical and horizontal directions.
  • SBT: Sub-Block Transform for inter CUs, to save on partitioning where residuals are coded for only a sub-part of the CU.
  • LFNST: Low-Frequency Non-Separable Transform for intra blocks, to maximize decorrelation of the residual signal.
  • Dependent Quantization: Alternates between two scalar quantizers based on a state transition rule, to select an optimum sequence of reconstruction values.
  • JCCR: Joint coding of Cb and Cr residuals.
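
The state-driven switching between two quantizers can be sketched from the decoder side as follows. This is a simplified illustration with a hypothetical function name: it assumes a four-state machine in which the parity of each level selects the next state, with states 0 and 1 using the first quantizer (even multiples of the step) and states 2 and 3 using the second (odd multiples, plus zero). Levels are assumed to arrive in coding order, and scaling lists and bit-depth shifts are ignored.

```python
# Next state indexed by [current state][parity of the absolute level].
STATE_TRANSITION = ((0, 2), (2, 0), (1, 3), (3, 1))


def dequantize_dependent(levels, step):
    """Reconstruct coefficients from levels using two alternating quantizers."""
    state = 0
    reconstructed = []
    for level in levels:
        sign = (level > 0) - (level < 0)
        uses_second_quantizer = state > 1
        multiplier = 2 * abs(level)
        if uses_second_quantizer and level != 0:
            multiplier -= 1  # second quantizer sits on odd multiples of the step
        reconstructed.append(sign * multiplier * step)
        state = STATE_TRANSITION[state][abs(level) & 1]
    return reconstructed


print(dequantize_dependent([1, 2, 0, -1], step=10))
```

Since the quantizer used for each coefficient depends on the levels before it, the encoder can search over level sequences (typically with a trellis) to find reconstruction values closer to the original than a single scalar quantizer would allow.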

Entropy Coding

  • Core CABAC Engine: Multi-hypothesis probability update model to improve probability estimation.
  • Residual Coding: Separate residual coding structures for transform coefficients and transform skip coefficients. The coefficient group size depends on the transform block size.
  • Context Modeling: Selection of probability models for the syntax elements related to absolute values of transform coefficient levels depends on the values of the absolute levels or partially reconstructed absolute levels in a local neighborhood.
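
The idea behind a multi-hypothesis probability update can be illustrated with two estimators that adapt at different speeds and are averaged for coding. The sketch below is a floating-point toy, not the integer CABAC engine: the class name and the shift values 4 and 7 are illustrative assumptions.

```python
class TwoRateProbability:
    """Track P(bin == 1) with a fast and a slow estimator and average them."""

    def __init__(self, fast_shift=4, slow_shift=7, initial=0.5):
        self.fast_shift = fast_shift
        self.slow_shift = slow_shift
        self.p_fast = initial
        self.p_slow = initial

    def probability_of_one(self):
        return 0.5 * (self.p_fast + self.p_slow)

    def update(self, bin_value):
        # Each estimator moves toward the observed bin by a fraction 2**-shift.
        self.p_fast += (bin_value - self.p_fast) / (1 << self.fast_shift)
        self.p_slow += (bin_value - self.p_slow) / (1 << self.slow_shift)


estimator = TwoRateProbability()
for bin_value in [1] * 50:
    estimator.update(bin_value)
print(estimator.p_fast, estimator.p_slow, estimator.probability_of_one())
```

The fast estimator reacts quickly when the statistics change, while the slow one gives a more stable estimate for stationary sources; averaging the two aims to get some of both behaviors.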

Filtering

  • LMCS: In-loop mapping of luma samples based on a piecewise linear model, with luma-dependent chroma residual scaling. Scaling of the signal across the dynamic range.
  • ALF: Better compression by removing blur artifacts due to block processing.
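
The piecewise linear luma mapping can be pictured as a lookup table built from a per-piece codeword budget. The sketch below is a simplified forward mapping under stated assumptions: the input range is divided into 16 equal pieces, the function name and codeword values are made up for the example, and the inverse mapping and chroma residual scaling are left out.

```python
def build_forward_lut(codewords_per_piece, bit_depth=8):
    """Build a forward luma mapping table from the codewords assigned to each piece."""
    num_pieces = len(codewords_per_piece)
    input_piece_size = (1 << bit_depth) // num_pieces
    max_val = (1 << bit_depth) - 1

    # Output value at which each piece starts.
    pivots = [0]
    for codewords in codewords_per_piece:
        pivots.append(pivots[-1] + codewords)

    lut = []
    for sample in range(1 << bit_depth):
        piece = sample // input_piece_size
        offset = sample - piece * input_piece_size
        mapped = pivots[piece] + (codewords_per_piece[piece] * offset) // input_piece_size
        lut.append(min(mapped, max_val))
    return lut


# Fewer codewords for dark samples, more for bright samples (illustrative values).
lut = build_forward_lut([8] * 8 + [24] * 8)
print(lut[0], lut[64], lut[128], lut[255])
```

Assigning exactly 16 codewords to each of the 16 pieces gives the identity mapping for 8-bit video; giving a piece more codewords stretches that part of the luma range so it is represented more finely.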

Application Specific VVC tools

VVC also includes tools that improve compression efficiency and flexibility for real-world applications such as ABR streaming, 360-degree video, screen content coding and video conferencing.

Special tools and their use cases:

  • RPR: Reference Picture Resampling. Improves compression efficiency for adaptive bitrate streaming, where open GOP coding with a resolution change is handled by upscaling and downscaling reference pictures.
  • Sub-Pictures: Independently decodable sub-pictures help improve parallelism and suit use cases such as 360-degree video and video conferencing.
  • IBC, Transform Skip, Palette, BDPCM: Intra Block Copy, Transform Skip, Palette mode and BDPCM for screen content coding, to improve compression gains.

Demonstration of real-time VVC Playback

A real-time decoder is a key enabler for widespread adoption of any video coding technology or standard. One of Ittiam’s major focus areas is embedded software implementation, optimized for the widely prevalent ARM-based mobile platforms.

In 2014, Google and Ittiam collaborated to bring HEVC decoding capability to Android, making it available natively, and at scale, to a vast consumer ecosystem [1]. As a step forward for VVC, Ittiam demonstrated real-time VVC decoding on ARM-based mobile devices at JVET and MPEG plenaries during VVC development and standardization [2], [3]. The main aim of this demonstration was to show that the new compression tools in VVC are implementation friendly for real-time decoding on ARM-based platforms.

VVC decoder demo configuration

  • 1920×1080@24fps on 4 cores of Cortex-A75 clocked at 2.5GHz
  • All tools essential for the random access configuration of VVC are included in this implementation
  • 8-bit SIMD optimized
  • Tiles are used for the multi-threaded implementation

Decoding Complexity Comparison of HEVC vs VVC

  • Overall, VVC decoding is 1.6x as complex as HEVC decoding.
  • Post-reconstruction filtering contributes ~43% of the decoding complexity in VVC against ~13% in HEVC.
  • Inter prediction contributes ~26% of the decoding complexity in VVC against ~53% in HEVC.
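
The percentages above are shares of two different totals, so it can help to convert them to a common scale. The short calculation below normalizes the total HEVC decoding cost to 1.0 and uses only the figures quoted in the list; the function name is hypothetical, and the result is a back-of-the-envelope estimate rather than a measurement.

```python
# Figures quoted in the list above; the HEVC total is normalized to 1.0.
VVC_TOTAL = 1.6
HEVC_TOTAL = 1.0


def stage_ratio(vvc_share, hevc_share):
    """Absolute cost of a stage in VVC divided by its absolute cost in HEVC."""
    return (VVC_TOTAL * vvc_share) / (HEVC_TOTAL * hevc_share)


filtering_ratio = stage_ratio(0.43, 0.13)
inter_ratio = stage_ratio(0.26, 0.53)
print(f"filtering: {filtering_ratio:.2f}x, inter prediction: {inter_ratio:.2f}x")
```

On these numbers, post-reconstruction filtering costs roughly 5.3 times as much in VVC as in HEVC, while inter prediction costs roughly 0.8 times as much in absolute terms, which suggests that the in-loop filters account for most of the added decoding complexity.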

References