Content Adaptive Encoding: Key Decisions for an Effective Solution
by Jay Shingala (Senior Principal Engineer, Media Server Technologies) & Prashanth Dixit (Principal Engineer, Media Server Technologies)
Content Adaptive Encoding drives significant bitrate savings. Here are six questions that will help you gain a better understanding of an effective CAE solution.
The Holy Grail of video encoding continues to be an optimal balance between perceptual quality and compression efficiency. The quest for the best trade-off between the two drives the search for the most effective way to allocate bits for a given video. While traditional multi-pass encoding with a fixed bitrate ladder overspends bits on simpler content, Content Adaptive Encoding (CAE) solutions are emerging as a promising alternative. By allocating only the bits a given video requires, based on its complexity, CAE drives significant bitrate savings.
A key advantage of CAE solutions is that they work in conjunction with standards-compliant encoders rather than replacing them. Therefore, they provide bandwidth savings without the need to change the supported video formats. However, unlike codecs, CAE is not a defined standard. It can therefore be driven by varying methods based on specific deployment scenarios, presenting a multitude of conflicting choices for users and developers. In this blog, we explore the advantages and challenges of each option, along with the rationale behind some of our choices with THINKode, Ittiam’s Content Adaptive Encoding solution.
Inside the codec? Or outside?
Should CAE algorithms reside within a codec implementation or operate independently outside?
Table 1 lists the pros and cons of both options, to help us make the best decision.
Outside the codec
Pros: Works well with currently deployed (proven) codecs, avoiding the need to replace encoders; provides the flexibility to deploy content analysis on an independent platform or premises.
Cons: Offers only coarse-level access to the encoder, forgoing the gains that lower-level access could deliver; requires integrating additional elements in the workflow.
Inside the codec
Pros: Enables better control over codec-specific parameters and mode decisions to achieve higher bitrate savings.
Cons: Inflexible, as it is tied to a specific codec implementation; requires the user to switch out the deployed codecs in the workflow and integrate a new codec; re-validation and verification of an encoder can be expensive.
Table 1: Inside the codec or outside? Pros & cons
Although the algorithms for THINKode are designed to work in both modes, the first version of our CAE solution is implemented to be independent of the codec. This is because our customers prefer reusing their existing encoder implementation, and consider a codec-independent solution the most viable deployment option.
Granularity of video content for analysis and processing
The next key question is the level of content granularity at which we apply Content Adaptive Encoding. Here the choice isn’t binary; it lies on a scale between two extremes, per frame/block and per title, with several possible intermediate units such as per scene and per segment in between (as shown in figure 2).
Let us compare per title against per scene, as representatives of coarse and fine granularity, with CAE operating outside the codec (table 2).
Per title
Advantages: Works better than fixed ABR ladders; simpler to implement as a workflow.
Challenges: Poorer compression efficiency due to overspending of bits for simpler scenes within a title; inconsistent quality across scenes in the title, with inconsistent results in bitrate savings.
Per scene
Advantages: Superior bitrate savings due to finer analysis; consistent quality level at finer granularity.
Challenges: Maintaining similar quality levels at scene boundaries.
Table 2: Per title VS per scene
THINKode uses a per scene unit for better granularity in assessment. The algorithms have additional intelligence to monitor distribution of bits across scenes in an efficient manner, thus ensuring consistent quality across scenes while achieving higher bit-rate savings.
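To make the per title versus per scene trade-off concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not THINKode's algorithm: the `Scene` type, the helper functions, and every number are hypothetical, and it assumes we already know the bitrate at which each scene reaches the target quality. The per-title case is simplified to a single bitrate set by the most demanding scene.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scene:
    duration_s: float   # scene length in seconds
    needed_kbps: float  # hypothetical bitrate at which this scene hits the target quality


def per_title_kbits(scenes: List[Scene]) -> float:
    """Total bits when one bitrate, set by the hardest scene, is used for the whole title."""
    rate = max(s.needed_kbps for s in scenes)
    return sum(s.duration_s * rate for s in scenes)


def per_scene_kbits(scenes: List[Scene]) -> float:
    """Total bits when every scene is given only the bitrate it needs."""
    return sum(s.duration_s * s.needed_kbps for s in scenes)


def savings_percent(scenes: List[Scene]) -> float:
    """Bits saved by per-scene allocation, relative to the per-title total."""
    title_total = per_title_kbits(scenes)
    return 100.0 * (title_total - per_scene_kbits(scenes)) / title_total


# Illustrative values only: a simple scene, a complex scene, and a medium one.
scenes = [Scene(10.0, 1000.0), Scene(20.0, 3000.0), Scene(10.0, 2000.0)]
saving = savings_percent(scenes)
```

With these made-up numbers, the per-title total is 120,000 kbit against 90,000 kbit per scene, a 25% saving. All of it comes from the two simpler scenes, which is the overspending that Table 2 attributes to per-title encoding.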
Iterative or non-iterative?
Most of the existing CAE solutions in the industry employ iterative methods, which involve encoding a given unit of video (title, frame or segment) using multiple settings. This is followed by closed loop quality evaluation to determine the minimum bit allocation that meets the target quality level. Although they deliver optimal results, iterative methods significantly increase the processing overhead, thus slowing down the encoding process and increasing the processing cost beyond viable limits. This can be a deal-breaker for live encoding and cost sensitive VoD OTT applications.
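The closed-loop idea described above can be sketched as a bisection search over bitrate. This is a generic illustration, not any vendor's implementation: `encode_and_score` is a hypothetical callable standing in for "encode at this bitrate, then measure quality", `toy_score` is a made-up quality curve, and the sketch assumes quality never decreases as bitrate increases.

```python
import math
from typing import Callable, Tuple


def min_bitrate_for_quality(
    encode_and_score: Callable[[int], float],
    target: float,
    lo_kbps: int,
    hi_kbps: int,
    step_kbps: int = 50,
) -> Tuple[int, int]:
    """Return (a bitrate that meets the target, number of trial encodes).

    The returned bitrate lies within step_kbps of the lowest one that
    meets the target, provided lo_kbps itself does not meet it.
    """
    trials = 1
    if encode_and_score(hi_kbps) < target:
        raise ValueError("target quality not reachable within the bitrate range")
    # Invariant: hi_kbps always meets the target.
    while hi_kbps - lo_kbps > step_kbps:
        mid = (lo_kbps + hi_kbps) // 2
        trials += 1  # every probe costs one full encode plus one quality measurement
        if encode_and_score(mid) >= target:
            hi_kbps = mid
        else:
            lo_kbps = mid
    return hi_kbps, trials


def toy_score(kbps: int) -> float:
    """Made-up saturating quality curve; a real system would encode and run a metric."""
    return 100.0 * (1.0 - math.exp(-kbps / 1500.0))


rate, trials = min_bitrate_for_quality(toy_score, target=90.0, lo_kbps=500, hi_kbps=8000)
```

Even in this toy setting the search needs several trial encodes for a single unit of video, which is where the processing overhead of iterative methods comes from.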
To counter this challenge, THINKode uses an advanced machine learning (ML) based non-iterative method. Fundamentally, a non-iterative method reduces the processing overhead by arriving at a ‘one-shot’ encoder setting through ML. At SMPTE 2017, we presented a detailed paper on our non-iterative method, and the results show that ML-based THINKode delivers bandwidth gains similar to those of iterative methods, without the processing overhead.
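As a rough illustration of the one-shot idea, the sketch below fits a model offline and then predicts an encoder setting with no trial encodes. It is deliberately simplistic and is not the method from the SMPTE paper, which this post does not detail: the single "complexity" feature, the linear model, and the training values are all assumptions made for the example.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_kbps(model: Tuple[float, float], complexity: float) -> float:
    """One-shot prediction: a single model evaluation, no encode in the loop."""
    slope, intercept = model
    return slope * complexity + intercept


# Hypothetical offline training set: a complexity score per clip, paired with
# the bitrate that met the target quality (found, for example, by iterative search).
train_complexity = [1.0, 2.0, 3.0, 4.0]
train_kbps = [1000.0, 1500.0, 2000.0, 2500.0]

model = fit_line(train_complexity, train_kbps)
one_shot_kbps = predict_kbps(model, 2.5)  # setting for a new clip, with zero trial encodes
```

The expensive search happens once, offline, while building the training set. At encode time each unit of video costs one feature extraction and one prediction, so the saving in processing grows with the number of trial encodes an iterative method would otherwise run.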
Which quality metric?
The effectiveness of any Content Adaptive Encoding solution depends heavily on the quality metric it uses. This implies that innovations in CAE go hand in hand with innovations in perceptual quality measurement.
The best known traditional SNR-based metrics (like PSNR) are unsuitable for CAE because the distortion they measure correlates poorly with human perception. Hence CAE has to rely on more modern metrics that correlate more strongly with human visual perception, such as VMAF from Netflix, PSNR-HVS-M, or STRRED. A key challenge with these metrics is their inadequate level of maturity and the resulting inconsistencies.
Although proprietary metrics may yield better bit-rate gains, the lack of industry consensus on the validity of those metrics is likely to lead to poorer acceptance of CAE solutions. Hence, for THINKode, we rely on VMAF, the fastest evolving perceptual quality metric with the highest level of industry acceptance. Further, additional intelligence in the ML algorithms helps adapt to and moderate the inconsistencies exhibited by VMAF in areas such as spatial pooling and high grain noise (more details on our blog).
Specific to a codec format? Or codec format agnostic?
Another important question before us is whether we tie the solution to a specific encoding format such as H.264 or let it be independent of the compression standard. This becomes pertinent given that multiple formats are expected to be in concurrent operation. Table 3 lists out the advantages and challenges of these options.
Codec format agnostic
Advantages: Higher flexibility to adapt CAE for multiple codecs in rapid time; effective reuse of CAE technology for consistent results across formats.
Challenges: Addressing various RC modes such as VBR, CBR and CRF that are often unique to each encoder; effective retraining of ML algorithms for each codec standard.
Codec format specific
Advantages: Potentially higher gains with deeper leverage of codec specific characteristics to achieve CAE.
Challenges: Longer lead time to adapt to newer formats, and multiple integration cycles; inconsistent results across codec formats due to changing algorithms.
Table 3: Pros & Cons of codec agnostic and codec specific CAE solutions
The first version of THINKode has been validated with the H.264 format, and the decision was driven by the popularity of codec implementations such as x264 in OTT VoD/Live solutions. However, as a codec-independent solution based on machine learning techniques, THINKode is designed to be easily retrained for any new format including HEVC, VP9 and the upcoming AV1 (stay tuned for updates with support for these formats).
ABR Resolutions – fixed or dynamic?
Most OTT service providers operate with a fixed ABR profile set optimized for various target devices and networks. Since an ABR encoded stream is viewed on a screen with a fixed resolution, we observe that as bit-rates drop for a given resolution, a lower resolution representation can start looking better than the higher resolution representation at the same bit-rate. In effect, the lower resolution trades some sharpness for fewer artifacts such as blockiness. The graph below (figure 3) demonstrates the fluctuations in quality at various bit-rate points across resolutions.
As seen in the graph, dynamic resolution selection normally helps when operating at lower bitrate ranges. However, for certain types of content, such as high grain noise and fast motion, it may help save more bits even when operating at higher bitrate ranges. Table 4 shows the bit-rate savings we obtained for representative clips across four different content types using fixed and dynamic resolution at two different operating bitrates.
[Table 4 values not recovered; the only surviving row label is "High grain noise".]
Table 4: Bit-rate savings for various content types
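Dynamic resolution selection can be sketched as picking, at each operating bitrate, the resolution whose quality curve is highest. The sketch below is a generic illustration rather than THINKode's method: the `QUALITY` table holds hypothetical scores (higher is better), assumed to be measured after scaling every representation to the same display resolution, and it is not the data from Table 4 or figure 3.

```python
from typing import Dict

# Hypothetical quality scores per resolution, keyed by bitrate in kbps.
QUALITY: Dict[str, Dict[int, float]] = {
    "1080p": {1000: 55.0, 2000: 72.0, 4000: 88.0},
    "720p": {1000: 63.0, 2000: 75.0, 4000: 84.0},
    "540p": {1000: 66.0, 2000: 71.0, 4000: 76.0},
}


def best_resolution_per_bitrate(quality: Dict[str, Dict[int, float]]) -> Dict[int, str]:
    """For every bitrate, pick the resolution with the highest quality score."""
    bitrates = sorted({b for curve in quality.values() for b in curve})
    ladder: Dict[int, str] = {}
    for bitrate in bitrates:
        # Only consider resolutions that were actually measured at this bitrate.
        candidates = [(curve[bitrate], res) for res, curve in quality.items() if bitrate in curve]
        _, best_res = max(candidates, key=lambda c: c[0])
        ladder[bitrate] = best_res
    return ladder


ladder = best_resolution_per_bitrate(QUALITY)
```

With these made-up curves the selection moves from 540p at 1000 kbps to 720p at 2000 kbps and 1080p at 4000 kbps, mirroring the crossover behaviour described above. A fixed ladder would keep each bitrate pinned to one resolution whatever the content.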
The key challenge with including such a dynamic ABR encoding scheme within a Content Adaptive Encoding solution is the increase in solution complexity and processing cost. However, advanced ML algorithms enable effective gains within THINKode with minimal processing overhead.
Content Adaptive Encoding: The Road Ahead
CAE addresses the ever growing need to compress video beyond the limits of classical encoding methods. An effective solution needs to carefully choose from some of the options discussed above to maximize bitrate savings – without affecting perceived quality. At the same time, the solution needs to be practical in terms of complexity and ease of deployment for different use cases.
With the rapid pace of advancements in AI and machine learning, evolving video quality metrics, and a newer generation of video codecs such as VP9, HEVC and AV1, CAE solutions will continue to evolve in the coming years. Watch our blog for frequent updates on the evolving trends in Content Adaptive Encoding and continuous improvements in Ittiam’s THINKode solution.