Content Adaptive Encoding: Key Decisions for an Effective Solution
by Jay Shingala (Senior Principal Engineer, Media Server Technologies) & Prashanth Dixit (Principal Engineer, Media Server Technologies)
6 questions that significantly improve our understanding of a CAE solution
The Holy Grail of video encoding continues to be an optimal balance between perceptual quality and compression efficiency. The quest for the right trade-off between the two inspires us to find the most effective way to allocate bits for a given video. While traditional multi-pass encoding with a fixed bitrate ladder overspends bits on simpler content, Content Adaptive Encoding (CAE) solutions are emerging as a promising alternative. By allocating only the bits a given video requires, based on its complexity, CAE drives significant bitrate savings.
A key advantage of CAE solutions is that they work in conjunction with standards-compliant encoders and are not replacements for them. Therefore, they provide bandwidth savings without the need to change the supported video formats. However, unlike codecs, CAE is not a defined standard. It can therefore be driven by varying methods based on specific deployment scenarios, presenting a multitude of conflicting choices for users and developers. In this blog, we explore the advantages and challenges of each option, along with the rationale behind some of our choices with THINKode, Ittiam’s CAE solution.
Should CAE algorithms reside within a codec implementation or operate independently outside?
Table 1 lists the pros and cons of both options to help us make the best decision.
| Option | Pros | Cons |
|---|---|---|
| Outside the codec | Works well with currently deployed (proven) codecs, avoiding the need to replace encoders; provides the flexibility to deploy content analysis on an independent platform/premise | Offers only coarse-level access to the encoder, limiting the gains that lower-level access would allow; requires integrating additional elements in the workflow |
| Inside the codec | Enables better control of codec-specific parameters and mode decisions to achieve higher bitrate savings | Inflexible: tied to a specific codec implementation; requires the user to switch out the deployed codecs in the workflow and integrate a new codec; re-validation and verification of an encoder can be expensive |
Although the algorithms for THINKode are designed to work in both modes, the first version of our CAE solution is implemented to be independent of the codec. This is because our customers prefer reusing their existing encoder implementation, and consider a codec-independent solution the most viable deployment option.
The next key question is the level of content granularity at which we apply CAE. Here our choice isn’t binary; it lies on a scale between two extremes, per frame/block and per title, with several possible intermediate units such as per scene and per segment (as shown in figure 2).
Let us compare per title against per scene as representations of the two extremes, outside of the codec (table 2).
| Granularity | Pros | Cons |
|---|---|---|
| Per Title | Works better than fixed ABR ladders; simpler to implement as a workflow | Poorer compression efficiency due to overspending of bits on simpler scenes within a title; inconsistent quality across scenes in the title, with inconsistent bitrate savings |
| Per Scene | Superior bitrate savings due to finer analysis; consistent quality level at finer granularity | Maintaining similar quality levels at scene boundaries |
THINKode uses a per scene unit for better granularity in assessment. The algorithms have additional intelligence to monitor distribution of bits across scenes in an efficient manner, thus ensuring consistent quality across scenes while achieving higher bit-rate savings.
Most of the existing CAE solutions in the industry employ iterative methods, which involve encoding a given unit of video (title, frame or segment) using multiple settings. This is followed by closed loop quality evaluation to determine the minimum bit allocation that meets the target quality level. Although they deliver optimal results, iterative methods significantly increase the processing overhead, thus slowing down the encoding process and increasing the processing cost beyond viable limits. This can be a deal-breaker for live encoding and cost sensitive VoD OTT applications.
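The closed-loop idea described above can be sketched in a few lines. This is only an illustration of the general iterative approach, not any vendor's implementation: `encode_and_score` is a hypothetical callback standing in for "encode the unit at this bitrate, then measure its quality", and `toy_score` is a made-up model used so the sketch runs without an encoder. The binary search also assumes that quality never decreases as bitrate increases.

```python
import math
from typing import Callable, Sequence


def lowest_bitrate_meeting_target(
    candidate_kbps: Sequence[int],
    encode_and_score: Callable[[int], float],
    target_score: float,
) -> tuple[int, int]:
    """Return (chosen bitrate in kbps, number of trial encodes).

    Binary-searches the sorted candidates for the lowest bitrate whose
    measured quality reaches the target. Each probe costs one full
    encode plus one quality measurement.
    """
    candidates = sorted(candidate_kbps)
    lo, hi = 0, len(candidates) - 1
    best = candidates[-1]  # fallback if no candidate reaches the target
    trials = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        trials += 1
        if encode_and_score(candidates[mid]) >= target_score:
            best = candidates[mid]  # good enough, try something cheaper
            hi = mid - 1
        else:
            lo = mid + 1  # not good enough, spend more bits
    return best, trials


def toy_score(kbps: int, complexity: float) -> float:
    """Made-up quality model (0-100): harder content needs more bits."""
    return 100.0 * (1.0 - math.exp(-kbps / (1000.0 * complexity)))


ladder = [500, 1000, 2000, 3000, 4500, 6000, 9000]
simple = lowest_bitrate_meeting_target(ladder, lambda k: toy_score(k, 1.0), 90.0)
hard = lowest_bitrate_meeting_target(ladder, lambda k: toy_score(k, 3.0), 90.0)
print(simple, hard)
```

Even with a binary search, every unit of video costs several trial encodes and quality measurements before the final setting is known, which is where the processing overhead of iterative methods comes from.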
To counter this challenge, THINKode uses an advanced machine learning (ML) based non-iterative method. Fundamentally, a non-iterative method reduces the processing overhead by arriving at a ‘one-shot’ encoder setting through ML. At SMPTE 2017, we presented a detailed paper on our non-iterative method, and the results show that ML-based THINKode delivers bandwidth gains similar to those of iterative methods without the processing overhead.
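To make the contrast with iterative methods concrete, here is a deliberately tiny sketch of the one-shot idea: a model is fitted offline, and at encode time a single prediction replaces the trial encodes. It is not THINKode's algorithm. The single "complexity" feature, the training pairs, and the linear model are all assumptions made up for illustration; a real system would use richer content features and a more capable model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical offline training set: a per-scene complexity feature paired
# with the bitrate (kbps) that met the quality target for that scene.
TRAIN_COMPLEXITY = [1.0, 2.0, 3.0, 4.0]
TRAIN_KBPS = [2100.0, 3900.0, 6100.0, 7900.0]

# Offline step: done once, outside the encoding workflow.
SLOPE, INTERCEPT = fit_line(TRAIN_COMPLEXITY, TRAIN_KBPS)


def predict_kbps(complexity):
    """Encode-time step: one prediction, zero trial encodes."""
    return SLOPE * complexity + INTERCEPT


print(round(predict_kbps(2.5)))
```

The cost of finding a good setting moves from encode time to training time, which is why a non-iterative approach suits live and cost-sensitive workflows.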
The effectiveness of any CAE solution depends heavily on the quality metric it uses. This implies that innovations in CAE go hand in hand with innovations in perceptual quality measurement.
The best-known traditional SNR-based metrics (like PSNR) are unsuitable for CAE because the distortion they measure correlates poorly with human perception. Hence CAE has to rely on more modern metrics that correlate more strongly with human visual perception, like VMAF from Netflix, PSNR-HVS-M, or STRRED. A key challenge with these metrics is their inadequate level of maturity and the resulting inconsistencies.
Although proprietary metrics may yield better bitrate gains, the lack of industry consensus on the validity of those metrics is likely to lead to poorer acceptance of CAE solutions. Hence, for THINKode, we rely on VMAF, the fastest-evolving perceptual quality metric with the highest level of industry acceptance. Further, additional intelligence in the ML algorithms helps adapt to and moderate the inconsistencies exhibited by VMAF in areas such as spatial pooling and high grain noise (more details on our blog).
Another important question before us is whether we tie the solution to a specific encoding format such as H.264 or let it be independent of the compression standard. This becomes pertinent given that multiple formats are expected to be in concurrent operation. Table 3 lists out the advantages and challenges of these options.
| Option | Advantages | Challenges |
|---|---|---|
| Codec Agnostic | Higher flexibility to adapt CAE quickly for multiple codecs; effective reuse of CAE technology for consistent results across formats | Addressing various RC modes such as VBR, CBR and CRF that are often unique to each encoder; effective retraining of ML algorithms for each codec standard |
| Codec Specific | Potentially higher gains with deeper leverage of codec-specific characteristics to achieve CAE | Longer lead time to adapt to newer formats, and multiple integration cycles; inconsistent results across codec formats due to changing algorithms |
The first version of THINKode has been validated with the H.264 format, and the decision was driven by the popularity of codec implementations such as x264 in OTT VoD/Live solutions. However, as a codec-independent solution based on machine learning techniques, THINKode is designed to be easily retrained for any new format including HEVC, VP9 and the upcoming AV1 (stay tuned for updates with support for these formats).
Most OTT service providers operate with a fixed ABR profile set optimized for various target devices and networks. Since an ABR encoded stream is viewed on a screen with a fixed resolution, we observe that as bit-rates drop for a given resolution, a lower resolution representation can start looking better than the higher resolution representation at the same bit-rate. This can trade off sharpness for other artifacts like blockiness. The graph below (figure 3) demonstrates the fluctuations in quality at various bit-rate points across resolutions.
As seen in the graph, dynamic resolution selection normally helps when operating at lower bitrate ranges. However, for certain types of content, such as high grain noise and fast motion, it may help save more bits even at higher bitrate ranges. Table 4 shows the bitrate savings we obtained for representative clips across four different content types using fixed and dynamic resolution at two different operating bitrates.
|Content Type||1080p@9 Mbpsemail@example.com Mbps|
|High grain noise||7.18%||21.28%||5.71%||25.82%|
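The resolution crossover described above amounts to a simple selection rule: at each bitrate, keep whichever resolution scores highest once scaled to the display. The sketch below illustrates only that rule. The quality numbers are invented for the example and are not taken from figure 3 or Table 4, and the function name is hypothetical.

```python
def best_resolution_per_bitrate(scores):
    """Map each bitrate to the resolution with the highest quality score.

    scores: {bitrate_kbps: {resolution_name: quality_score}}
    """
    return {
        kbps: max(by_resolution, key=by_resolution.get)
        for kbps, by_resolution in sorted(scores.items())
    }


# Invented scores for one clip, measured after scaling to the display size.
TOY_SCORES = {
    1000: {"540p": 74.0, "720p": 71.0, "1080p": 62.0},
    2500: {"540p": 80.0, "720p": 84.0, "1080p": 81.0},
    6000: {"540p": 83.0, "720p": 90.0, "1080p": 94.0},
}

choice = best_resolution_per_bitrate(TOY_SCORES)
print(choice)
```

In practice each score in such a table costs an encode and a quality measurement per resolution, which is the added complexity and processing cost of dynamic resolution selection.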
The key challenge with including such a dynamic ABR encoding scheme within a CAE solution is the increase in solution complexity and processing cost. However, advanced ML algorithms enable effective gains within THINKode with minimal processing overhead.
CAE addresses the ever growing need to compress video beyond the limits of classical encoding methods. An effective solution needs to carefully choose from some of the options discussed above to maximize bitrate savings – without affecting perceived quality. At the same time, the solution needs to be practical in terms of complexity and ease of deployment for different use cases.
With the rapid pace of advancements in AI and machine learning, evolving video quality metrics, and a newer generation of video codecs such as VP9, HEVC and AV1, CAE solutions will continue to evolve in the coming years. Watch our blog for frequent updates on the evolving trends in CAE and continuous improvements in Ittiam’s THINKode solution.