Driving Faster Content Preparation with Scalable Chunked Encoding
by Karthikeyan N (Principal Engineer, Media Server Technologies)
Key factors that enable scalability
One of the biggest challenges with on-demand video is delivering optimal content for distribution. While multiple ABR representations, higher resolutions like 4K, higher frame rates and newer compression formats like HEVC and VP9 all improve the user experience, they also make the encoding process increasingly complex, and hence slow. This puts content preparation workflows under considerable pressure to deliver content within the stipulated turnaround time. Chunked encoding lets us address this concern effectively.
Scale of increase in complexity
- 1080p30 to 4Kp60 – 8x
- H.264 to HEVC/VP9 – 3 to 5x
- Adding all ABR profiles – 2.5 to 5x
While encoding a single file at 1080p30 typically takes about an hour, with all the above changes combined it can extend to several days!
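A quick back-of-the-envelope calculation shows how these multipliers compound. The one-hour baseline and the per-factor ranges are the ones quoted above; the independence of the factors is a simplifying assumption:

```python
# Back-of-the-envelope estimate of how the complexity multipliers compound.
BASELINE_HOURS = 1.0  # single-file 1080p30 encode

def total_encode_hours(resolution_factor, codec_factor, abr_factor):
    """Combined encode time, assuming the factors multiply independently."""
    return BASELINE_HOURS * resolution_factor * codec_factor * abr_factor

best = total_encode_hours(8, 3, 2.5)   # low end of each range
worst = total_encode_hours(8, 5, 5)    # high end of each range
print(f"{best:.0f} to {worst:.0f} hours "
      f"({best / 24:.1f} to {worst / 24:.1f} days)")
```

The result, roughly 60 to 200 hours, matches the "several days" figure above.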
Since distribution costs far exceed encoding costs, it makes business sense to add more compute resources to ensure faster availability of content to subscribers. We can then reduce the turnaround time by fragmenting the source content into multiple chunks and encoding them independently, in parallel, across multiple compute resources. This is exactly what THINKode, Ittiam's machine-learning-based content-adaptive encoder, enables. However, a key parameter that defines the success of our chunked encoding scheme is ensuring scalability over a large number of resources.
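The overall scheme can be sketched as follows. This is a minimal illustration, not THINKode's actual implementation: `encode_chunk` is a placeholder for the real per-chunk encode job, and a thread pool stands in for a fleet of cloud instances:

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(duration_s, chunk_s):
    """Return (start, length) pairs covering the source timeline."""
    chunks, start = [], 0.0
    while start < duration_s:
        length = min(chunk_s, duration_s - start)
        chunks.append((start, length))
        start += length
    return chunks

def encode_chunk(chunk):
    # Placeholder for the real per-chunk encode job running on its
    # own compute instance; here we just label the chunk.
    start, length = chunk
    return f"encoded[{start:.0f}s..{start + length:.0f}s]"

def chunked_encode(duration_s, chunk_s, workers):
    # Chunks encode independently, so completion order is irrelevant;
    # a stitcher later concatenates the outputs in chunk order.
    chunks = split_into_chunks(duration_s, chunk_s)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_chunk, chunks))

print(chunked_encode(duration_s=300, chunk_s=60, workers=5))
```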
Fragmenting the source content limits the number of frames an encoder can use during encoding. This presents a trade-off between the chunk size and the quality of encoding. Although our objective is to minimize the chunk size to increase opportunities for parallel processing, factors such as GOP sizes and scene transitions force a lower limit on it.
To determine the minimum chunk size that delivers acceptable quality, we must understand the typical GOP sizes and scene transitions over a large content set. We also need to experiment with various chunk sizes to assess the overall quality degradation.
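As one illustration of the idea, a chunker can restrict split points to keyframe positions so that no GOP is cut in the middle while still honouring a minimum chunk size. The keyframe timestamps below are hypothetical:

```python
def chunk_boundaries(keyframes, duration_s, min_chunk_s):
    """Choose split points only at keyframes, spaced at least
    min_chunk_s apart, so no GOP is cut in the middle."""
    boundaries = [0.0]
    for kf in keyframes:
        if kf - boundaries[-1] >= min_chunk_s:
            boundaries.append(kf)
    boundaries.append(duration_s)
    return boundaries

# Hypothetical keyframes for a 60 s clip encoded with ~4 s GOPs.
kfs = [4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56]
print(chunk_boundaries(kfs, 60.0, min_chunk_s=10))
# -> [0.0, 12, 24, 36, 48, 60.0]: five chunks of 12 s each
```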
Even if we choose a very small chunk size that ensures zero quality degradation, several performance factors limit efficient parallelization. Furthermore, smaller chunk sizes lead to higher system management overheads. The chunk size therefore needs to be balanced against the overall turnaround time achievable with the available storage infrastructure and compute resources. Let us look at how to do so, based on three main performance factors:
While we attempt to parallelize encoding across multiple compute resources, the source files typically reside on a single file storage system. The question here is: when multiple chunks of the source file are downloaded in parallel, does the download speed of each chunk match the processing speed of the encoding task? If the answer is yes, the encoding task is never starved of input data.
Even with high-speed, distributed public cloud storage services like Amazon S3, the download speed does not scale linearly with the number of parallel downloads beyond a certain point. And if the content provider hosts the content on private servers accessed through an external network, the download speeds can be even lower and less consistent.
To reduce wastage of cloud compute resources, we must adopt an optimal chunk size dynamically based on the file storage type and actual download speed available.
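A simple way to bound the parallelism is to compare the measured aggregate download bandwidth against each encoder's input consumption rate. The figures below are illustrative assumptions, not measurements:

```python
import math

def max_parallel_chunks(aggregate_download_mbps, encoder_input_mbps):
    """Largest number of simultaneous chunk downloads before a single
    encode task starts starving for input, assuming the aggregate
    bandwidth is shared evenly across the parallel downloads."""
    return max(1, math.floor(aggregate_download_mbps / encoder_input_mbps))

# Illustrative numbers: 400 MB/s aggregate from object storage,
# each encoder consuming source data at 25 MB/s.
print(max_parallel_chunks(400, 25))   # -> 16
print(max_parallel_chunks(10, 25))    # -> 1 (slow private server)
```

Re-measuring the aggregate bandwidth periodically lets the system adapt the chunk count as storage conditions change.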
In a chunked encoding system, a shared filesystem may be required for metadata that must be visible across encoding tasks. It also stores the intermediate media and metadata output of each chunk task, which the stitcher needs to generate the final output.
Amazon provides Elastic File System (EFS) for use as a shared filesystem. EFS throughput depends on the amount of data stored in the filesystem at that point in time; in its bursting mode, AWS documents the baseline aggregate throughput as 50 KB/s per GB stored. Take a look at the EFS throughput for sample filesystem sizes under steady-state operation:
| Filesystem Size (GB) | Baseline Aggregate Throughput (MBps) |
| --- | --- |
| 10 | 0.5 |
| 100 | 5.0 |
| 1024 | 50.0 |
Now, consider a typical ABR encoding recipe with 10 representations: each chunk task writes intermediate output for all 10 representations to the shared filesystem, so the throughput demand multiplies with the number of chunks processed in parallel.
Though the throughput can be increased by storing additional persistent data in EFS, this adds to the overall cost of encoding. Scalability will thus depend on how well we design the encoding workflow to minimize the use of the shared filesystem.
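To make the constraint concrete, the sketch below checks an assumed write load against the documented EFS bursting-mode baseline of 50 KB/s per GB stored. The per-representation write rates are illustrative assumptions:

```python
def efs_baseline_kbps(stored_gb):
    """Baseline aggregate throughput in KB/s, per AWS's documented
    bursting-mode rate of 50 KB/s per GB of data stored."""
    return stored_gb * 50

def aggregate_write_kbps(parallel_chunks, rep_write_rates_kbps):
    """Every chunk task writes intermediate output for all ABR
    representations to the shared filesystem at once."""
    return parallel_chunks * sum(rep_write_rates_kbps)

# 10-representation recipe: assumed intermediate write rate per
# representation, in KB/s.
reps = [2000, 1500, 1000, 800, 600, 500, 400, 300, 200, 100]

demand = aggregate_write_kbps(20, reps)   # 20 chunks in flight
supply = efs_baseline_kbps(1000)          # ~1 TB stored in EFS

print(f"demand {demand} KB/s vs baseline {supply} KB/s "
      f"-> throttled: {demand > supply}")
```

Under these assumptions the demand (148,000 KB/s) far exceeds the baseline (50,000 KB/s), so either the parallelism must drop or the workflow must write less to the shared filesystem.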
The time required to dynamically acquire cloud compute instances and the associated billing cycle, which typically differs across cloud providers, also influence scalability. If the objective of scalability includes optimal utilization of acquired resources in addition to reducing the turnaround time of encoding, we also have to ensure efficient scheduling of chunked encoding tasks across the acquired resources.
For example, Amazon charges for a minimum billing cycle of one hour, in which case it is cost-effective to make use of the complete billing cycle. Instead of acquiring six resources that each perform a 10-minute encoding task, we can acquire a single resource and schedule the six tasks sequentially – at the cost of increased turnaround time for those tasks.
Therefore, the best approach is to choose the optimum number of chunks for parallel processing based on the right tradeoff between cost and overall speed gain across all compute resources. If the billing cycle or cost is not a constraint, then the time it takes to launch an instance determines the optimum number of chunks to process in parallel.
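The one-hour-cycle example above can be expressed as a simple scheduling calculation; the billing model and task durations are the ones assumed in the text:

```python
import math

BILLING_CYCLE_MIN = 60  # e.g. a one-hour minimum billing cycle

def billed_cycles(task_minutes, instances):
    """Billing cycles consumed when tasks are spread evenly across
    `instances` machines, each machine billing whole cycles."""
    per_machine = math.ceil(len(task_minutes) / instances)
    busy_minutes = per_machine * max(task_minutes)
    return instances * math.ceil(busy_minutes / BILLING_CYCLE_MIN)

tasks = [10] * 6  # six 10-minute chunk encodes

# Six machines: each runs one 10-minute task but bills a full hour.
print(billed_cycles(tasks, instances=6))  # -> 6
# One machine: the six tasks run back to back within one hour.
print(billed_cycles(tasks, instances=1))  # -> 1
```

The single-instance plan costs one-sixth as much but takes six times longer, which is exactly the cost-versus-turnaround tradeoff described above.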
The impact of the above factors multiplies when several pieces of content are encoded in parallel. However, these challenges can be addressed effectively, as we have done in the THINKode software, by implementing schemes that monitor the critical factors and automatically identify the optimum number of chunks to process in parallel for efficient use of cloud instances.