July 25, 2017

Driving Faster Content Preparation with Scalable Chunked Encoding

by Karthikeyan N (Principal Engineer, Media Server Technologies)

Key factors that enable scalability

Why chunked encoding?

One of the biggest challenges with on-demand video is delivering optimal content for distribution. While multiple ABR representations, higher resolutions like 4K, higher frame rates and newer compression formats like HEVC and VP9 all improve the user experience, they also make the encoding process increasingly complex and hence slower. This puts content preparation workflows under a lot of pressure to deliver content within the stipulated turnaround time. Chunked encoding addresses this concern effectively.

Scale of increase in complexity

  • 1080p30 to 4Kp60 – 8x
  • H.264 to HEVC/VP9 – 3 to 5x
  • Adding all ABR profiles – 2.5 to 5x

While the process of encoding a single file in 1080p30 typically requires one hour, it can now extend up to several days with all the above changes!
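The "several days" figure can be checked with a little arithmetic. The sketch below assumes the three factors listed above simply multiply together, which the article implies but does not state; the names FACTORS and combined_range are illustrative only.

```python
# Complexity multipliers quoted in the list above, as (low, high) pairs.
FACTORS = {
    "1080p30 to 4Kp60": (8.0, 8.0),
    "H.264 to HEVC/VP9": (3.0, 5.0),
    "Adding all ABR profiles": (2.5, 5.0),
}

BASELINE_HOURS = 1.0  # a single 1080p30 encode, per the article


def combined_range(factors):
    """Multiply the low ends and the high ends (assumes the factors compound)."""
    low = high = 1.0
    for lo, hi in factors.values():
        low *= lo
        high *= hi
    return low, high


low, high = combined_range(FACTORS)
print(f"Combined complexity: {low:.0f}x to {high:.0f}x")
print(f"Encode time: {BASELINE_HOURS * low / 24:.1f} to "
      f"{BASELINE_HOURS * high / 24:.1f} days")
```

Under that assumption the one-hour job grows to somewhere between 60 and 200 hours, that is, roughly 2.5 to 8 days on the same hardware.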

Since the distribution costs far exceed the encoding costs, it makes business sense to add more compute resources to ensure faster availability of content to subscribers. We can then reduce the turnaround time by fragmenting the source content into multiple chunks, and independently encoding them in parallel across multiple compute resources. This is exactly what THINKode, Ittiam’s machine learning based content adaptive encoder, enables. However, a key parameter that defines the success of our chunked encoding scheme is ensuring scalability over a large number of resources.

Chunk size vs video quality trade-off

Fragmenting the source content limits the number of frames an encoder can use during encoding. This presents a trade-off between the chunk size and the quality of encoding. Although our objective is to minimize the chunk size to increase opportunities for parallel processing, the following factors force a lower limit on the chunk size:

  • Typical encoding algorithms will not be efficient for chunks that are smaller than a certain threshold due to restricted look ahead processing, resulting in reduced quality.
  • Key characteristics of the input video such as encoding GOP size and any scene-cuts will also limit the minimum chunk sizes we can use.

To determine the minimum chunk size that delivers acceptable quality, we must understand the typical GOP sizes and scene transitions over a large content set. We also need to experiment with various chunk sizes to assess the overall quality degradation.
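One way to picture the constraint is a planner that only cuts at scene changes or GOP boundaries and never produces a chunk shorter than a chosen minimum. This is a minimal sketch under assumptions, not THINKode's actual scheme: plan_chunks, the list of candidate cut frames and the minimum length are all hypothetical inputs that would come from content analysis and quality experiments.

```python
def plan_chunks(total_frames, cut_frames, min_chunk_frames):
    """Return (start, end) frame ranges, end exclusive.

    cut_frames: candidate split points (scene cuts or GOP starts).
    A candidate is used only if the chunk it closes is at least
    min_chunk_frames long; a too-short final chunk is merged into
    the previous one.
    """
    boundaries = [0]
    for cut in sorted(set(cut_frames)):
        if 0 < cut < total_frames and cut - boundaries[-1] >= min_chunk_frames:
            boundaries.append(cut)
    # Avoid leaving a short tail that would encode with little look-ahead.
    if len(boundaries) > 1 and total_frames - boundaries[-1] < min_chunk_frames:
        boundaries.pop()
    boundaries.append(total_frames)
    return list(zip(boundaries[:-1], boundaries[1:]))


# Hypothetical 1000-frame clip with five detected scene cuts.
print(plan_chunks(1000, [100, 240, 480, 500, 950], min_chunk_frames=200))
```

Raising min_chunk_frames yields fewer, longer chunks (better look-ahead, less parallelism); lowering it does the opposite.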

Chunk size vs performance trade-off

Even if we choose a very small chunk size that ensures zero quality degradation, we need to consider several performance factors that limit efficient parallelization. Furthermore, smaller chunk sizes lead to higher system management overheads. The chunk size therefore needs to be balanced with the overall turnaround time achievable, using the available storage infrastructure and compute resources. Take a look at how we can do so based on three main performance factors:

1. File storage access speed

While we attempt to parallelize encoding across multiple compute resources, the source files typically reside on a single file storage system. The question here is: when multiple chunks of the source file are downloaded in parallel, does the download speed of each chunk keep up with the processing speed of its encoding task? If the answer is yes, then the encoding task is never starved of input data.

Even with high-speed, distributed public cloud storage services like Amazon S3, the download speed does not scale linearly with the number of parallel downloads beyond a certain point. And if the content provider has hosted the content on private servers accessed through an external network, the download speeds could be even lower and inconsistent.

To reduce wastage of cloud compute resources, we must adopt an optimal chunk size dynamically based on the file storage type and actual download speed available.
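The starvation check above can be sketched as follows, assuming we have measured the aggregate download throughput at a few levels of parallelism. The function name and all the numbers are hypothetical; real measurements would come from probing the actual storage.

```python
def max_parallel_without_starving(measured_total_mbps, per_encoder_need_mbps):
    """Largest measured parallelism at which each download still keeps up.

    measured_total_mbps: {parallel_downloads: aggregate MB/s observed}
    per_encoder_need_mbps: MB/s of source data one encoding task consumes
    Returns 0 if even a single download cannot keep up.
    """
    best = 0
    for n in sorted(measured_total_mbps):
        per_download = measured_total_mbps[n] / n
        if per_download >= per_encoder_need_mbps:
            best = n
    return best


# Hypothetical measurements: throughput flattens out as parallelism grows.
measured = {1: 40.0, 2: 78.0, 4: 140.0, 8: 200.0, 16: 220.0}
print(max_parallel_without_starving(measured, per_encoder_need_mbps=20.0))
```

With these illustrative figures, eight parallel downloads still deliver 25 MB/s each, while sixteen drop below the 20 MB/s an encoder needs, so instances beyond eight would sit partly idle.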

2. Shared filesystem access speed

In a chunked encoding system, a shared filesystem may be required to store metadata that is used across encoding tasks. It also stores the intermediate media/metadata output of each chunk processing task, which the stitcher needs to generate the final output.

Amazon provides Elastic File System (EFS) for use as a shared filesystem. The throughput of EFS depends on the size of the data stored in it at that point in time. Take a look at the EFS throughput for sample file system sizes under steady state operation:

Now, let us consider a typical ABR encoding recipe with 10 representations:

  • ~10GB shared filesystem will be required for one hour of content, if the output is generated at a total bitrate of ~2.5MBps
  • 1080p24 input content can be encoded at quality equivalent to x264 two pass encoding at a very slow preset at a speed of ~1.82fps
  • The output will thus be generated at {(2.5MBps * 1.82fps) / 24fps} = 0.189MBps
  • As output is generated for each chunk, the filesystem grows and the throughput increases, allowing us to process a maximum of three chunks in parallel towards the end of the encoding.
  • In addition to this, the stitcher can read at an average of only 0.5MBps, once all the chunks are processed.
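The figures in the list above can be reproduced with a short calculation. This is only a sketch: it takes the 0.5MBps stitcher read rate quoted above as the filesystem throughput once all output is stored, and assumes the same figure limits writes, which the article does not spell out. The constant and function names are illustrative.

```python
TOTAL_OUTPUT_MBPS = 2.5    # MB per second of content, all representations
ENCODE_FPS = 1.82          # encoding speed quoted in the article
SOURCE_FPS = 24.0          # 1080p24 input
CONTENT_SECONDS = 3600     # one hour of content
FS_THROUGHPUT_MBPS = 0.5   # assumed throughput with all output stored


def chunk_write_rate(total_output_mbps, encode_fps, source_fps):
    """MB/s of wall-clock time that one chunk task writes to the filesystem."""
    return total_output_mbps * encode_fps / source_fps


rate = chunk_write_rate(TOTAL_OUTPUT_MBPS, ENCODE_FPS, SOURCE_FPS)
fs_size_gb = TOTAL_OUTPUT_MBPS * CONTENT_SECONDS / 1000
sustained_writers = FS_THROUGHPUT_MBPS / rate
stitch_read_hours = fs_size_gb * 1000 / FS_THROUGHPUT_MBPS / 3600

print(f"Per-chunk write rate: {rate:.3f} MBps")
print(f"Output size: {fs_size_gb:.1f} GB")
print(f"Chunk tasks the filesystem can absorb: {sustained_writers:.2f}")
print(f"Stitcher read time: {stitch_read_hours:.1f} hours")
```

Under these assumptions the filesystem absorbs between two and three chunk tasks at full size, in line with the limit of about three quoted above, and reading roughly 9GB back at 0.5MBps would take the stitcher about five hours.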

Though the throughput can be increased by storing some persistent data in EFS, this adds to the overall cost of encoding. Scalability will thus depend on how well we design the encoding workflow to minimize the usage of shared filesystem.

3. Instance management

The time required to dynamically acquire cloud compute instances and the associated billing cycle, which typically differs across cloud providers, also influence scalability. If the objective of scalability includes optimal utilization of acquired resources in addition to reducing the turnaround time of encoding, we also have to ensure efficient scheduling of chunked encoding tasks across the acquired resources.

For example, Amazon charges for a minimum billing cycle of one hour, in which case, it is cost effective to make use of the complete billing cycle. Instead of acquiring six resources that perform a 10 minute encoding task each, we can acquire a single resource and schedule the six tasks sequentially – at the cost of increased turnaround time for those tasks.

Therefore, the best approach is to choose the number of chunks to process in parallel based on the trade-off between cost and overall speed gain across all compute resources. If the billing cycle or cost is not a constraint, then the time it takes to launch an instance should be used to determine the optimum number of chunks for parallel processing.
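The six-task example can be tabulated to show the trade-off between billed cycles and turnaround time. This is a simplified sketch: it assumes identical task durations, that tasks are spread evenly across instances, and that any launch time is billed along with the work. The function and parameter names are hypothetical.

```python
import math


def schedule_options(n_tasks, task_minutes, billing_minutes, launch_minutes=0):
    """List (instances, turnaround_minutes, billed_cycles) for each fleet size."""
    options = []
    for instances in range(1, n_tasks + 1):
        # Tasks run back to back on each instance, in as many rounds as needed.
        rounds = math.ceil(n_tasks / instances)
        turnaround = launch_minutes + rounds * task_minutes
        # Every instance is billed in whole cycles.
        billed = instances * math.ceil(turnaround / billing_minutes)
        options.append((instances, turnaround, billed))
    return options


# Six 10-minute tasks with a one-hour minimum billing cycle.
for instances, turnaround, billed in schedule_options(6, 10, 60):
    print(f"{instances} instance(s): {turnaround} min turnaround, "
          f"{billed} billed hour(s)")
```

One instance finishes in 60 minutes for one billed hour, while six instances finish in 10 minutes for six billed hours. Intermediate fleet sizes show where extra instances stop buying any speed: four or five instances take the same 20 minutes as three.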

Gear up for speed

The impact of the above factors multiplies when several pieces of content are encoded in parallel. However, it is possible to address these challenges effectively by replicating what we have done with the THINKode software. This involves implementing schemes that monitor the critical factors and automatically identify the optimum number of chunks to be processed in parallel for efficient use of cloud instances.

Reach out to us for more information at mkt@www.ittiam.com