Achieving low-latency video streaming over a network with constrained bandwidth while maintaining high video quality involves multiple challenges. It can, however, be accomplished by understanding the end application's requirements and making the right trade-offs. Let us understand how!
For a network with limited bandwidth (ex: wireless networks), Constant Bit Rate (CBR) transmission at a data rate below the available bandwidth is often a critical requirement. Video encoders with a CBR output help ensure that video streaming does not cause bandwidth spikes on the network. However, generating a CBR-encoded bitstream can impact the achievable video quality.
- Some portions of video are more complex spatially (textures/edges) and/or temporally (high motion). These portions often require a higher number of bits to encode. However, CBR constraints may not allow for that many bits to be allocated to these portions, hence reducing the achieved quality.
- For portions of video that have low complexity, some of the allocated bits may go unused during encoding. Therefore, to maintain a constant bit rate, additional padding bits are inserted in such cases.
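The two cases above can be made concrete with a minimal sketch of per-frame CBR bit allocation. The fixed per-frame budget, the frame "demands", and the `cbr_allocate` helper are all hypothetical illustrations, not a real encoder model:

```python
# Minimal sketch: under CBR, every frame ships exactly the budget, so complex
# frames are capped (quality loss) and simple frames are padded (wasted bits).

def cbr_allocate(demand_bits, budget_bits):
    """Return (coded_bits, padding_bits) for each frame under a CBR budget."""
    frames = []
    for demand in demand_bits:
        coded = min(demand, budget_bits)   # complex frames get truncated to the budget
        padding = budget_bits - coded      # simple frames get padded up to the budget
        frames.append((coded, padding))
    return frames

budget = 2_000_000 // 30                   # 2 Mbps at 30 fps -> ~66,666 bits/frame
demands = [40_000, 120_000, 30_000, 90_000]  # hypothetical per-frame complexities
for coded, pad in cbr_allocate(demands, budget):
    assert coded + pad == budget           # constant bits on the wire, every frame
```

Note how the 120,000-bit frame is forced down to the budget while the 30,000-bit frame carries more than 36,000 padding bits that encode nothing.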
This brings us to Variable Bit Rate (VBR) encoding, where the under-utilized bits from low-complexity portions can be reallocated to encoding high-complexity portions of the video. This achieves better overall quality across all video frames, since bits are allocated in proportion to the complexity of each frame.
- For VBR encoding, the instantaneous number of bits allocated to a video frame is unconstrained. Therefore, the network bandwidth required to transmit such video fluctuates significantly. Spikes in the data rate may result in loss of video packets over the network, higher network jitter and higher transmission latency.
Network traffic shaping is used to eliminate such bit-rate spikes from VBR streaming: the multiple data packets that make up a compressed video frame are spread out in time for transmission, rather than burst onto the network all at once.
To do so, additional queueing of packets is required before transmission which leads to increased latency in delivering video.
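A simple leaky-bucket model makes the latency cost of shaping visible. The packet sizes, drain rate and the `shape` helper below are illustrative assumptions, not a real network stack:

```python
# Minimal leaky-bucket sketch: a burst of packets for one frame is drained at
# a fixed link rate instead of being sent instantly, so later packets queue.

def shape(packet_sizes_bits, drain_rate_bps):
    """Return the transmit-complete time (seconds) of each packet in a burst
    when smoothed to the drain rate."""
    t, times = 0.0, []
    for size in packet_sizes_bits:
        t += size / drain_rate_bps      # each packet waits behind the queue ahead of it
        times.append(t)
    return times

# A 150 kbit frame split into 10 packets, shaped to a 3 Mbps link:
times = shape([15_000] * 10, 3_000_000)
# the last packet leaves ~50 ms after the burst arrived -> added delivery latency
assert abs(times[-1] - 0.05) < 1e-9
```

The smoother the shaping (lower drain rate), the longer the tail of the queue, which is exactly the latency-versus-burstiness trade-off described above.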
In summary, achieving low video streaming latency while aiming for high video quality at a given bandwidth is not straightforward. One option is to implement Constrained Variable Bit Rate (CVBR) encoding as a good trade-off between latency and quality. This is similar to VBR encoding, except that the peak encoded bit rate for any portion of video is constrained within an upper bound (based on available network bandwidth), keeping spikes in the data rate in check (ex: a peak-to-average data rate ratio of ~2). The encoder can thus vary its bit rate to accommodate changes in video complexity while still containing the data rate.
Other than data rate control, some encoding features/parameters can enhance the user experience:
- Sub-frame processing can be used to minimize latency by parallelizing the various video processing operations (capture, pre-process, encode, packetize & stream). Often, these sub-frames are independently decodable, which brings another advantage when delivering video over a loss-prone network (ex: wireless): the decoder can still decode the rest of the frame even when a part of the frame is lost. Although sub-frame encoding reduces encoding efficiency (the achieved video quality at a given bitrate), the impact is fairly low when weighed against the latency and error-resilience it offers.
- GOP size indicates the number of coded video frames between two intra-coded frames (I-frames). Since I-frames typically consume more bits than P-frames for similar video quality, encoding frequent I-frames impacts the achievable video quality. And since encoded I-frames have a higher instantaneous data rate, they add to network spikes and latency.
- Fewer I-frames save bits that can be reallocated to P-frame encoding for improved quality. Increasing the I-frame interval also keeps network bandwidth consumption more uniform. This is often recommended for wireless networks.
- For decoders (players) to latch on to video streams with fewer I-frames, other techniques like on-request I-frame generation can be used.
- Adaptive Intra Refresh means that instead of generating I-frames at a regular interval, all frames are encoded as P-frames. In each P-frame, however, a small subset of macroblocks is intra-coded such that all macroblocks get intra-coded over successive P-frames. This distributes the bits required for encoding one I-frame over multiple P-frames, removing the instantaneous data rate spikes. It is often recommended for loss-prone and bandwidth-constrained networks (ex: wireless).
- In such a case, P-Frame size will vary depending on the complexity of the video frame. To contain the maximum size of an encoded P-frame, encoders can be configured to limit the maximum encoded frame size.
- Instead of over-compressing a high-resolution video, it can be scaled down to a lower resolution before encoding. For some applications, this reduces the data rate, even though scaling before encoding has its own impact on perceived quality and latency. Alternatively, when the entire field of view is not important, one can crop and encode only the Region of Interest (RoI), i.e., the pixels of higher importance (typically at the center). This achieves a similar advantage to image scaling. A comparable benefit can be achieved by frame-rate down-conversion; in cases where spatial video quality matters more than temporal video quality or latency, this can be an acceptable strategy.
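The latency benefit of the sub-frame processing described above can be estimated with an idealized pipeline model. It assumes each stage's time is proportional to the data it processes, with no per-stage overhead; the function and numbers are a back-of-the-envelope sketch, not a measurement:

```python
# Idealized sub-frame (slice) pipelining model: the first slice traverses all
# stages, and the remaining slices stream out one slice-time behind it.

def pipeline_latency_ms(frame_time_ms, num_stages, num_slices):
    """End-to-end latency of one frame through an ideal num_stages pipeline."""
    slice_time = frame_time_ms / num_slices
    return slice_time * (num_stages + num_slices - 1)

frame_ms = 33.3   # one frame period at ~30 fps
# 4 stages: capture, pre-process, encode, packetize/stream
full_frame = pipeline_latency_ms(frame_ms, num_stages=4, num_slices=1)  # ~133 ms
sliced     = pipeline_latency_ms(frame_ms, num_stages=4, num_slices=4)  # ~58 ms
assert sliced < full_frame / 2   # slicing cuts end-to-end latency sharply
```

Finer slicing pushes the latency toward one frame period plus a few slice-times, which is where the error-resilience and efficiency trade-offs of very small slices start to dominate.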
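An Adaptive Intra Refresh cycle, as outlined above, can be sketched as a simple schedule. Refreshing one stripe of macroblock rows per P-frame is just one possible layout, and `intra_refresh_schedule` is a hypothetical helper, not a real encoder API:

```python
# Minimal intra-refresh schedule: spread the intra-coding of all macroblock
# rows across one refresh period, one contiguous stripe per P-frame.

def intra_refresh_schedule(mb_rows, refresh_period):
    """Return, per frame, the set of MB rows to intra-code so the whole
    frame is refreshed once every refresh_period frames."""
    rows_per_frame = -(-mb_rows // refresh_period)   # ceiling division
    schedule = []
    for f in range(refresh_period):
        start = f * rows_per_frame
        schedule.append(set(range(start, min(start + rows_per_frame, mb_rows))))
    return schedule

# 1080p with 16x16 macroblocks -> 68 MB rows, refreshed over 30 P-frames
sched = intra_refresh_schedule(mb_rows=68, refresh_period=30)
assert set().union(*sched) == set(range(68))   # every row refreshed each cycle
```

Because only ~3 of 68 rows are intra-coded in any one frame here, the I-frame's bit cost is spread across the whole cycle instead of landing as one spike.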
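The scaling, RoI cropping and frame-rate reduction options above all shrink the pixel rate the encoder must compress. Treating bitrate as roughly proportional to pixel rate is only a rule of thumb (real codecs are not linear in pixel count), and the resolutions below are illustrative:

```python
# Rough sketch: pixel rate as a first-order proxy for the data the encoder
# must compress under each reduction strategy.

def pixel_rate(width, height, fps):
    """Pixels per second delivered to the encoder."""
    return width * height * fps

full   = pixel_rate(1920, 1080, 30)
scaled = pixel_rate(1280, 720, 30)   # downscale to 720p before encoding
roi    = pixel_rate(960, 540, 30)    # crop a central quarter-area RoI
half   = pixel_rate(1920, 1080, 15)  # halve the frame rate instead

assert scaled / full < 0.45   # ~2.25x fewer pixels to encode
assert roi == full / 4        # quarter-area central crop
assert half == full / 2       # temporal down-conversion halves the load
```

Which reduction is acceptable depends on the application: cropping sacrifices field of view, scaling sacrifices spatial detail, and frame-rate reduction sacrifices motion smoothness.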
Ittiam’s adroitSDK Media SDKs implement these schemes for multiple video networking use cases (Encoder/Server, Transcoder, etc.), achieving the best trade-offs across a wide range of industrial, surveillance, enterprise, medical & defense applications.