August 26, 2021

Delivering Immersive 3D Audio with Ittiam’s Optimized MPEG-H Decoder

Reshma Rai (General Manager, Consumer Technologies)
Harish S (Senior Manager, Technical Sales)

Reading time: 9 min

Immersive 3D Audio – the next gen audio experience

Imagine watching a scene from an action-packed movie on your home theatre system, with helicopters and planes flying overhead as scattered ammunition dumps explode on the ground. You hear the planes zoom past ‘from above’ while the explosions go off ‘all around you’. This is a fully immersive, in-scene audio experience that comes as close as possible to the actual scene. What if you could have the same aural experience on your other media consumption devices, such as smartphones, tablets, headphones and VR headsets – in other words, universal delivery? All this and much more is made possible by Immersive 3D Audio. Let us uncover it here.

The audio user experience has come a long way, from humble mono to classic stereo, then surround sound, and now object-based Immersive 3D Audio. Along this journey, each audio format has pushed the technology envelope further as users lapped up the enticing new experiences it offered. Alongside the dominant formats, MPEG-H 3D Audio is the newest entry; it currently has the smallest market share but holds a lot of promise!

In this article, we take a closer look at the MPEG-H 3D Audio format: its key features, its adoption in various broadcast and streaming standards, and its product deployments. We also describe how Ittiam’s optimized MPEG-H 3D Audio decoder can help drive wider deployment of Immersive 3D Audio across a diverse range of consumer devices.

A short primer on MPEG-H 3D Audio

MPEG-H 3D Audio (ISO/IEC 23008-3; MPEG-H Part 3) is an audio compression standard from the Moving Picture Experts Group (MPEG) for delivering immersive 3D audio. It uses three kinds of audio elements to represent the 3D audio space, namely:

  • Audio channels
  • Audio objects
  • Scene based content (higher order ambisonics – HoA)
Figure 1: MPEG-H 3D Audio format highlights

Encoding process: An MPEG-H 3D Audio encoded program may consist of a flexible combination of audio elements such as audio channels, objects and HoA. A key departure from earlier channel-based audio standards like MP3 and AAC is that the audio elements are not mixed at the encoding stage for a particular receiver configuration (e.g. stereo, 5.1 channel, etc.). Instead, they are encoded separately and streamed to the decoder along with metadata (position, gain information and more) describing each audio element. Importantly, this metadata can change dynamically over time.

  • The core codec of the MPEG-H 3D Audio system builds on the unified speech and audio coding (USAC) standard and adds many new tools.
  • MPEG-H 3D Audio includes tools for loudness and dynamic range control (DRC) derived from MPEG-D DRC.
  • Encoded audio and metadata are encapsulated into an MPEG-H 3D audio stream (MHAS) for delivery.
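To make the idea of separately carried, time-varying metadata concrete, here is a minimal Python sketch. It is not the MPEG-H bitstream syntax; the class names, fields and the `metadata_at` helper are all hypothetical, chosen only to illustrate that an object keeps its own samples plus a timeline of position and gain values that the decoder looks up at render time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectMetadata:
    """Position and gain of one audio object from a given time onward (illustrative)."""
    time_s: float         # time at which this entry takes effect
    azimuth_deg: float    # horizontal angle, positive to the left (assumed convention)
    elevation_deg: float  # vertical angle, positive upward (assumed convention)
    gain: float           # linear gain


@dataclass
class AudioObject:
    """One audio object: its own samples plus a metadata timeline."""
    name: str
    samples: List[float]
    metadata: List[ObjectMetadata] = field(default_factory=list)

    def metadata_at(self, time_s: float) -> ObjectMetadata:
        """Return the latest entry at or before time_s (entries assumed sorted by time)."""
        current = self.metadata[0]
        for entry in self.metadata:
            if entry.time_s <= time_s:
                current = entry
        return current


# A helicopter that moves from the right to the left while getting quieter.
helicopter = AudioObject(
    name="helicopter",
    samples=[0.0] * 48000,
    metadata=[
        ObjectMetadata(0.0, azimuth_deg=-30.0, elevation_deg=45.0, gain=1.0),
        ObjectMetadata(0.5, azimuth_deg=30.0, elevation_deg=45.0, gain=0.8),
    ],
)
print(helicopter.metadata_at(0.25).azimuth_deg)  # -30.0
print(helicopter.metadata_at(0.75).azimuth_deg)  # 30.0
```

Because the samples are never pre-mixed, the same object can later be rendered to stereo, 5.1 or headphones using whatever metadata applies at that moment.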
Figure 2: MPEG-H audio decoder

Decoding process: The MPEG-H 3D Audio decoder decodes audio channels, audio objects and HoA separately and then mixes them together while applying the desired gain(s).

  • The use of audio objects and a mixer in the MPEG-H 3D Audio decoder, together with the provision for users to adjust the position or gain of objects, allows a program to be made interactive and personalized.
  • The time domain (TD) binaural renderer module produces a binaural downmix of multichannel audio material in which each input channel is represented by a virtual sound source. Binauralization is based on measured binaural room impulse responses (BRIRs).
  • The MPEG-H 3D Audio standard can support up to 128 channels and 128 objects mapped to up to 64 loudspeakers. However, Level 3 of the MPEG-H 3D Audio Low complexity profile has been defined to limit decoder complexity to levels that are manageable for consumer devices. This profile is targeted at TV and consumer audio systems and provides a good compromise between audio immersion and complexity.
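The decode-then-mix step and the user-adjustable gain can be sketched in a few lines of Python. This is only an illustration: it assumes a stereo output and uses a simple constant-power pan law as a stand-in for the standard's actual object renderer, and the function names `pan_gains` and `mix_to_stereo` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(azimuth_deg: float) -> Tuple[float, float]:
    """Constant-power stereo pan: +90 is fully left, -90 is fully right (assumed convention)."""
    clamped = max(-90.0, min(90.0, azimuth_deg))
    angle = (90.0 - clamped) / 180.0 * (math.pi / 2.0)  # 0 -> left, pi/2 -> right
    return math.cos(angle), math.sin(angle)


def mix_to_stereo(
    bed_left: Sequence[float],
    bed_right: Sequence[float],
    objects: Sequence[Tuple[Sequence[float], float, float]],
) -> Tuple[List[float], List[float]]:
    """Add each decoded object, given as (samples, azimuth_deg, gain), onto the channel bed.

    Object sample lists are assumed to be no longer than the bed.
    """
    left = list(bed_left)
    right = list(bed_right)
    for samples, azimuth_deg, gain in objects:
        g_left, g_right = pan_gains(azimuth_deg)
        for i, sample in enumerate(samples):
            left[i] += gain * g_left * sample
            right[i] += gain * g_right * sample
    return left, right


# The same decoded dialogue object mixed at the default gain and at a user-boosted gain.
bed_l, bed_r = [0.0] * 4, [0.0] * 4
dialogue = [1.0] * 4
default_mix = mix_to_stereo(bed_l, bed_r, [(dialogue, 0.0, 1.0)])
boosted_mix = mix_to_stereo(bed_l, bed_r, [(dialogue, 0.0, 2.0)])
```

Since the gain and position are applied only at the mixer, a user preference such as "louder dialogue" changes the arguments of the mix and does not require a different encoded stream.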

The MPEG-H decode process is illustrated in Figure 2.
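The binaural downmix described above amounts to convolving every input channel with a left-ear and a right-ear impulse response and summing the results per ear. The following Python sketch shows that principle with direct time-domain convolution; the function names and the tiny impulse responses are placeholders rather than measured BRIRs, and a production renderer would typically use a much faster convolution scheme.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Direct time-domain convolution of two non-empty sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binaural_downmix(
    channels: Sequence[Sequence[float]],
    brirs: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> Tuple[List[float], List[float]]:
    """Render each channel as a virtual source using its (left_ir, right_ir) pair."""
    length = max(
        len(ch) + max(len(ir_l), len(ir_r)) - 1
        for ch, (ir_l, ir_r) in zip(channels, brirs)
    )
    left = [0.0] * length
    right = [0.0] * length
    for ch, (ir_l, ir_r) in zip(channels, brirs):
        for i, value in enumerate(convolve(ch, ir_l)):
            left[i] += value
        for i, value in enumerate(convolve(ch, ir_r)):
            right[i] += value
    return left, right


# Two input channels, each with a made-up two-tap impulse response pair.
channels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
brirs = [([0.9, 0.1], [0.3, 0.2]), ([0.3, 0.2], [0.9, 0.1])]
ear_left, ear_right = binaural_downmix(channels, brirs)
```

The cost grows with both the number of channels and the impulse response length, which is one reason binaural rendering adds noticeably to decoder load.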

MPEG-H 3D audio adoption and support in edge devices

MPEG-H has been adopted by the Advanced Television Systems Committee in the ATSC 3.0 standard as one of the next generation audio (NGA) systems, providing listeners with a personalized and immersive audio experience.

  • South Korea took the lead in ATSC 3.0 deployment and has been airing MPEG-H 3D Audio with UHDTV content since 2017.
  • MPEG-H 3D Audio is also included in the SBTVD / ISDB-Tb broadcast standard by the Brazilian digital television forum and has been standardized by the DVB and 3GPP forums as well.
  • The inclusion of support for MPEG-H 3D Audio (pass-through and offload mode) in the latest Android™ 12 promises to open up more innovative MPEG-H playback use cases on Android handhelds and edge devices.
Figure 3: MPEG-H adoption by leading broadcast and mobile Standards

As the above shows, MPEG-H 3D Audio, an open standard, is slowly but surely gaining momentum as one of the leading audio format options for delivering next generation immersive audio.

  • Some of the leading TV, soundbar and smart speaker brands already support MPEG-H 3D Audio in their products, and more are expected to roll out in future.
  • In addition to broadcast, many other segments such as streaming video (VoD / Live stream), music streaming, gaming and virtual reality (VR) are expected to embrace Immersive 3D Audio in a big way.
Figure 4: MPEG-H for wide range of applications

Path forward and challenges: Immersive 3D Audio will soon be all-pervasive, with widespread adoption across different application segments (Figure 4). However, a critical hurdle to wide deployment of immersive 3D audio formats in consumer devices (including the MPEG-H 3D Audio Low complexity profile) is the need to decode multiple audio channels simultaneously and mix them in real time. Compared with non-3D audio codecs, this demands significantly more processing horsepower (MCPS) from CPUs.

Therefore, there is a clear need for highly optimized MPEG-H 3D Audio decoder implementations in order to roll out MPEG-H enabled products with rich audio performance while maintaining high power efficiency and low unit cost.
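For readers unfamiliar with the MCPS metric, the short Python sketch below shows how it is usually derived and why it grows with the number of elements decoded. Every number in it is a made-up placeholder and not a measurement of any decoder; the 1024-sample frame length and the assumption that cost scales linearly with the element count are simplifications for illustration only.

```python
def mcps(cycles_per_frame: float, sample_rate_hz: float, frame_length: int) -> float:
    """Million cycles per second needed to process frames in real time."""
    frames_per_second = sample_rate_hz / frame_length
    return cycles_per_frame * frames_per_second / 1e6


# Hypothetical cost of decoding ONE audio element for one frame (placeholder value).
CYCLES_PER_ELEMENT_FRAME = 200_000
SAMPLE_RATE_HZ = 48_000
FRAME_LENGTH = 1024  # samples per frame (assumed)

for elements in (2, 6, 16):
    load = mcps(elements * CYCLES_PER_ELEMENT_FRAME, SAMPLE_RATE_HZ, FRAME_LENGTH)
    print(f"{elements:2d} elements: {load:.1f} MCPS")
```

Under this simplified model, any reduction in cycles per frame from optimization lowers the load for every element at once, which translates directly into CPU headroom and battery life.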

Ittiam audio codec offerings

  • For close to two decades, Ittiam has specialized in delivering highly optimized audio codecs with best-in-class audio performance and low power operation (extended battery life). With deployments in 100 Million+ devices across the CE, smart living, wearables, automotive and broadcast / online video market segments, Ittiam’s ready and proven audio IPs enable customers to deliver differentiated, high performance products with shorter time-to-market cycles and at reduced unit cost.
  • Ittiam’s audio codecs optimized for ultra-low power have been licensed to multiple leading wearable brands (Ultra low power codecs for Smart wearables).
  • Ittiam’s HEVC video decoder and Extended HE-AAC audio decoders power Android™ AOSP’s multimedia experience (optimized codecs for Android™).

Ittiam’s audio codecs and audio IPs are well recognized in the industry for delivering significantly better power performance (lower processor loading and memory footprint) with outstanding audio quality compared to alternatives.

Ittiam MPEG-H 3D Audio decoder: Ittiam offers a standard-compliant, highly optimized MPEG-H 3D Audio software decoder that supports the Low complexity profile up to L3. It is ideal for adoption in a wide range of consumer devices and applications that are sensitive to processor utilization (MCPS), power consumption and battery drain.

  • The Ittiam MPEG-H 3D Audio decoder is readily available for ARM® Cortex®-A CPUs and can easily be made available for other platforms such as Intel x86 and popular DSPs.
  • The Ittiam MPEG-H 3D Audio decoder is ideal for a range of broadcast and consumer devices (Figure 5) including TVs, soundbars, handhelds, smart speakers, headphones, gaming consoles, STBs and automotive infotainment head units / rear seat entertainment units.
Figure 5: Ittiam MPEG-H Decoder for broadcast end points and consumer devices

Conclusion

Immersive 3D Audio will soon be all-pervasive and is expected to see widespread deployment in both broadcast and consumer audio applications. MPEG-H 3D Audio, a feature-packed open standard, will grow to be a strong contender among the leading codec options for 3D Audio delivery. With growing demand from end users for products with rich audio quality, high power efficiency, longer battery operation and lower unit price, incorporating highly optimized and proven MPEG-H 3D Audio codecs will be a key differentiator for new age Immersive 3D Audio enabled products and services. Ittiam’s proven, optimized MPEG-H Decoder can be a key enabler for the unfolding 3D Audio revolution.

Want to know more about Ittiam’s MPEG-H 3D Audio codec? Reach us at mkt@ittiam.com and learn why many top brands are choosing Ittiam Audio codecs for their upcoming products. We will be glad to help with further details including product datasheets and evaluations.

Request MPEG-H 3D Audio decoder datasheet: MPEG-H Decoder datasheet (ARM® Cortex®-A)

Please read our blogs on: LC3 codecs for Bluetooth® LE Audio, Multimedia software offerings for ATSC 3.0

Explore our Audio Solutions