Modeling and Driving Human Body Soundfields through Acoustic Primitives
This work addresses the lack of spatial audio modeling in virtual reality and animation, enabling more immersive experiences. Its contribution is incremental: it transfers the idea of volumetric primitives from graphics to acoustics.
The paper tackles the largely ignored problem of generating high-quality spatial audio for 3D human body models. It presents a framework that renders the full 3D soundfield from body pose and audio from a head-mounted microphone, yielding soundfield representations an order of magnitude smaller than previous approaches and improved near-field rendering.
While rendering and animation of photorealistic 3D human body models have matured and reached an impressive quality over the past years, modeling the spatial audio associated with such full body models has been largely ignored so far. In this work, we present a framework that allows for high-quality spatial audio generation, capable of rendering the full 3D soundfield generated by a human body, including speech, footsteps, hand-body interactions, and others. Given a basic audio-visual representation of the body in the form of 3D body pose and audio from a head-mounted microphone, we demonstrate that we can render the full acoustic scene at any point in 3D space efficiently and accurately. To enable near-field and real-time rendering of sound, we borrow the idea of volumetric primitives from graphical neural rendering and transfer them into the acoustic domain. Our acoustic primitives result in an order of magnitude smaller soundfield representations and overcome deficiencies in near-field rendering compared to previous approaches.
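To make the idea of rendering a soundfield from a small set of primitives concrete, the following is a minimal sketch and not the method of the paper. It assumes each primitive is a plain omnidirectional point source at a fixed 3D position (for example, near the mouth or a foot), and it renders the signal at a listener position by summing the primitive signals with 1/r attenuation and a propagation delay. The function name `render_at_listener`, the data layout, and the constant `SPEED_OF_SOUND` are all hypothetical choices made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def render_at_listener(primitives, listener, sample_rate, num_samples):
    """Sum delayed, distance-attenuated primitive signals at a listener position.

    primitives: list of (position, signal) pairs, where position is an
                (x, y, z) tuple in meters and signal is a list of samples.
    listener:   (x, y, z) tuple in meters.
    Assumption: every primitive radiates equally in all directions.
    """
    out = [0.0] * num_samples
    for position, signal in primitives:
        # Clamp the distance so a listener on top of a source does not divide by zero.
        distance = max(math.dist(position, listener), 1e-3)
        delay = round(distance / SPEED_OF_SOUND * sample_rate)  # delay in samples
        gain = 1.0 / distance  # spherical spreading loss
        for n, sample in enumerate(signal):
            t = n + delay
            if t < num_samples:
                out[t] += gain * sample
    return out


# Usage: one primitive at head height and one at floor level, listener 1 m in front.
mouth = ((0.0, 0.0, 1.7), [1.0, 0.5, 0.25])
foot = ((0.0, 0.0, 0.0), [0.8, 0.0, 0.0])
rendered = render_at_listener([mouth, foot], (1.0, 0.0, 1.7), 48000, 512)
```

Because the attenuation and delay are computed per primitive rather than for a single source at the body center, a listener close to the body hears each source at its own distance, which is the kind of near-field behavior the abstract refers to.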