Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers
This work addresses the need for dynamic spatial audio rendering in consumer AR/VR devices, offering a flexible, array-geometry-agnostic solution for personalized binaural audio capture.
The paper introduces a mixture-of-experts framework for field-of-view enhanced binauralization of moving talkers, enabling real-time adaptation to talker motion and direction-dependent audio emphasis without explicit localization. The method achieves signal-dependent binaural matching with implicit localization, supporting applications like speech focus and noise reduction in AR/VR.
We propose a novel mixture-of-experts framework for field-of-view enhancement in binaural signal matching. Our approach enables dynamic spatial audio rendering that adapts to continuous talker motion, allowing users to emphasize or suppress sounds from selected directions while preserving natural binaural cues. Unlike traditional methods that rely on explicit direction-of-arrival estimation or operate in the Ambisonics domain, our signal-dependent framework combines multiple binaural filters in an online manner using implicit localization. This allows for real-time tracking and enhancement of moving sound sources, supporting applications such as speech focus, noise reduction, and world-locked audio in augmented and virtual reality. The method is agnostic to array geometry, offering a flexible solution for spatial audio capture and personalized playback in next-generation consumer audio devices.
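To make the idea of combining multiple binaural filters online more concrete, the following is a minimal sketch for a single time-frequency bin. It is not the method of this paper: the gating rule (a softmax over normalized matched-filter power), the per-direction array transfer vectors `steering`, the field-of-view gains `fov_gain`, the `temperature` parameter, and the function name `moe_binauralize` are all illustrative assumptions. The sketch only shows the general pattern in which signal-dependent weights select among per-direction expert filters without ever producing an explicit direction estimate.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_binauralize(X, experts, steering, fov_gain, temperature=1.0):
    """Combine K per-direction binaural filters for one time-frequency bin.

    All argument names are hypothetical and chosen for illustration.
    X:        (M,) complex microphone coefficients for this bin.
    experts:  (K, M, 2) complex binaural filters, one per candidate direction.
    steering: (K, M) complex array transfer vectors, one per direction
              (measured or modeled; no array geometry is assumed here).
    fov_gain: (K,) real gains, e.g. 1 inside the field of view, <1 outside.
    """
    # Assumed gating rule: normalized matched-filter power per direction.
    # This acts as a soft, implicit localization; no single DOA is chosen.
    power = np.abs(steering.conj() @ X) ** 2
    scores = power / np.linalg.norm(steering, axis=1) ** 2
    weights = softmax(scores / temperature)

    # Each expert maps the M microphone signals to a left/right pair.
    outputs = np.einsum("kmc,m->kc", experts.conj(), X)  # shape (K, 2)

    # Weighted mixture, with direction-dependent emphasis or suppression.
    binaural = (weights * fov_gain) @ outputs  # shape (2,)
    return binaural, weights

# Toy setup: M = 4 microphones, K = 3 candidate directions with mutually
# orthogonal transfer vectors, and a source exactly at direction index 1.
M, K = 4, 3
m = np.arange(M)
steering = np.exp(-2j * np.pi * np.outer(np.arange(K), m) / M)
rng = np.random.default_rng(0)
experts = rng.standard_normal((K, M, 2)) + 1j * rng.standard_normal((K, M, 2))
X = steering[1].copy()
fov_gain = np.array([0.1, 1.0, 0.1])
binaural, weights = moe_binauralize(X, experts, steering, fov_gain, temperature=0.1)
```

In this toy case the gating weights concentrate on the expert for direction index 1, so the output is dominated by that expert's filter. In an online setting the weights would be recomputed, and typically smoothed, from frame to frame so that the mixture follows a moving talker.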