Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling

Yoshiki Masuyama, Francois G. Germain, Gordon Wichern, Chiori Hori, Jonathan Le Roux

arXiv:2603.22589 | 65.8 | h-index: 18

AI Analysis

This work addresses efficient spatial audio modeling for audio processing applications, representing an incremental improvement over prior physics-informed methods.

The paper tackles the modeling of Ambisonics impulse responses by reformulating the task so that predictions automatically satisfy the linearized momentum equation: a neural network approximates a scalar velocity potential, and the FOA signal is derived from its partial derivatives. Experiments on room impulse response reconstruction confirm the effectiveness of the approach.

First-order Ambisonics (FOA) is a standard spatial audio format based on spherical harmonic decomposition. Its zeroth- and first-order components capture the sound pressure and particle velocity, respectively. Recently, physics-informed neural networks have been applied to the spatial interpolation of FOA signals, regularizing the network outputs based on soft penalty terms derived from physical principles, e.g., the linearized momentum equation. In this paper, we reformulate the task so that the predicted FOA signal automatically satisfies the linearized momentum equation. Our network approximates a scalar function called velocity potential, rather than the FOA signal itself. Then, the FOA signal can be readily recovered through the partial derivatives of the velocity potential with respect to the network inputs (i.e., time and microphone position) according to the physics of sound propagation. By deriving the four channels of FOA from the single-channel velocity potential, the reconstructed signal follows the physical principle at any time and position by construction. Experimental results on room impulse response reconstruction confirm the effectiveness of the proposed framework.
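
The idea of deriving pressure and particle velocity from a single scalar potential can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny random tanh network, the function names (`make_potential`, `pressure_velocity`, `momentum_residual`), the air density value, and the sign convention (p = -rho0 * dphi/dt, v = grad phi) are all assumptions made for illustration, and the scaling that maps pressure and velocity to the FOA W/X/Y/Z channels is omitted. Under these assumptions, the linearized momentum equation rho0 * dv/dt + grad p = 0 holds for any network weights, which the finite-difference check confirms numerically.

```python
import math
import random

RHO0 = 1.2  # air density in kg/m^3 (assumed value, for illustration only)


def make_potential(hidden=16, seed=0):
    """Build a toy velocity potential phi(t, x, y, z).

    A one-hidden-layer tanh network with random weights stands in for a
    trained neural field (hypothetical; not the paper's architecture).
    Returns a function giving phi and its analytic gradient w.r.t. inputs.
    """
    rng = random.Random(seed)
    w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]
    a = [rng.uniform(-1, 1) for _ in range(hidden)]

    def phi_and_grad(u):
        # u = (t, x, y, z)
        phi = 0.0
        grad = [0.0, 0.0, 0.0, 0.0]
        for wj, bj, aj in zip(w, b, a):
            h = math.tanh(sum(wi * ui for wi, ui in zip(wj, u)) + bj)
            phi += aj * h
            for i in range(4):
                # d/du_i of a_j * tanh(z_j) = a_j * (1 - tanh^2(z_j)) * w_ji
                grad[i] += aj * (1.0 - h * h) * wj[i]
        return phi, grad

    return phi_and_grad


def pressure_velocity(phi_and_grad, u):
    """Derive pressure and particle velocity from the potential.

    Assumed convention: p = -rho0 * dphi/dt and v = grad_x phi.
    """
    _, g = phi_and_grad(u)
    p = -RHO0 * g[0]
    v = g[1:]
    return p, v


def momentum_residual(phi_and_grad, u, eps=1e-4):
    """Central-difference estimate of rho0 * dv/dt + grad p per axis."""
    residual = []
    for i in range(3):
        t_plus, t_minus = list(u), list(u)
        t_plus[0] += eps
        t_minus[0] -= eps
        dv_dt = (pressure_velocity(phi_and_grad, t_plus)[1][i]
                 - pressure_velocity(phi_and_grad, t_minus)[1][i]) / (2 * eps)

        x_plus, x_minus = list(u), list(u)
        x_plus[1 + i] += eps
        x_minus[1 + i] -= eps
        dp_dx = (pressure_velocity(phi_and_grad, x_plus)[0]
                 - pressure_velocity(phi_and_grad, x_minus)[0]) / (2 * eps)

        residual.append(RHO0 * dv_dt + dp_dx)
    return residual


if __name__ == "__main__":
    net = make_potential()
    point = (0.3, 0.1, -0.2, 0.5)  # (t, x, y, z), arbitrary query point
    print("pressure, velocity:", pressure_velocity(net, point))
    print("momentum residual:", momentum_residual(net, point))
```

The residual is zero up to finite-difference error because both terms reduce to the same mixed second derivative of the potential with opposite signs. A network that predicted the four channels directly would instead need a penalty term to push this residual toward zero. In practice an automatic differentiation framework would replace the hand-written gradient used here.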
