AS LG SD SPNov 5, 2024

Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features

Hanyu Meng, Jeroen Breebaart, Jeremy Stoddard, Vidhyasaharan Sethu, Eliathamby Ambikairajah

arXiv:2411.03172v23.33 citationsh-index: 38ICASSP

Originality Incremental advance

AI Analysis

This work addresses the need for enhanced immersive perception in spatial audio creation, representing an incremental improvement with novel features and network design.

The paper tackled the problem of estimating frequency-varying acoustic parameters like reverberation time, direct-to-reverberant ratio, and clarity from Ambisonics recordings, achieving a reduction in estimation errors by more than half compared to existing methods.

Estimating frequency-varying acoustic parameters is essential for enhancing immersive perception in realistic spatial audio creation. In this paper, we propose a unified framework that blindly estimates reverberation time (T60), direct-to-reverberant ratio (DRR), and clarity (C50) across 10 frequency bands using first-order Ambisonics (FOA) speech recordings as inputs. The proposed framework utilizes a novel feature named Spectro-Spatial Covariance Vector (SSCV), efficiently representing temporal, spectral as well as spatial information of the FOA signal. Our models significantly outperform existing single-channel methods with only spectral information, reducing estimation errors by more than half for all three acoustic parameters. Additionally, we introduce FOA-Conv3D, a novel back-end network for effectively utilising the SSCV feature with a 3D convolutional encoder. FOA-Conv3D outperforms the convolutional neural network (CNN) and recurrent convolutional neural network (CRNN) backends, achieving lower estimation errors and accounting for a higher proportion of variance (PoV) for all 3 acoustic parameters.

View on arXiv PDF

Similar