AS · LG · SD · SP · Mar 31, 2022

Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs

arXiv:2203.16940v2 · 36 citations
Originality: Highly original
AI Analysis

This work addresses DOA estimation for audio signal processing, offering a more efficient and robust solution for applications such as acoustic localization, though its contribution is an incremental improvement over spherical CNNs.

The paper tackles Direction of Arrival (DOA) estimation of sound sources by proposing an Icosahedral CNN that is equivariant to spherical rotations, achieving root mean square localization errors below 10° in high-reverberation scenarios with lower computational cost than existing methods.

In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most of the spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time $T_{60}$ of 1.5 s.
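The soft-argmax idea in the abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes the network output is a score per direction on a spherical grid, turns those scores into a probability distribution with a softmax, and takes the expected unit vector as the DOA estimate. The function names (`fibonacci_sphere`, `soft_argmax_doa`, `to_angles`), the Fibonacci grid, and the sharpness parameter `beta` are all hypothetical choices made for illustration.

```python
import numpy as np


def fibonacci_sphere(n):
    """Return n roughly uniform unit vectors (illustrative grid, not the icosahedral one)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    phi = np.pi * (1.0 + np.sqrt(5.0)) * i
    r = np.sqrt(1.0 - z ** 2)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def soft_argmax_doa(scores, directions, beta=1.0):
    """Differentiable stand-in for argmax over a spherical grid.

    scores:     (N,) map values, e.g. the output of the last convolutional layer
    directions: (N, 3) unit vectors, one per grid point
    beta:       assumed sharpness parameter; larger values approach a hard argmax
    """
    logits = beta * (scores - scores.max())   # shift for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()                  # softmax: scores -> probabilities
    mean_vec = weights @ directions           # expected direction under that distribution
    return mean_vec / np.linalg.norm(mean_vec)


def to_angles(v):
    """Convert a unit vector to (azimuth, elevation) in degrees."""
    azimuth = np.degrees(np.arctan2(v[1], v[0]))
    elevation = np.degrees(np.arcsin(np.clip(v[2], -1.0, 1.0)))
    return azimuth, elevation


if __name__ == "__main__":
    grid = fibonacci_sphere(2000)
    true_dir = np.array([0.5, 0.5, np.sqrt(0.5)])
    scores = 20.0 * grid @ true_dir           # synthetic map peaked at true_dir
    estimate = soft_argmax_doa(scores, grid)
    print(to_angles(estimate))
```

Averaging unit vectors rather than azimuth and elevation values is a design choice of this sketch: it avoids the wrap-around at ±180° that a direct expectation over angles would suffer from.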

Code Implementations: 2 repos