AS LG SD SP Jun 14, 2023

Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications

arXiv:2306.08510v1, 1 citation, h-index: 68
Originality: Incremental advance
AI Analysis

The paper addresses a specific bottleneck in sound source tracking, namely the handling of permutation invariance, but the advance is incremental because it builds on existing recurrent methods.

The paper tackles the problem of multi-source tracking in sound source localization by introducing a permutation-invariant recurrent neural network architecture that uses unordered sets for input and state, enabling individual embeddings for each source and order-independent assignment of estimates to trajectories.

Many multi-source localization and tracking models based on neural networks use one or several recurrent layers at their final stages to track the movement of the sources. Conventional recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs) or gated recurrent units (GRUs), take a vector as their input and use another vector to store their state. However, this approach results in the information from all the sources being contained in a single ordered vector, which is not optimal for permutation-invariant problems such as multi-source tracking. In this paper, we present a new recurrent architecture that uses unordered sets to represent both its input and its state and that is invariant to the permutations of the input set and equivariant to the permutations of the state set. Hence, the information of every sound source is represented in an individual embedding and the new estimates are assigned to the tracked trajectories regardless of their order.
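
The two symmetry properties named in the abstract can be made concrete with a toy example. The sketch below is not the architecture from the paper: it is a minimal illustration that assumes attention-style pooling over the input set and weights shared across all state elements, and every name in it (`set_recurrent_step`, the five weight matrices, the dimensions) is hypothetical. It only shows what "invariant to input permutations" and "equivariant to state permutations" mean for one recurrent step.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def set_recurrent_step(state, inputs, params):
    """One toy recurrent step on sets (assumed design, not the paper's).

    state:  M embeddings of size D, one per tracked source.
    inputs: N embeddings of size D, one per new estimate.
    """
    w_q, w_k, w_v, w_h, w_c = params
    queries = matmul(state, w_q)
    keys = matmul(inputs, w_k)
    values = matmul(inputs, w_v)
    scale = math.sqrt(len(w_q[0]))
    # scores[i][j]: affinity between state element i and input element j.
    scores = [[sum(q * k for q, k in zip(qr, kr)) / scale for kr in keys]
              for qr in queries]
    # Each state element pools the inputs with a weighted sum, so the order
    # of the inputs cannot affect the result.
    context = matmul([softmax(r) for r in scores], values)
    recurrent = matmul(state, w_h)
    mixed = matmul(context, w_c)
    # The same weights are applied to every state element, so permuting the
    # state permutes the output in the same way.
    return [[math.tanh(r + c) for r, c in zip(rr, cr)]
            for rr, cr in zip(recurrent, mixed)]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two matrices with a small tolerance."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rng = random.Random(0)
D, M, N = 4, 3, 5  # embedding size, tracked sources, new estimates


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


params = [rand_matrix(D, D) for _ in range(5)]
state = rand_matrix(M, D)
inputs = rand_matrix(N, D)

new_state = set_recurrent_step(state, inputs, params)

# Invariance: reversing the input set leaves the new state unchanged.
invariant = close(new_state, set_recurrent_step(state, inputs[::-1], params))

# Equivariance: rotating the state set rotates the new state identically.
rotated_state = state[1:] + state[:1]
equivariant = close(new_state[1:] + new_state[:1],
                    set_recurrent_step(rotated_state, inputs, params))
```

In this sketch each row of `state` plays the role of the individual embedding of one source. A conventional LSTM or GRU would instead concatenate all sources into one ordered vector, and reordering the estimates would change its output.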

