Analytic Incremental Learning For Sound Source Localization With Imbalance Rectification
This work addresses real-world deployment issues in sound source localization (SSL) for applications such as robotics and audio processing, though the contribution appears incremental in scope: it builds on existing SSL methods and adds targeted enhancements.
The paper tackles the degradation of sound source localization in real-world deployment, which it attributes to two imbalance problems (intra-task and inter-task) that lead to catastrophic forgetting and reduced accuracy. The proposed unified framework combines GCC-PHAT-based data augmentation (GDA) with an Analytic dynamic imbalance rectifier (ADIR) and reports state-of-the-art results on the SSLR benchmark: 89.0% accuracy, 5.3° mean absolute error, and 1.6 backward transfer.
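For readers unfamiliar with the backward transfer figure, the sketch below computes the metric as it is commonly defined in the continual-learning literature: the average change in accuracy on each earlier task between the moment it was learned and the end of training. Whether the paper uses exactly this definition is an assumption, and the function name `backward_transfer` and the accuracy matrix are hypothetical illustrations, not values from the paper.

```python
import numpy as np


def backward_transfer(acc):
    """Average change in accuracy on earlier tasks after all tasks are learned.

    acc[i, j] is the accuracy on task j measured after training through task i.
    A positive value means later training improved earlier tasks on average;
    a negative value indicates forgetting.
    """
    acc = np.asarray(acc, dtype=float)
    num_tasks = acc.shape[0]
    if acc.ndim != 2 or acc.shape[1] != num_tasks or num_tasks < 2:
        raise ValueError("acc must be a square matrix covering at least two tasks")
    final = acc[num_tasks - 1, : num_tasks - 1]      # accuracy after the last task
    when_learned = np.diag(acc)[: num_tasks - 1]     # accuracy right after learning
    return float(np.mean(final - when_learned))


# Hypothetical accuracies (in percent) for three sequential tasks.
example = [
    [90.0, 0.0, 0.0],
    [88.0, 85.0, 0.0],
    [87.0, 86.0, 80.0],
]
bwt = backward_transfer(example)
```

Under this definition, a positive score such as the reported 1.6 would indicate that accuracy on earlier tasks did not drop, on average, as later tasks were learned.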
Sound source localization (SSL) demonstrates remarkable results in controlled settings but struggles in real-world deployment due to dual imbalance challenges: intra-task imbalance arising from long-tailed direction-of-arrival (DoA) distributions, and inter-task imbalance induced by cross-task skews and overlaps. These often lead to catastrophic forgetting, significantly degrading the localization accuracy. To mitigate these issues, we propose a unified framework with two key innovations. Specifically, we design a GCC-PHAT-based data augmentation (GDA) method that leverages peak characteristics to alleviate intra-task distribution skews. We also propose an Analytic dynamic imbalance rectifier (ADIR) with task-adaption regularization, which enables analytic updates that adapt to inter-task dynamics. On the SSLR benchmark, our proposal achieves state-of-the-art (SoTA) results of 89.0% accuracy, 5.3° mean absolute error, and 1.6 backward transfer, demonstrating robustness to evolving imbalances without exemplar storage.
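As background for the GCC-PHAT features mentioned above, the following is a minimal sketch of the standard generalized cross-correlation with phase transform between two microphone signals. It illustrates only the textbook feature whose peak encodes the time difference of arrival; it is not the paper's GDA augmentation, whose details are not given here. The function name `gcc_phat`, its parameters, and the synthetic signals are assumptions made for illustration.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` with GCC-PHAT.

    Returns (tau, cc): the estimated delay in seconds and the
    cross-correlation curve centred on zero lag.
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n = sig.shape[0] + ref.shape[0]          # zero-pad to avoid circular wrap-around
    spec_sig = np.fft.rfft(sig, n=n)
    spec_ref = np.fft.rfft(ref, n=n)
    cross = spec_sig * np.conj(spec_ref)
    cross = cross / (np.abs(cross) + 1e-12)  # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs), cc


# Synthetic check: the second channel is the first one delayed by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref))
tau, cc = gcc_phat(sig, ref, fs)
delay_in_samples = round(tau * fs)
```

The location of the correlation peak gives the inter-microphone delay, which in turn constrains the direction of arrival for a known array geometry; the sharpness of that peak is the kind of characteristic a peak-based augmentation could plausibly exploit.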