Multi-Speaker Localization Using Convolutional Neural Network Trained with Noise
This work addresses the problem of accurately localizing multiple speakers in noisy environments, with applications in audio processing and robotics. The contribution appears incremental: it builds on existing CNN-based localization methods and mainly adds a specific training approach.
The paper tackles multi-speaker localization by formulating it as a multi-class multi-label classification problem and solving it with a convolutional neural network trained on synthesized noise signals. The method is evaluated for two speakers and compared against a steered response power method.
The problem of multi-speaker localization is formulated as a multi-class multi-label classification problem, which is solved using a convolutional neural network (CNN) based source localization method. Utilizing the common assumption of disjoint speaker activities, we propose a novel method to train the CNN using synthesized noise signals. The proposed localization method is evaluated for two speakers and compared to a well-known steered response power method.
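To make the multi-class multi-label formulation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes, purely for illustration, that the direction-of-arrival (DOA) range of 0 to 180 degrees is split into 37 classes at 5-degree resolution, that the network emits one sigmoid output per class, and that the two highest-scoring classes are taken as the speaker DOAs. The function names, grid, and logit values are all hypothetical; the abstract does not specify them.

```python
import numpy as np

# Assumed (hypothetical) DOA grid: 0..180 degrees in 5-degree steps -> 37 classes.
NUM_CLASSES = 37
RESOLUTION_DEG = 5.0


def doa_to_multi_hot(doas_deg, num_classes=NUM_CLASSES, resolution_deg=RESOLUTION_DEG):
    """Encode a set of active DOAs (in degrees) as a multi-hot label vector."""
    labels = np.zeros(num_classes, dtype=np.float32)
    for doa in doas_deg:
        labels[int(round(doa / resolution_deg))] = 1.0
    return labels


def sigmoid(x):
    """Element-wise logistic function; gives an independent probability per class."""
    return 1.0 / (1.0 + np.exp(-x))


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean per-class binary cross-entropy, the usual multi-label loss."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs)))


def pick_top_k(probs, k=2, resolution_deg=RESOLUTION_DEG):
    """Return the DOAs (degrees) of the k classes with the highest probability."""
    top = np.argsort(probs)[::-1][:k]
    return sorted(float(i * resolution_deg) for i in top)


# Two speakers at 30 and 120 degrees (made-up example values).
labels = doa_to_multi_hot([30.0, 120.0])

# Stand-in for CNN output logits: low everywhere except near the true DOAs.
logits = np.full(NUM_CLASSES, -4.0)
logits[6] = 3.0    # class index 6  <-> 30 degrees
logits[24] = 2.5   # class index 24 <-> 120 degrees

probs = sigmoid(logits)
loss = binary_cross_entropy(probs, labels)
estimated = pick_top_k(probs, k=2)
```

Using independent sigmoid outputs rather than a single softmax is what allows more than one DOA class to be active at once, which is the defining property of the multi-label setting.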