Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays
This work addresses the communication-overhead problem in multi-channel speech enhancement systems, particularly for real-time applications, though its contribution is incremental, since it optimizes an existing trade-off.
The paper tackles the problem of reducing communication costs in ad-hoc microphone arrays for speech enhancement by jointly learning a microphone selection mechanism and a speech enhancement network, matching the performance of models that stream from a fixed number of microphones while lowering communication costs.
In this paper, we present a method for jointly learning a microphone selection mechanism and a speech enhancement network for multi-channel speech enhancement with an ad-hoc microphone array. The attention-based microphone selection mechanism is trained to reduce communication costs through a penalty term that represents a trade-off between task performance and communication cost. While working within this trade-off, our method can intelligently stream from more microphones in lower-SNR scenes and fewer microphones in higher-SNR scenes. We evaluate the model in complex echoic acoustic scenes with moving sources and show that it matches the performance of models that stream from a fixed number of microphones while reducing communication costs.
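The penalty-term idea can be illustrated with a minimal sketch. This is not the paper's formulation: the function name `penalized_loss`, the use of the expected number of streamed microphones as the communication cost, and the weight `penalty_weight` are all assumptions made for illustration.

```python
def penalized_loss(task_loss, select_probs, penalty_weight):
    """Task loss plus a communication-cost penalty (illustrative only).

    select_probs: per-microphone selection probabilities in [0, 1],
    a hypothetical stand-in for the outputs of the selection mechanism.
    penalty_weight: hypothetical scalar setting the trade-off; larger
    values favor streaming from fewer microphones.
    """
    # Assumed cost model: expected number of microphones that stream.
    expected_streams = sum(select_probs)
    return task_loss + penalty_weight * expected_streams
```

Under this assumed cost model, setting `penalty_weight` to zero recovers the plain enhancement loss, while increasing it makes each additional streaming microphone more expensive relative to the enhancement gain it provides.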