AS SD, Oct 19, 2020

Attention-based scaling adaptation for target speech extraction

Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang

arXiv:2010.10923v34.311 citations

Originality: Incremental advance

AI Analysis

This work addresses speech extraction in noisy environments, offering an incremental improvement for audio processing applications.

The paper tackles target speech extraction with an attention-based scaling adaptation mechanism that lets the target speaker clues interact dynamically with each mixture. It reports improved extraction performance on the spatialized reverberant WSJ0 2-mix dataset, and its single-channel gains are competitive with those of two-channel methods.

Target speech extraction has attracted widespread attention in recent years. In this work, we investigate the dynamic interaction between different mixtures and the target speaker in order to exploit discriminative target speaker clues. We propose a special attention mechanism, which introduces no additional parameters, in a scaling adaptation layer to better adapt the network towards extracting the target speech. Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) can exploit the target speaker clues more efficiently. Experimental results on the spatialized reverberant WSJ0 2-mix dataset demonstrate that the proposed method effectively improves target speech extraction performance. We also find that, under the same network configurations, the ASA in a single-channel condition achieves performance gains competitive with those obtained from two-channel mixtures with inter-microphone phase difference (IPD) features.
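The abstract does not spell out the ASA computation, so the following is only a minimal sketch of what a parameter-free attention inside a scaling layer could look like, not the authors' formulation. It assumes scaled dot-product attention between a target speaker embedding and the mixture frames, average pooling over time as a stand-in for the mixture embedding matrix pooling, and an element-wise scale applied to every frame. The names `pool_frames`, `asa_sketch`, and the `block` parameter are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_frames(frames, block):
    """Average-pool consecutive frames in groups of `block` (assumed pooling)."""
    pooled = []
    for start in range(0, len(frames), block):
        chunk = frames[start:start + block]
        pooled.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return pooled


def asa_sketch(mixture_frames, speaker_embedding, block=1):
    """Hypothetical attention-based scaling with no trainable parameters.

    1. Pool the mixture frames over time.
    2. Score each pooled frame against the speaker embedding (scaled dot product).
    3. Use the attention-weighted sum of pooled frames as a scaling vector.
    4. Scale every original mixture frame element-wise.
    """
    dim = len(speaker_embedding)
    pooled = pool_frames(mixture_frames, block)
    scores = [dot(f, speaker_embedding) / math.sqrt(dim) for f in pooled]
    weights = softmax(scores)
    scale = [sum(w * f[i] for w, f in zip(weights, pooled)) for i in range(dim)]
    adapted = [[s * x for s, x in zip(scale, f)] for f in mixture_frames]
    return weights, adapted


# Toy example: three 2-dimensional mixture frames and one speaker embedding.
mixture = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
speaker = [1.0, 0.0]
weights, adapted = asa_sketch(mixture, speaker)
```

In this toy setup the frames that align with the speaker embedding receive larger attention weights, so they dominate the scaling vector. Larger `block` values shorten the sequence that the attention has to score, which is one plausible reading of how pooling could make the mechanism more efficient.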
