ASSD · Oct 13, 2019

T-GSA: Transformer with Gaussian-weighted self-attention for speech enhancement

arXiv:1910.06762v3
203 citations
Originality: Incremental advance
AI Analysis

This paper addresses speech enhancement, a domain-specific task, and offers an incremental improvement over existing methods.

The paper tackles the problem that Transformer neural networks underperform in speech enhancement, whose contextual nature differs from that of NLP tasks. It proposes T-GSA, a Transformer with Gaussian-weighted self-attention that attenuates attention weights according to the distance between target and context symbols, and it reports significantly improved speech-enhancement performance compared to the Transformer and RNNs.

Transformer neural networks (TNNs) have demonstrated state-of-the-art performance on many natural language processing (NLP) tasks, replacing recurrent neural networks (RNNs) such as LSTMs and GRUs. However, TNNs have not performed well in speech enhancement, whose contextual nature differs from that of NLP tasks such as machine translation. Self-attention is a core building block of the Transformer: it not only enables parallelization of sequence computation, but also provides a constant path length between symbols, which is essential to learning long-range dependencies. In this paper, we propose a Transformer with Gaussian-weighted self-attention (T-GSA), whose attention weights are attenuated according to the distance between target and context symbols. The experimental results show that the proposed T-GSA significantly improves speech-enhancement performance compared to the Transformer and RNNs.
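
As a rough illustration of the idea in the abstract, the sketch below attenuates self-attention weights with a Gaussian of the distance between target and context positions. It is a minimal NumPy sketch under stated assumptions, not the paper's exact formulation: the function name `gaussian_weighted_attention`, the fixed scalar `sigma`, and the choice to apply the Gaussian as an additive penalty on the scores before the softmax (equivalent to multiplying the softmax weights by a Gaussian and renormalizing) are all illustrative choices.

```python
import numpy as np

def gaussian_weighted_attention(q, k, v, sigma):
    """Single-head self-attention with distance-based Gaussian attenuation.

    q, k, v: arrays of shape (n, d). sigma: assumed fixed width (hypothetical;
    how the width is set or learned in the paper is not covered here).
    """
    n, d = q.shape
    # Standard scaled dot-product scores, shape (n, n).
    scores = q @ k.T / np.sqrt(d)
    # Squared distance between target position i and context position j.
    pos = np.arange(n)
    dist_sq = (pos[:, None] - pos[None, :]) ** 2
    # Subtracting dist_sq / (2 sigma^2) before the softmax scales each
    # unnormalized weight by exp(-dist_sq / (2 sigma^2)).
    biased = scores - dist_sq / (2.0 * sigma ** 2)
    # Numerically stable softmax over the context axis.
    biased = biased - biased.max(axis=-1, keepdims=True)
    weights = np.exp(biased)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy usage: 6 frames with 4 features each (random data, for shape only).
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
out, w = gaussian_weighted_attention(x, x, x, sigma=1.0)
```

A small `sigma` concentrates each frame's attention on its near neighbours, while a very large `sigma` approaches ordinary self-attention.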

