CL · AI · SD · AS · Oct 29, 2022

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers

CMU · Meta AI
arXiv:2210.16643v2 · 3 citations · h-index: 58
AI Analysis

This work addresses the trade-off between efficiency and performance in long speech processing for applications such as recognition and summarization, and it represents a strong, specific gain rather than an incremental one.

The paper tackles the quadratic computational complexity of self-attention in Transformers for long sequences by developing a novel linear transformer, achieving a 1% absolute WER improvement on Librispeech-100 and a 5-point ROUGE gain on How2 summarization.

Transformers are among the state of the art for many tasks in speech, vision, and natural language processing, among others. Self-attentions, which are crucial contributors to this performance, have quadratic computational complexity, which makes training on longer input sequences challenging. Prior work has produced state-of-the-art transformer variants with linear attention; however, current models sacrifice performance to achieve efficient implementations. In this work, we develop a novel linear transformer by examining the properties of the key-query product within self-attentions. Our model outperforms state-of-the-art approaches on speech recognition and speech summarization, resulting in a 1% absolute WER improvement on the Librispeech-100 speech recognition benchmark and a new INTERVIEW speech recognition benchmark, and a 5-point ROUGE improvement for summarization with How2.
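The complexity gap described in the abstract can be illustrated with a minimal sketch. This is not the XNOR-former approximation itself, which the text above does not specify. It contrasts standard softmax attention, which materializes an n-by-n key-query matrix, with a generic kernelized linear attention that reorders the matrix products so that no n-by-n matrix is formed. The `elu(x) + 1` feature map and all function names are assumptions chosen for illustration.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: builds an (n, n) score matrix, so cost grows as n^2."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (n, n)
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                             # (n, d_v)

def feature_map(x):
    """elu(x) + 1: a positive feature map (illustrative assumption, not the paper's)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(Q, K, V):
    """Kernelized attention: computes phi(K)^T V first, so cost grows linearly in n."""
    Qf, Kf = feature_map(Q), feature_map(K)        # (n, d) each
    KV = Kf.T @ V                                  # (d, d_v), independent of n
    Z = Kf.sum(axis=0)                             # (d,), normalizer statistics
    return (Qf @ KV) / (Qf @ Z)[:, None]           # (n, d_v)

def linear_attention_reference(Q, K, V):
    """Same kernel computed the quadratic way, used only to check the reordering."""
    Qf, Kf = feature_map(Q), feature_map(K)
    A = Qf @ Kf.T                                  # (n, n)
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = rng.standard_normal((3, 16, 8))      # three (16, 8) matrices
    same = np.allclose(linear_attention(Q, K, V),
                       linear_attention_reference(Q, K, V))
    print("reordered product matches quadratic form:", same)
```

The linear variant matches its own quadratic reference exactly, but it only approximates softmax attention. That gap is the performance loss the abstract attributes to existing linear transformers.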

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
