CL, LG | Feb 28, 2020

Temporal Convolutional Attention-based Network For Sequence Modeling

arXiv:2002.12530v3
47 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for more effective feed-forward models in sequence modeling, offering incremental improvements over existing methods.

The authors address sequence modeling with a new architecture, the Temporal Convolutional Attention-based Network (TCAN), which combines temporal convolutional networks with attention mechanisms. They report state-of-the-art results at the time, including a perplexity of 30.28 on word-level PTB and 1.092 bits per character (bpc) on character-level PTB.
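The two metrics quoted above are both derived from the model's average negative log-likelihood, just on different scales. The following is a minimal sketch of the standard definitions, not code from the paper; the function names and the list-of-losses input format are assumptions made for illustration.

```python
import math

def perplexity(token_nll_nats):
    """Word-level perplexity: exp of the mean per-token negative
    log-likelihood, with losses given in nats (hypothetical input format)."""
    mean_nll = sum(token_nll_nats) / len(token_nll_nats)
    return math.exp(mean_nll)

def bits_per_character(char_nll_nats):
    """Bits per character: mean per-character negative log-likelihood,
    converted from nats to bits by dividing by ln 2."""
    mean_nll = sum(char_nll_nats) / len(char_nll_nats)
    return mean_nll / math.log(2)

def bpc_to_char_perplexity(bpc):
    """A bpc value corresponds to a per-character perplexity of 2 ** bpc."""
    return 2.0 ** bpc
```

Lower is better for both. Because word-level perplexity and character-level bpc are measured over different units, the 30.28 and 1.092 figures are not directly comparable with each other.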

As feed-forward models have developed, they have gradually replaced recurrent networks as the default for sequence modeling. Many powerful feed-forward models based on convolutional networks and attention mechanisms have been proposed and show greater potential for sequence modeling tasks. We ask whether there is an architecture that can not only serve as an approximate substitute for recurrent networks but also absorb the advantages of feed-forward models. We therefore propose an exploratory architecture, the Temporal Convolutional Attention-based Network (TCAN), which combines a temporal convolutional network with an attention mechanism. TCAN has two parts: Temporal Attention (TA), which captures relevant features inside the sequence, and Enhanced Residual (ER), which extracts important information from shallow layers and transfers it to deep layers. We improve the state-of-the-art bpc/perplexity results to 30.28 on word-level PTB, 1.092 on character-level PTB, and 9.20 on WikiText-2.
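The abstract does not give the exact form of TA or ER, so the sketch below only illustrates the two generic ingredients it names: a causal (left-padded, dilated) temporal convolution and a causally masked self-attention step, joined by an ordinary residual addition. All function names, the unlearned dot-product attention, and the plain residual are assumptions made for illustration, not the authors' layers.

```python
import math

def causal_conv1d(x, kernel, dilation=1):
    """Causal dilated 1-D convolution over a list of floats.
    y[t] = sum_i kernel[i] * x[t - i * dilation], treating x as zero
    before the start of the sequence, so y[t] never sees the future."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            s = t - i * dilation
            if s >= 0:
                acc += w * x[s]
        out.append(acc)
    return out

def causal_attention(x):
    """Scaled dot-product self-attention with a causal mask.
    x is a list of T feature vectors; position t attends only to 0..t.
    Queries, keys and values are the inputs themselves (no learned
    projections), which is a simplification for this sketch."""
    d = len(x[0])
    out = []
    for t in range(len(x)):
        scores = [sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * x[s][j] for s, w in enumerate(weights))
                    for j in range(d)])
    return out

def residual_block(x):
    """Attention output plus the block input (a plain residual
    connection, standing in for the paper's Enhanced Residual)."""
    attended = causal_attention(x)
    return [[a + b for a, b in zip(row_a, row_x)]
            for row_a, row_x in zip(attended, x)]
```

The property both pieces share is causality: the output at step t depends only on inputs at steps 0 through t, which is what lets a feed-forward stack stand in for a recurrent network on next-step prediction.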

Code Implementations: 1 repo
