CL · Feb 25, 2019

Star-Transformer

arXiv:1902.09113v3 · 1207 citations
Originality: Incremental advance
AI Analysis

This work addresses data inefficiency in NLP, offering researchers and practitioners a model better suited to smaller datasets. It is an incremental advance, since it builds directly on the Transformer architecture.

The paper tackles the Transformer's dependence on large training data by proposing Star-Transformer, a lightweight alternative whose star-shaped topology reduces complexity from quadratic to linear in the number of tokens. Across four tasks (22 datasets), it achieves significant improvements over the standard Transformer on modestly sized datasets.
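The quadratic-versus-linear claim can be made concrete by counting connections in each topology. This is a minimal sketch under assumptions: "adjacent" is taken to mean consecutive positions in a plain chain (the paper's exact neighbourhood, for example whether the ends wrap around, may differ), every satellite node links to a single relay node, and the function names are hypothetical.

```python
from itertools import combinations


def full_edges(n: int) -> set[tuple[int, int]]:
    """Pairwise links in fully-connected attention over n tokens: n(n-1)/2."""
    return set(combinations(range(n), 2))


def star_edges(n: int) -> set[tuple[int, int]]:
    """Links in a star topology: satellites 0..n-1 plus one relay at index n.

    Assumption for illustration: satellites i and i+1 are adjacent (a plain
    chain, no wrap-around), and every satellite also links to the relay.
    """
    relay = n
    local = {(i, i + 1) for i in range(n - 1)}  # neighbour links: n - 1
    radial = {(i, relay) for i in range(n)}     # relay links: n
    return local | radial                       # 2n - 1 links in total


def reachable_within_two_hops(n: int) -> bool:
    """Check that every satellite pair is either adjacent or shares the relay."""
    edges = star_edges(n)
    relay = n
    return all(
        (i, j) in edges or ((i, relay) in edges and (j, relay) in edges)
        for i, j in combinations(range(n), 2)
    )


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # columns: sequence length, full-attention links, star links
        print(n, len(full_edges(n)), len(star_edges(n)))
```

The last line printed is `1000 499500 1999`: full attention grows as n(n-1)/2, while the star grows as 2n - 1 and still joins any two positions in at most two hops.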

Although the Transformer has achieved great success on many NLP tasks, its heavy structure with fully-connected attention connections leads to a dependence on large training data. In this paper, we present Star-Transformer, a lightweight alternative obtained by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear while preserving the capacity to capture both local composition and long-range dependency. Experiments on four tasks (22 datasets) show that Star-Transformer achieves significant improvements over the standard Transformer on modestly sized datasets.
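How a star topology keeps both local composition and long-range dependency can be illustrated with a toy update step: each satellite attends to a small local window plus the relay, and the relay then attends to every satellite. This is a simplified sketch, not the authors' code. The assumed context (left neighbour, itself, right neighbour, its token embedding, the relay), single-head attention with no learned projections, no layer normalization or feed-forward layers, zero-padding at the sequence ends, and a relay initialised to the mean embedding are all illustrative choices; `attend` and `star_step` are hypothetical names.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention, reusing keys as values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                                  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * key[j] for w, key in zip(weights, keys)) / total for j in range(d)]


def star_step(h: List[Vector], e: List[Vector], s: Vector) -> tuple[List[Vector], Vector, int]:
    """One simplified round: update all satellites, then the relay.

    Returns new satellite states, the new relay state, and the number of
    attention scores computed, which grows linearly with len(h).
    """
    n = len(h)
    zero = [0.0] * len(s)
    new_h: List[Vector] = []
    n_scores = 0
    for i in range(n):
        left = h[i - 1] if i > 0 else zero           # assumption: zero-pad at the ends
        right = h[i + 1] if i < n - 1 else zero
        context = [left, h[i], right, e[i], s]       # local window + embedding + relay
        new_h.append(attend(h[i], context))
        n_scores += len(context)                     # 5 scores per satellite
    new_s = attend(s, [s] + new_h)                   # relay reads every updated satellite
    n_scores += n + 1
    return new_h, new_s, n_scores


if __name__ == "__main__":
    e = [[float(i), 1.0] for i in range(8)]          # hypothetical toy 2-d embeddings
    h = [list(v) for v in e]                         # assumption: satellites start as embeddings
    s = [sum(col) / len(e) for col in zip(*e)]       # assumption: relay starts as the mean
    for _ in range(2):
        h, s, cost = star_step(h, e, s)
    print(cost, len(e) ** 2)                         # star-step scores vs full self-attention
```

Each satellite scores a fixed five-item context and the relay scores n + 1 items, so one step computes 6n + 1 attention scores (49 for the toy n = 8) against n² for full self-attention. Distant positions exchange information through the relay in two hops, while the local window handles composition between neighbours.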

Code implementations: 2 repos
