CL, Feb 25, 2019

Star-Transformer

Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang

arXiv:1902.09113v3 32.51207 citations Has Code

Originality: Incremental advance

AI Analysis

Star-Transformer addresses data inefficiency in NLP, giving researchers and practitioners a more accessible model for smaller datasets. The advance is incremental because it builds directly on the Transformer architecture.

The paper tackles the Transformer's dependence on large training data by proposing Star-Transformer, a lightweight alternative whose star-shaped topology reduces complexity from quadratic to linear. It reports significant improvements over the standard Transformer on modestly sized datasets across four tasks (22 datasets).

Although Transformer has achieved great successes on many NLP tasks, its heavy structure with fully-connected attention connections leads to dependencies on large training data. In this paper, we present Star-Transformer, a lightweight alternative by careful sparsification. To reduce model complexity, we replace the fully-connected structure with a star-shaped topology, in which every two non-adjacent nodes are connected through a shared relay node. Thus, complexity is reduced from quadratic to linear, while preserving capacity to capture both local composition and long-range dependency. The experiments on four tasks (22 datasets) show that Star-Transformer achieved significant improvements against the standard Transformer for the modestly sized datasets.
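To make the quadratic-to-linear claim concrete, the following is a minimal sketch that counts connections in the two topologies. It is not the authors' code: the function names, the node numbering (the relay is given index n), and the choice to link adjacent tokens as an open chain without wrap-around are assumptions made for illustration only.

```python
from collections import deque
from itertools import combinations


def full_edges(n):
    """Undirected edges when every pair of the n tokens is connected."""
    return set(combinations(range(n), 2))


def star_edges(n):
    """Undirected edges of a star-shaped topology over n tokens.

    Assumptions (illustrative, not from the paper's code):
    the relay node has index n, and adjacent tokens are linked
    as an open chain (no wrap-around edge).
    """
    relay = n
    radial = {(i, relay) for i in range(n)}      # token <-> relay
    local = {(i, i + 1) for i in range(n - 1)}   # token <-> next token
    return radial | local


def token_distance(edges, src, dst):
    """Number of hops on the shortest path from src to dst (BFS)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is unreachable from src


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, len(full_edges(n)), len(star_edges(n)))
```

Under these assumptions the fully-connected graph has n(n-1)/2 edges while the star-shaped one has 2n-1, and any two non-adjacent tokens remain two hops apart via the relay, which matches the abstract's description of preserving long-range dependency.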
