CL · Nov 7, 2020

Rethinking the Value of Transformer Components

arXiv:2011.03803v1999 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of designing optimal Transformer architectures for machine translation, but it is incremental as it builds on existing models without introducing a new paradigm.

The paper tackles the problem of understanding how individual components in Transformer models contribute to performance, finding that certain components are consistently more important across various settings. It proposes a new training strategy that improves translation performance by distinguishing the unimportant components during training.

The Transformer has become the state-of-the-art translation model, yet how each intermediate component contributes to model performance is not well studied, which poses significant challenges for designing optimal architectures. In this work, we bridge this gap by evaluating the impact of individual components (sub-layers) in trained Transformer models from different perspectives. Experimental results across language pairs, training strategies, and model capacities show that certain components are consistently more important than others. We also report a number of interesting findings that might help humans better analyze, understand, and improve Transformer models. Based on these observations, we further propose a new training strategy that can improve translation performance by distinguishing the unimportant components in training.
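
To make the idea of measuring a sub-layer's impact concrete, here is a minimal sketch of one possible approach: remove a single component from a trained model and record how much a quality score drops. This is an illustration under stated assumptions, not the paper's code. The abstract does not say which measurements the authors use, and the toy sub-layers, their names (`enc_self_attn`, `enc_ffn`, `dec_cross_attn`), and the `score` function below are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
SubLayer = Callable[[Vector], Vector]


def run_model(x: Vector, sublayers: Dict[str, SubLayer],
              ablated: Optional[str] = None) -> Vector:
    """Apply residual sub-layers in order; skip the ablated one.

    Skipping a sub-layer is equivalent to replacing its output with zeros,
    so only the residual connection carries the hidden state forward.
    """
    h = list(x)
    for name, fn in sublayers.items():
        if name == ablated:
            continue
        out = fn(h)
        h = [a + b for a, b in zip(h, out)]
    return h


def score(pred: Vector, target: Vector) -> float:
    """Hypothetical quality score: negative squared error (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(pred, target))


def importance(x: Vector, target: Vector,
               sublayers: Dict[str, SubLayer]) -> Dict[str, float]:
    """Score drop caused by ablating each sub-layer on its own."""
    base = score(run_model(x, sublayers), target)
    return {
        name: base - score(run_model(x, sublayers, ablated=name), target)
        for name in sublayers
    }


# Toy "trained" model: three sub-layers with hypothetical names.
toy_sublayers: Dict[str, SubLayer] = {
    "enc_self_attn": lambda h: [0.5 * v for v in h],
    "enc_ffn": lambda h: [0.0 for _ in h],        # contributes nothing
    "dec_cross_attn": lambda h: [1.0 for _ in h],
}

x = [1.0, 2.0]
target = run_model(x, toy_sublayers)  # the full model's output is the reference
scores = importance(x, target, toy_sublayers)

# Rank components from most to least important.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setup the sub-layer that outputs zeros receives an importance of zero, while removing either of the other two lowers the score. With a real translation model the score would typically be a translation quality metric on held-out data, which is also an assumption here.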
