CL | AI | Feb 22, 2021

Position Information in Transformers: An Overview

arXiv:2102.11090v2 | 649 citations
Originality: Synthesis-oriented
AI Analysis

This is an incremental survey that synthesizes existing research to help practitioners select and compare position encoding methods for NLP applications.

The paper provides an overview and theoretical comparison of methods for incorporating position information into Transformer models, addressing the need to handle sequential language data despite the model's inherent invariance to input order.

Transformers are arguably the main workhorse in recent Natural Language Processing research. By definition, a Transformer is invariant with respect to reordering of the input. However, language is inherently sequential, and word order is essential to the semantics and syntax of an utterance. In this article, we provide an overview and theoretical comparison of existing methods to incorporate position information into Transformer models. The objectives of this survey are to (1) showcase that position information in Transformers is a vibrant and extensive research area; (2) enable the reader to compare existing methods by providing a unified notation and systematization of different approaches along important model dimensions; (3) indicate what characteristics of an application should be taken into account when selecting a position encoding; (4) provide stimuli for future research.
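
The order-invariance claim in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a single attention head with random weights, plain Python lists instead of a tensor library, and illustrative helper names (`self_attention`, `sinusoidal`, `add_positions`). Without position information, permuting the input tokens only permutes the per-token outputs, so the layer cannot distinguish one ordering from another. Adding a sinusoidal absolute position vector in the style of Vaswani et al. (2017), one of the method families such a survey covers, breaks that symmetry.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention with no position information."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d)])
    return out


def sinusoidal(pos, d):
    """Sinusoidal absolute position vector (assumed variant: sin on even, cos on odd dims)."""
    return [
        math.sin(pos / 10000 ** (i / d)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d))
        for i in range(d)
    ]


def add_positions(X):
    """Add the position vector for index `pos` to the token embedding at that index."""
    d = len(X[0])
    return [[x + p for x, p in zip(row, sinusoidal(pos, d))] for pos, row in enumerate(X)]


def close(A, B, tol=1e-9):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, d = 5, 4                      # 5 tokens, embedding size 4 (arbitrary toy sizes)
X = rand_matrix(n, d)            # toy token embeddings
Wq, Wk, Wv = (rand_matrix(d, d) for _ in range(3))
perm = [2, 0, 4, 1, 3]           # an arbitrary reordering of the tokens
X_perm = [X[i] for i in perm]

# Without positions: outputs of the permuted input are the permuted outputs.
Y = self_attention(X, Wq, Wk, Wv)
Y_perm = self_attention(X_perm, Wq, Wk, Wv)
equivariant = close(Y_perm, [Y[i] for i in perm])

# With positions added before attention: the same check no longer holds.
Z = self_attention(add_positions(X), Wq, Wk, Wv)
Z_perm = self_attention(add_positions(X_perm), Wq, Wk, Wv)
order_sensitive = not close(Z_perm, [Z[i] for i in perm])
```

In this sketch both `equivariant` and `order_sensitive` are expected to be true: the first shows that word order is invisible to plain self-attention, the second that injecting position vectors makes the output depend on where each token sits.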

