CL | Jul 18, 2023

Linearized Relative Positional Encoding

Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

arXiv:2307.09270v16.318 citations | h-index: 39 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in linear transformers by providing a principled framework for developing efficient positional encoding methods. The advance is incremental in that it builds on existing approaches, but it offers a general paradigm with broader applicability.

The authors address the problem of designing relative positional encoding methods for linear transformers, which require the query and key representations to decompose into separate kernel functions. They propose a family of linearized relative positional encoding (LRPE) algorithms based on unitary transformation. In the reported experiments, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification compared with existing methods.

Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer, because the latter requires a decomposition of the query and key representations into separate kernel functions. Moreover, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we put together a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encoding for various applications. Experiments show that, compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. It also highlights a general paradigm for designing a broader range of relative positional encoding methods that are applicable to linear transformers. The code is available at https://github.com/OpenNLPLab/Lrpe.
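To make the decomposition requirement concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one simple unitary transformation (a per-dimension complex rotation by position), non-causal attention, and no normalization or kernel feature map. The function names (`rotate`, `quadratic_attention`, `linear_attention`, `pair_score`) and the frequency schedule `theta` are hypothetical and chosen for illustration only. Because the rotation is applied to the query and key separately, the key-value product can be formed first, so the cost grows linearly with sequence length while the score still depends only on the relative offset.

```python
import numpy as np

def rotate(x, theta):
    """Multiply row s of x by exp(i * s * theta): a position-dependent unitary map.
    x: (n, d) real array, theta: (d,) frequencies. Returns an (n, d) complex array."""
    pos = np.arange(x.shape[0])[:, None]        # positions as a column, shape (n, 1)
    return x * np.exp(1j * pos * theta)         # broadcasts to (n, d)

def quadratic_attention(q, k, v, theta):
    """Reference form: builds the full (n, n) score matrix, O(n^2) in sequence length."""
    scores = np.real(rotate(q, theta) @ rotate(k, theta).conj().T)
    return scores @ v

def linear_attention(q, k, v, theta):
    """Same output, but the (d, d_v) key-value summary is formed first, O(n) in length."""
    kv = rotate(k, theta).conj().T @ v          # (d, d_v), independent of query position
    return np.real(rotate(q, theta) @ kv)       # (n, d_v)

def pair_score(q_vec, k_vec, s, t, theta):
    """Score between a query at position s and a key at position t."""
    q_rot = q_vec * np.exp(1j * s * theta)
    k_rot = k_vec * np.exp(1j * t * theta)
    return np.real(np.sum(q_rot * np.conj(k_rot)))

# Illustrative sizes and an assumed frequency schedule (not taken from the paper).
rng = np.random.default_rng(0)
n, d, d_v = 6, 4, 3
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d_v))
theta = 1.0 / (10000.0 ** (np.arange(d) / d))

# The two forms agree, and the score depends only on the offset s - t.
print(np.allclose(linear_attention(q, k, v, theta), quadratic_attention(q, k, v, theta)))
print(np.isclose(pair_score(q[0], k[0], 3, 1, theta), pair_score(q[0], k[0], 10, 8, theta)))
```

In this sketch the relative dependence comes from the identity exp(i s θ) · conj(exp(i t θ)) = exp(i (s − t) θ). The LRPE family described in the abstract covers other unitary transformations as well; the rotation here is only one assumed instance.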
