ExPE: Exact Positional Encodings for Generative Transformer Models with Extrapolating Capabilities
The work addresses a key limitation of generative transformer models for natural language processing, though the contribution appears incremental because it builds on existing positional embedding methods.
The paper tackles the difficulty transformer models have in extrapolating to sequences longer than those seen during training. It introduces Exact Positional Embeddings (ExPE), which significantly reduce perplexity in causal language modeling on these longer sequences.
This paper introduces a novel approach to position embeddings in transformer models, named Exact Positional Embeddings (ExPE): an absolute positional embedding method that can extrapolate to sequences longer than those it was trained on. Traditional transformer models incorporate positional information into token embeddings through absolute or relative position embeddings, and such models often struggle to extrapolate to sequences longer than those seen during training. Our method encodes exact positional information by overriding specific dimensions of the embedding vectors, which yields a more precise representation of token positions. The approach preserves the integrity of the original embeddings while improving the model's ability to generalize to longer sequences. In causal language modeling, ExPE embeddings significantly reduce perplexity compared to rotary and sinusoidal embeddings when tested on sequences longer than those used in training.
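To make the idea of "overriding specific dimensions" concrete, here is a minimal sketch in Python. It assumes that a fixed set of reserved dimensions (the hypothetical parameter `reserved_dims`) is overwritten with the token's position index multiplied by a hypothetical `scale`. The abstract does not state the actual override rule, so the function `apply_exact_positions` and its parameters are illustrative assumptions, not the paper's implementation.

```python
from typing import List

Vector = List[float]


def apply_exact_positions(
    token_embeddings: List[Vector],
    reserved_dims: List[int],
    scale: float = 1.0,
) -> List[Vector]:
    """Hypothetical ExPE-style sketch: overwrite reserved dimensions of each
    token embedding with a value computed directly from the token's position.

    ASSUMPTION: every reserved dimension is set to position * scale.
    The paper's real override formula may differ.
    """
    if not reserved_dims:
        raise ValueError("at least one reserved dimension is required")
    out: List[Vector] = []
    for position, emb in enumerate(token_embeddings):
        # Reject indices outside the embedding (negative indices included).
        if any(d < 0 or d >= len(emb) for d in reserved_dims):
            raise IndexError("reserved dimension out of range")
        new_emb = list(emb)  # copy so the caller's embedding is not mutated
        for d in reserved_dims:
            new_emb[d] = position * scale  # exact, length-independent value
        out.append(new_emb)
    return out


if __name__ == "__main__":
    # Three identical 4-dimensional token embeddings; dimension 3 is reserved.
    tokens = [[0.5, 0.1, 0.2, 0.3] for _ in range(3)]
    encoded = apply_exact_positions(tokens, reserved_dims=[3], scale=0.5)
    print(encoded[2])  # [0.5, 0.1, 0.2, 1.0]
```

Because the reserved value is computed from the position index rather than looked up in a table learned for a fixed maximum length, nothing in this sketch limits the sequence length, and the non-reserved dimensions are left untouched. This mirrors, under the stated assumption, the two properties the abstract emphasizes: extrapolation and preserving the original embeddings.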