Position Interpolation Improves ALiBi Extrapolation
This work addresses the challenge of handling longer sequences in pre-trained natural language processing models, but it is incremental, since it adapts an existing technique from RoPE to ALiBi.
The paper tackles the problem of extending the extrapolation range of models that use Attention with Linear Biases (ALiBi) by applying linear position interpolation, and reports significant improvements on language modeling, summarization, and retrieval tasks.
Linear position interpolation helps pre-trained models using rotary position embeddings (RoPE) extrapolate to longer sequence lengths. We propose using linear position interpolation to extend the extrapolation range of models using Attention with Linear Biases (ALiBi). We find that position interpolation significantly improves extrapolation on upstream language modelling and on downstream summarization and retrieval tasks.
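To make the idea concrete, here is a minimal sketch of causal ALiBi biases with linear position interpolation. It assumes the standard ALiBi slope recipe for a power-of-two head count (slopes 2^(-8/n), 2^(-16/n), ...) and assumes interpolation means uniformly scaling every relative distance by train_len / seq_len whenever the evaluation length exceeds the training length. This is equivalent to dividing each head's slope by the same factor. The function names and the train_len parameter are hypothetical and illustrative, not the authors' implementation.

```python
def alibi_slopes(num_heads):
    """Head-specific ALiBi slopes: a geometric sequence starting at 2^(-8/n).

    Follows the standard recipe for num_heads that is a power of two.
    """
    start = 2.0 ** (-8.0 / num_heads)
    return [start ** (h + 1) for h in range(num_heads)]


def alibi_bias(seq_len, num_heads, train_len=None):
    """Causal ALiBi bias: bias[h][i][j] = -m_h * (i - j) * scale for j <= i.

    Hypothetical interpolation step (an assumption for illustration): if
    train_len is given and seq_len exceeds it, every relative distance is
    multiplied by train_len / seq_len. This compresses distances into roughly
    the range seen in training and is equivalent to dividing each slope m_h
    by seq_len / train_len.
    """
    scale = 1.0
    if train_len is not None and seq_len > train_len:
        scale = train_len / seq_len

    bias = []
    for m in alibi_slopes(num_heads):
        head = []
        for i in range(seq_len):  # query position
            row = []
            for j in range(seq_len):  # key position
                if j <= i:
                    # Linear penalty proportional to the (scaled) distance.
                    row.append(-m * (i - j) * scale)
                else:
                    # Future keys are masked out in causal attention.
                    row.append(float("-inf"))
            head.append(row)
        bias.append(head)
    return bias


if __name__ == "__main__":
    # Trained on 4 tokens, evaluated on 8: distances are halved (scale 0.5).
    print(alibi_bias(8, 8, train_len=4)[0][7][:3])  # [-1.75, -1.5, -1.25]
```

Without interpolation, the farthest key in the 8-token example would receive a bias of -3.5 in head 0, a magnitude never seen when training on 4 tokens. With scaling, it receives -1.75, which is close to the training range (-1.5 at most). Scaling distances rather than adding new positions keeps the bias values in roughly familiar territory.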