LG · Feb 15

Position Encoding with Random Float Sampling Enhances Length Generalization of Transformers

arXiv:2602.14050v1
Originality: Incremental advance
AI Analysis

The paper addresses a key limitation of transformers in NLP applications, namely poor length generalization, with a simple modification to widely used position encodings. The advance is incremental in nature.

The paper tackles the problem of length generalization in language models by introducing Random Float Sampling (RFS), a position encoding strategy that uses continuous random values to avoid out-of-distribution issues, resulting in superior performance on length generalization tasks and zero-shot commonsense reasoning benchmarks.

Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce a simple yet powerful position encoding (PE) strategy, Random Float Sampling (RFS), that generalizes well to lengths unseen during pretraining or fine-tuning. In particular, instead of selecting position indices from a predefined discrete set, RFS uses randomly sampled continuous values, thereby avoiding out-of-distribution (OOD) issues on unseen lengths by exposing the model to diverse indices during training. Since assigning indices to tokens is a common and fundamental procedure in widely used PEs, the advantage of RFS can easily be incorporated into, for instance, the absolute sinusoidal encoding, RoPE, and ALiBi. Experiments corroborate its effectiveness by showing that RFS results in superior performance in length generalization tasks as well as zero-shot commonsense reasoning benchmarks.
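The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the sampling distribution, the sampling range, or how token order is preserved. The sketch assumes positions are drawn uniformly from [0, max_pos] and sorted so that order is kept, then fed to the standard absolute sinusoidal encoding. The names `sample_float_positions`, `sinusoidal_encoding`, and `max_pos` are hypothetical.

```python
import math
import random


def sample_float_positions(seq_len, max_pos, rng):
    """Draw seq_len continuous positions in [0, max_pos] and sort them.

    Assumption (not from the paper): uniform sampling, with sorting used
    to keep the positions monotonically non-decreasing across tokens.
    """
    return sorted(rng.uniform(0.0, max_pos) for _ in range(seq_len))


def sinusoidal_encoding(positions, d_model, base=10000.0):
    """Absolute sinusoidal encoding evaluated at real-valued positions.

    The usual formula takes integer indices; nothing in it requires
    integers, so float positions can be passed in directly.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    # One frequency per (sin, cos) pair of dimensions.
    inv_freq = [base ** (-(2 * i) / d_model) for i in range(d_model // 2)]
    table = []
    for p in positions:
        row = []
        for w in inv_freq:
            row.append(math.sin(p * w))  # even dimension
            row.append(math.cos(p * w))  # odd dimension
        table.append(row)
    return table


# Training-time usage: each sequence gets freshly sampled positions.
rng = random.Random(0)
pos = sample_float_positions(seq_len=8, max_pos=64.0, rng=rng)
pe = sinusoidal_encoding(pos, d_model=16)
```

Under these assumptions, a sequence of 8 tokens sees positions spread over a range much larger than 8, which is one way a model could be exposed to position values it would otherwise only meet on longer inputs. The same sampled positions could in principle replace the integer indices in RoPE or ALiBi, as the abstract suggests, but that is not shown here.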
