Token Homogenization under Positional Bias
This work addresses a potential issue in transformer-based models that is relevant to NLP researchers, but it appears incremental, since it confirms and further explores known phenomena.
The paper investigates token homogenization, in which token representations converge toward uniformity across transformer layers, and its relationship to positional bias in large language models. Its empirical analysis shows that tokens lose distinctiveness during processing, especially under positional bias.
This paper investigates token homogenization (the convergence of token representations toward uniformity across transformer layers) and its relationship to positional bias in large language models. We empirically examine whether homogenization occurs and how positional bias amplifies it. Through layer-wise similarity analysis and controlled experiments, we demonstrate that tokens systematically lose distinctiveness during processing, particularly under a positional bias toward extremal positions. Our findings confirm both the existence of homogenization and its dependence on positional attention mechanisms.
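To make "layer-wise similarity analysis" concrete, the sketch below tracks the mean pairwise cosine similarity of token vectors after each layer of a toy, attention-only model. Everything here is an illustrative assumption, not the paper's setup: the metric choice, the helper names (`mean_pairwise_similarity`, `attention_layer`, `similarity_by_layer`), the `edge_bias` parameter that adds a fixed score to the first and last key positions as a stand-in for positional bias, and the simplifications of bidirectional attention with no residual connection, MLP, or learned projections.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(tokens):
    """Average cosine similarity over all distinct token pairs at one layer.

    Values near 1.0 mean the token representations have become homogeneous.
    """
    n = len(tokens)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(tokens[i], tokens[j]) for i, j in pairs) / len(pairs)


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(tokens, edge_bias):
    """One toy attention-only layer (hypothetical, for illustration).

    Scores are scaled dot products plus an additive bias on the first and
    last key positions. No residual, MLP, or learned projections.
    """
    n, d = len(tokens), len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            score = scale * sum(a * b for a, b in zip(tokens[i], tokens[j]))
            if j in (0, n - 1):  # extremal positions receive the extra score
                score += edge_bias
            scores.append(score)
        weights = softmax(scores)
        # Each output token is a convex combination of all input tokens.
        out.append([sum(w * tokens[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out


def similarity_by_layer(tokens, num_layers, edge_bias):
    """Homogenization profile: mean similarity at layers 0, 1, ..., num_layers."""
    profile = [mean_pairwise_similarity(tokens)]
    for _ in range(num_layers):
        tokens = attention_layer(tokens, edge_bias)
        profile.append(mean_pairwise_similarity(tokens))
    return profile


if __name__ == "__main__":
    # Start from maximally distinct tokens: one-hot (orthonormal) vectors.
    n = 4
    start = [[1.0 if k == i else 0.0 for k in range(n)] for i in range(n)]
    for bias in (0.0, 5.0):
        profile = similarity_by_layer(start, num_layers=6, edge_bias=bias)
        print(f"edge_bias={bias}:", [round(s, 3) for s in profile])
```

Starting from one-hot tokens (similarity 0), a single unbiased layer already raises the mean similarity to about 0.926, and with `edge_bias=5.0` it rises higher still (about 0.962), because every token draws most of its update from the same two extremal positions. Pure attention without residual paths is known to collapse quickly, so this toy exaggerates the effect; in a real model one would compute the same metric on the hidden states returned for each layer.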