Repetition Improves Language Model Embeddings
Echo embeddings provide a simpler and more efficient way to create text embeddings from autoregressive language models and could help unify NLP architectures, although the contribution is incremental because it builds on existing autoregressive models.
The paper tackles the problem of adapting autoregressive language models into strong text embedding models without requiring bidirectional architectures. It improves over classical LM embeddings by more than 5% in zero-shot settings and matches or outperforms bidirectionally-converted models under supervised fine-tuning.
Bidirectional models are considered essential for strong text embeddings. Recent approaches to adapt autoregressive language models (LMs) into strong text embedding models have largely required modifying the LM architecture to be bidirectional. We challenge this premise by introducing "echo embeddings", which convert autoregressive LMs into high-quality text embedding models without changing the architecture or requiring fine-tuning. By repeating the input and extracting embeddings from the repeated tokens (which have access to all original tokens), echo embeddings improve over classical LM embeddings by over 5% in zero-shot settings. Our zero-shot embeddings nearly match those obtained by bidirectionally-converted LMs that undergo additional masked-language modeling training. Echo embeddings are also compatible with supervised fine-tuning, matching or outperforming bidirectionally-converted LMs in an apples-to-apples comparison, even with an identical compute budget during training and inference. Overall, repetition is a simple and effective strategy to circumvent the need for bidirectional attention in embedding models, paving the way towards a unified architecture for all NLP tasks.
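To make the mechanism concrete, here is a minimal sketch, assuming a single toy causal self-attention layer over fixed random token vectors in place of a real pretrained LM. The template tokens `rewrite:` and `again:` and every helper name below are hypothetical illustrations, not the paper's exact prompt or code. The sketch shows the key property: under causal attention the first token of a sentence cannot see later tokens, whereas the first token of the repeated copy has already attended over the entire first copy.

```python
import math
import random

DIM = 8


def make_table(vocab, dim=DIM, seed=0):
    """Hypothetical stand-in for an LM's token embedding table: fixed random vectors."""
    rng = random.Random(seed)
    return {tok: [rng.gauss(0.0, 1.0) for _ in range(dim)] for tok in vocab}


def causal_attention(xs):
    """Toy single-head causal self-attention (no projections): position i attends to 0..i."""
    out = []
    for i, q in enumerate(xs):
        scores = [sum(a * b for a, b in zip(q, xs[j])) / math.sqrt(len(q)) for j in range(i + 1)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * xs[j][d] for j, w in enumerate(weights)) / total for d in range(len(q))])
    return out


def mean_pool(states):
    """Average the hidden states dimension by dimension."""
    return [sum(col) / len(states) for col in zip(*states)]


def classical_states(tokens, table):
    """Hidden states when the sentence is encoded once."""
    return causal_attention([table[t] for t in tokens])


def echo_states(tokens, table):
    """Hidden states of the second copy in a hypothetical 'rewrite: x again: x' template."""
    seq = ["rewrite:"] + list(tokens) + ["again:"] + list(tokens)
    hidden = causal_attention([table[t] for t in seq])
    return hidden[len(seq) - len(tokens):]  # keep only positions of the repeated copy


def classical_embedding(tokens, table):
    return mean_pool(classical_states(tokens, table))


def echo_embedding(tokens, table):
    return mean_pool(echo_states(tokens, table))


vocab = ["rewrite:", "again:", "cats", "chase", "mice", "dogs"]
table = make_table(vocab)
a = ["cats", "chase", "mice"]
b = ["cats", "chase", "dogs"]  # differs from a only in the last token

# Classical: the first token's state cannot depend on the last token.
print(classical_states(a, table)[0] == classical_states(b, table)[0])  # True
# Echo: the first token of the second copy has already attended to the last token.
print(echo_states(a, table)[0] == echo_states(b, table)[0])
print(len(echo_embedding(a, table)))  # 8
```

Because only the second copy's positions are pooled, every pooled state has seen the whole input; the trade-off is that the sequence the model processes is roughly twice as long.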