CL | Feb 17, 2025

Following the Autoregressive Nature of LLM Embeddings via Compression and Alignment

Jingcheng Deng, Zhongtao Jiang, Liang Pang, Liwei Chen, Kun Xu, Zihao Wei, Huawei Shen, Xueqi Cheng

arXiv:2502.11401v3 | 17.618 citations | h-index: 19 | Has Code | EMNLP

Originality: Incremental advance

AI Analysis

The paper addresses a key inefficiency in using LLMs as text encoders, which matters to NLP researchers and practitioners, though it is an incremental improvement over existing contrastive learning approaches.

The paper tackles the conflict between LLMs' autoregressive nature and contrastive learning for text embeddings by proposing AutoRegEmbed, which integrates information compression and conditional distribution alignment. The method yields significant performance gains over traditional contrastive learning and achieves results comparable to state-of-the-art models when using the same amount of data.

A new trend uses LLMs as dense text encoders via contrastive learning. However, since LLM embeddings predict the probability distribution of the next token, they are inherently generative and distributive, conflicting with contrastive learning, which requires embeddings to capture full-text semantics and align via cosine similarity. This discrepancy hinders the full utilization of LLMs' pre-training capabilities, resulting in inefficient learning. In response to this issue, we propose AutoRegEmbed, a new contrastive learning method built on embedding conditional probability distributions, which integrates two core tasks: information compression and conditional distribution alignment. The information compression task encodes text into the embedding space, ensuring that the embedding vectors capture global semantics. The conditional distribution alignment task focuses on aligning text embeddings with the embeddings of positive samples by leveraging the conditional distribution of embeddings, while simultaneously reducing the likelihood of generating negative samples from text embeddings, thereby achieving embedding alignment and uniformity. Experimental results demonstrate that our method significantly outperforms traditional contrastive learning approaches and achieves performance comparable to state-of-the-art models when using the same amount of data.
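
To make the contrast in the abstract concrete, the sketch below places a conventional cosine-similarity contrastive loss (InfoNCE) next to a toy loss that contrasts log-likelihoods instead. The second function is only an illustration of the general idea of scoring positives and negatives by how likely they are given an embedding; it is an assumption for exposition and not the AutoRegEmbed objective, whose exact form is not given here. All function names, the temperature value, and the input numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax_first(scores):
    """Log-probability that a softmax over `scores` assigns to scores[0]."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[0] - log_z


def info_nce(query, positive, negatives, temperature=0.05):
    """Conventional contrastive loss over cosine similarities (InfoNCE)."""
    scores = [cosine(query, positive) / temperature]
    scores += [cosine(query, n) / temperature for n in negatives]
    return -log_softmax_first(scores)


def likelihood_contrast(logp_positive, logp_negatives):
    """Hypothetical illustration, NOT the paper's loss.

    Inputs are assumed to be log p(text | embedding) values produced by a
    language model; the positive is contrasted against the negatives.
    """
    return -log_softmax_first([logp_positive] + list(logp_negatives))


# A query aligned with its positive yields a much lower InfoNCE loss
# than a query aligned with a negative.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In the first function the model is judged by geometric closeness of vectors, while in the second it is judged by generation likelihood, which is closer in spirit to how an autoregressive LLM is pre-trained.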
