CL · Apr 23, 2025

Out-of-the-Box Conditional Text Embeddings from Large Language Models

arXiv:2504.16411v1 · 4 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses the high labor and resource costs of building conditional text embeddings, a concern for NLP researchers and practitioners, and offers an incremental improvement over existing methods.

The paper tackles the problem of generating conditional text embeddings without requiring extensive training data, proposing PonTE, an unsupervised method that achieves performance comparable to supervised approaches in tasks like semantic text similarity and clustering.

Conditional text embedding is a proposed representation that captures the shift in perspective on texts when conditioned on a specific aspect. Previous methods have relied on extensive training data for fine-tuning models, leading to challenges in terms of labor and resource costs. We propose PonTE, a novel unsupervised conditional text embedding method that leverages a causal large language model and a conditional prompt. Through experiments on conditional semantic text similarity and text clustering, we demonstrate that PonTE can generate useful conditional text embeddings and achieve performance comparable to supervised methods without fine-tuning. We also show the interpretability of text embeddings with PonTE by analyzing word generation following prompts and embedding visualization.
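The idea of pairing a causal language model with a conditional prompt can be illustrated with a minimal sketch. The abstract does not give the prompt wording or the pooling rule, so the template below, the use of the last non-padding token's hidden state as the embedding, and the `encode` callable (a hypothetical stand-in for a real model forward pass) are all assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

# ASSUMPTION: illustrative template; the exact wording is not given in the abstract.
PROMPT_TEMPLATE = 'Express this text "{text}" in one word in terms of {condition}: "'

Vector = List[float]
# Hypothetical encoder type: prompt -> (per-token hidden states, attention mask).
Encoder = Callable[[str], Tuple[Sequence[Sequence[float]], Sequence[int]]]


def build_conditional_prompt(text: str, condition: str) -> str:
    """Wrap the text and the conditioning aspect into one prompt string."""
    return PROMPT_TEMPLATE.format(text=text, condition=condition)


def last_token_state(hidden_states: Sequence[Sequence[float]],
                     attention_mask: Sequence[int]) -> Vector:
    """Return the hidden state of the last non-padding token.

    ASSUMPTION: last-token pooling; works for left- or right-padded input
    because it searches the mask for the final position marked 1.
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return list(hidden_states[last])


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def conditional_similarity(text_a: str, text_b: str, condition: str,
                           encode: Encoder) -> float:
    """Score two texts under the same condition using a pluggable encoder."""
    vectors = []
    for text in (text_a, text_b):
        states, mask = encode(build_conditional_prompt(text, condition))
        vectors.append(last_token_state(states, mask))
    return cosine_similarity(vectors[0], vectors[1])
```

In practice `encode` would run the prompt through a causal language model and return its final-layer hidden states; changing only the `condition` argument then yields a different embedding for the same text, which is the behavior the abstract describes.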
