CL AI LG ME ML · Oct 24, 2025

Embedding Trust: Semantic Isotropy Predicts Nonfactuality in Long-Form Text Generation

Dhrupad Bhardwaj, Julia Kempe, Tim G. J. Rudner

arXiv:2510.21891v1 · 1 citation · h-index: 17

Originality: Highly original

AI Analysis

This addresses the need for low-cost trust assessment in high-stakes LLM applications, offering a practical solution for real-world workflows.

The paper tackles the problem of assessing the trustworthiness of long-form text generated by LLMs by introducing semantic isotropy, a measure of how uniformly the embeddings of several sampled responses spread over the unit sphere. It finds that higher isotropy reliably predicts lower factual consistency and that the method outperforms existing approaches across multiple domains.

To deploy large language models (LLMs) in high-stakes application domains that require substantively accurate responses to open-ended prompts, we need reliable, computationally inexpensive methods that assess the trustworthiness of long-form responses generated by LLMs. However, existing approaches often rely on claim-by-claim fact-checking, which is computationally expensive and brittle in long-form responses to open-ended prompts. In this work, we introduce semantic isotropy -- the degree of uniformity across normalized text embeddings on the unit sphere -- and use it to assess the trustworthiness of long-form responses generated by LLMs. To do so, we generate several long-form responses, embed them, and estimate the level of semantic isotropy of these responses as the angular dispersion of the embeddings on the unit sphere. We find that higher semantic isotropy -- that is, greater embedding dispersion -- reliably signals lower factual consistency across samples. Our approach requires no labeled data, no fine-tuning, and no hyperparameter selection, and can be used with open- or closed-weight embedding models. Across multiple domains, our method consistently outperforms existing approaches in predicting nonfactuality in long-form responses using only a handful of samples -- offering a practical, low-cost approach for integrating trust assessment into real-world LLM workflows.
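The abstract describes the scoring pipeline only at a high level: sample several responses, embed them, normalize the embeddings onto the unit sphere, and measure their angular dispersion. Below is a minimal sketch of the last two steps, assuming the embeddings have already been computed by some embedding model (sampling and embedding are not shown). It uses two standard angular-dispersion statistics, the mean pairwise angle and one minus the mean resultant length; these are stand-ins chosen for illustration and may not match the estimator used in the paper. The function names are hypothetical.

```python
import math
from collections.abc import Sequence


def _unit_vectors(embeddings: Sequence[Sequence[float]]) -> list[list[float]]:
    """Project each embedding onto the unit sphere (L2 normalization)."""
    if len(embeddings) < 2:
        raise ValueError("need at least two sampled responses")
    dim = len(embeddings[0])
    units = []
    for e in embeddings:
        if len(e) != dim:
            raise ValueError("all embeddings must have the same dimension")
        norm = math.sqrt(sum(x * x for x in e))
        if norm == 0.0:
            raise ValueError("cannot normalize a zero vector")
        units.append([x / norm for x in e])
    return units


def mean_pairwise_angle(embeddings: Sequence[Sequence[float]]) -> float:
    """Hypothetical dispersion proxy: average angle (radians, in [0, pi])
    between every pair of normalized embeddings. Larger = more spread out."""
    units = _unit_vectors(embeddings)
    total, pairs = 0.0, 0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            cos = sum(a * b for a, b in zip(units[i], units[j]))
            cos = max(-1.0, min(1.0, cos))  # clamp rounding error before acos
            total += math.acos(cos)
            pairs += 1
    return total / pairs


def resultant_dispersion(embeddings: Sequence[Sequence[float]]) -> float:
    """Hypothetical dispersion proxy: 1 - length of the mean unit vector.
    0.0 when all embeddings point the same way; approaches 1.0 as they
    spread evenly (more isotropically) over the sphere."""
    units = _unit_vectors(embeddings)
    n, dim = len(units), len(units[0])
    mean = [sum(u[k] for u in units) / n for k in range(dim)]
    return 1.0 - math.sqrt(sum(m * m for m in mean))


if __name__ == "__main__":
    # Toy 2-D "embeddings" of three sampled responses to the same prompt.
    consistent = [[1.0, 0.0], [0.99, 0.1], [0.98, -0.05]]
    scattered = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(mean_pairwise_angle(consistent))  # small: responses agree
    print(mean_pairwise_angle(scattered))   # 2*pi/3: responses disagree
    print(resultant_dispersion(consistent), resultant_dispersion(scattered))
```

For a given prompt, a higher score means the sampled responses point in more varied semantic directions, which the paper associates with lower factual consistency. Both statistics need no labels or training; turning a score into a hard accept/reject decision would still require a cutoff, which the abstract does not address.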

