Semantic Token Clustering for Efficient Uncertainty Quantification in Large Language Models
This work addresses the need for reliable uncertainty quantification in LLMs, a concern for practitioners in AI and NLP. The contribution is incremental: it builds on existing methods and mainly improves their efficiency.
The paper tackles the problem of efficiently quantifying uncertainty in large language models so that unreliable outputs can be identified. It proposes Semantic Token Clustering (STC), which groups tokens into semantic clusters and quantifies uncertainty from the probability mass aggregated over the corresponding cluster. STC achieves performance comparable to state-of-the-art baselines while substantially reducing computational overhead.
Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks. However, the truthfulness of their outputs is not guaranteed, and their tendency toward overconfidence further limits reliability. Uncertainty quantification offers a promising way to identify potentially unreliable outputs, but most existing methods rely on repeated sampling or auxiliary models, introducing substantial computational overhead. To address these limitations, we propose Semantic Token Clustering (STC), an efficient uncertainty quantification method that leverages the semantic information inherently encoded in LLMs. Specifically, we group tokens into semantically consistent clusters using embedding clustering and prefix matching, and quantify uncertainty based on the probability mass aggregated over the corresponding semantic cluster. Our approach requires only a single generation and does not depend on auxiliary models. Experimental results show that STC achieves performance comparable to state-of-the-art baselines while substantially reducing computational overhead.
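To make the aggregation step concrete, the following is a minimal sketch of cluster-level probability mass for a single generated token. It is an illustration under stated assumptions, not the paper's implementation: the toy vocabulary, the precomputed `cluster_ids` (standing in for the result of clustering token embeddings), the specific prefix rule, the helper names `semantic_cluster` and `token_uncertainty`, and the choice of `1 - mass` as the uncertainty score are all hypothetical.

```python
def semantic_cluster(chosen, vocab, cluster_ids):
    """Return indices of tokens treated as semantically equivalent to `chosen`.

    Assumption: `cluster_ids[i]` is a precomputed embedding-cluster label for
    token i (for example from k-means over the model's token embeddings).
    """
    # Tokens that share the chosen token's embedding cluster.
    same_cluster = {
        i for i, cid in enumerate(cluster_ids) if cid == cluster_ids[chosen]
    }
    # Hypothetical prefix rule: a token also counts if, after stripping
    # whitespace and lowercasing, it is a prefix of the chosen token's text.
    word = vocab[chosen].strip().lower()
    prefix_matches = {
        i
        for i, tok in enumerate(vocab)
        if tok.strip() and word.startswith(tok.strip().lower())
    }
    return same_cluster | prefix_matches


def token_uncertainty(chosen, probs, vocab, cluster_ids):
    """Uncertainty as one minus the probability mass of the semantic cluster."""
    members = semantic_cluster(chosen, vocab, cluster_ids)
    mass = sum(probs[i] for i in members)
    return 1.0 - mass


# Toy next-token distribution from a single forward pass (made-up numbers).
vocab = ["Paris", " paris", "Par", "London", "Rome"]
cluster_ids = [0, 0, 1, 2, 3]  # assumed output of embedding clustering
probs = [0.5, 0.2, 0.1, 0.15, 0.05]

chosen = 0  # the model emitted "Paris"
cluster_score = token_uncertainty(chosen, probs, vocab, cluster_ids)
token_only_score = 1.0 - probs[chosen]
print(f"cluster-level uncertainty: {cluster_score:.2f}")
print(f"token-only uncertainty:    {token_only_score:.2f}")
```

In this toy case the cluster collects "Paris", " paris", and "Par", so the aggregated mass is 0.8 rather than the 0.5 assigned to the emitted token alone. Probability that is merely spread across surface variants of one answer is no longer counted as uncertainty, and the score comes from a single generation without an auxiliary model.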