CL · AI · May 31, 2025

Efficient Latent Semantic Clustering for Scaling Test-Time Computation of LLMs

arXiv:2506.00344v1 · 5 citations · h-index: 15 · EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses the high computational cost of test-time scaling for LLM users. The advance is incremental because it builds on existing clustering methods.

The paper tackles the computational overhead of semantic clustering in test-time scaling for LLMs by proposing Latent Semantic Clustering (LSC), which uses internal hidden states instead of external models, resulting in significant efficiency improvements while maintaining or exceeding performance.

Scaling test-time computation--generating and analyzing multiple or sequential outputs for a single input--has become a promising strategy for improving the reliability and quality of large language models (LLMs), as evidenced by advances in uncertainty quantification and multi-step reasoning. A key shared component is semantic clustering, which groups outputs that differ in form but convey the same meaning. Semantic clustering enables estimation of the distribution over the semantics of outputs and helps avoid redundant exploration of reasoning paths. However, existing approaches typically rely on external models, which introduce substantial computational overhead and often fail to capture context-aware semantics. We propose Latent Semantic Clustering (LSC), a lightweight and context-sensitive method that leverages the generator LLM's internal hidden states for clustering, eliminating the need for external models. Our extensive experiments across various LLMs and datasets show that LSC significantly improves the computational efficiency of test-time scaling while maintaining or exceeding the performance of existing methods.
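
To make the clustering idea in the abstract concrete, here is a minimal sketch. It is not the paper's implementation. It assumes each sampled output has already been reduced to a single vector taken from the generator LLM's hidden states (which layer or token LSC uses is not stated in this summary), and it swaps in a simple greedy cosine-similarity clustering with a hypothetical threshold of 0.9. The function names, the toy vectors, and the entropy step are illustrative assumptions; the entropy is one common way to summarize a distribution over semantic clusters.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_by_hidden_state(vectors, threshold=0.9):
    """Greedy clustering (an assumed stand-in, not the paper's algorithm).

    Each vector joins the first cluster whose representative (its first
    member) has cosine similarity >= threshold; otherwise it opens a new
    cluster. Returns one integer cluster label per input vector.
    """
    representatives = []
    labels = []
    for vec in vectors:
        for label, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                labels.append(label)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            representatives.append(vec)
            labels.append(len(representatives) - 1)
    return labels


def cluster_entropy(labels):
    """Shannon entropy (in nats) of the empirical distribution over clusters."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical hidden-state vectors for four sampled answers to one question.
hidden_states = [
    [1.0, 0.1, 0.0],  # e.g. "Paris"
    [0.9, 0.2, 0.0],  # e.g. "It is Paris."
    [0.0, 1.0, 0.1],  # e.g. "Lyon"
    [1.0, 0.0, 0.1],  # e.g. "The capital is Paris"
]
labels = cluster_by_hidden_state(hidden_states)
entropy = cluster_entropy(labels)
```

In this toy input, the three paraphrases fall into one cluster and the differing answer into another, so the entropy is low but non-zero. Because the vectors would come from the generator's own forward pass, no second model has to be run to compare outputs, which is the efficiency argument the abstract makes.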

