CL · AI · Apr 3

Evolutionary Search for Automated Design of Uncertainty Quantification Methods

arXiv:2604.03473 · 94.7 · h-index: 7
AI Analysis

For practitioners who need reliable uncertainty estimates from LLMs, this work automates the design of UQ methods, reducing reliance on hand-crafted heuristics.

This work applies LLM-powered evolutionary search to automatically design unsupervised UQ methods for large language models. On atomic claim verification across 9 datasets, the evolved methods achieve up to 6.7% relative ROC-AUC improvement over manually designed baselines and generalize robustly out of distribution.
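ROC-AUC measures how well an uncertainty score ranks false claims above true ones, and a "relative" improvement expresses the gain as a fraction of the baseline's AUC. The minimal sketch below assumes that higher scores mean "more likely hallucinated" and that relative improvement is (AUC_new − AUC_base) / AUC_base; the paper's exact definition may differ. The scores and labels are made-up toy values, not data from the paper.

```python
from itertools import product


def roc_auc(scores, labels):
    """Rank-based ROC-AUC: probability that a randomly chosen positive
    (label 1, e.g. a false claim) scores higher than a randomly chosen
    negative (label 0); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_improvement(auc_new, auc_base):
    """Relative gain, assumed here to be (new - base) / base."""
    return (auc_new - auc_base) / auc_base


# Hypothetical toy data: 1 = claim is false (hallucinated), 0 = claim is true.
labels = [1, 1, 0, 0, 1, 0]
baseline_scores = [0.9, 0.4, 0.5, 0.2, 0.6, 0.7]
evolved_scores = [0.8, 0.45, 0.3, 0.1, 0.7, 0.5]

auc_base = roc_auc(baseline_scores, labels)
auc_new = roc_auc(evolved_scores, labels)
print(f"baseline AUC={auc_base:.3f}, evolved AUC={auc_new:.3f}, "
      f"relative gain={relative_improvement(auc_new, auc_base):.1%}")
```

Because ROC-AUC depends only on how the scores rank the claims, an unsupervised detector can output scores on any scale and still be compared directly with a baseline.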

Uncertainty quantification (UQ) methods for large language models are predominantly designed by hand from domain knowledge and heuristics, which limits their scalability and generality. We apply LLM-powered evolutionary search to automatically discover unsupervised UQ methods represented as Python programs. On the task of atomic claim verification, our evolved methods outperform strong manually designed baselines, achieving up to 6.7% relative ROC-AUC improvement across 9 datasets while generalizing robustly out of distribution. Qualitative analysis reveals that different LLMs pursue distinct evolutionary strategies: Claude models consistently design high-feature-count linear estimators, while gpt-oss-120b gravitates toward simpler, more interpretable positional weighting schemes. Surprisingly, only Sonnet 4.5 and Opus 4.5 reliably leverage increased method complexity to improve performance; Opus 4.6 shows an unexpected regression relative to its predecessor. Overall, our results indicate that LLM-powered evolutionary search is a promising paradigm for automated, interpretable hallucination detector design.
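The abstract does not spell out the search loop, so what follows is only a minimal sketch of the general idea under several assumptions. Each candidate UQ method is a small Python scoring function over the per-token log-probabilities of a claim. Fitness is ROC-AUC on a labeled development set. The LLM step that proposes new programs is replaced by a hypothetical random mutation of a single positional-weighting parameter, loosely echoing the positional weighting schemes attributed to gpt-oss-120b. All names (`make_positional_scorer`, `mutate`, `evolve`) and the toy data are hypothetical.

```python
import math
import random


def roc_auc(scores, labels):
    """Rank-based ROC-AUC; label 1 = false claim, ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_positional_scorer(decay):
    """Hypothetical candidate method: weighted mean of token negative
    log-probs, with weight decay**i on token i (decay < 1 emphasizes
    early tokens, decay > 1 emphasizes late ones)."""
    def score(token_logprobs):
        weights = [decay ** i for i in range(len(token_logprobs))]
        total = sum(w * -lp for w, lp in zip(weights, token_logprobs))
        return total / sum(weights)
    return score


def mutate(decay, rng):
    """Stand-in for the LLM proposal step: perturb the parameter in log space."""
    return decay * math.exp(rng.gauss(0.0, 0.3))


def evolve(dataset, labels, generations=20, pop_size=8, seed=0):
    """Elitist loop: keep the best half of the population, refill by mutation."""
    rng = random.Random(seed)
    population = [1.0] * pop_size  # start from uniform weighting

    def fitness(decay):
        scorer = make_positional_scorer(decay)
        return roc_auc([scorer(x) for x in dataset], labels)

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Hypothetical toy data: per-token log-probs of four atomic claims.
dataset = [
    [-2.0, -0.1, -0.1],  # false claim: surprising first token
    [-0.1, -0.2, -2.5],  # true claim
    [-1.5, -0.2, -0.1],  # false claim
    [-0.2, -0.1, -2.0],  # true claim
]
labels = [1, 0, 1, 0]
best_decay, best_auc = evolve(dataset, labels)
print(f"best decay={best_decay:.3f}, dev ROC-AUC={best_auc:.3f}")
```

Because the best half survives each generation, the best fitness found never decreases. In the setting the abstract describes, the mutation step would instead be an LLM rewriting the program's source code, which this stub does not attempt to model.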
