LG, AI, Feb 2

Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation

arXiv:2602.01956v1, 1 citations, h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the problem of efficient uncertainty quantification for LLM developers and users in safety-critical applications, offering a practical incremental improvement over existing methods.

The paper tackles the computational challenge of estimating epistemic uncertainty in large language models by proposing a knowledge distillation framework that uses small draft models to approximate token-level uncertainty, reducing estimation error by up to 37% on GSM8K and achieving competitive hallucination detection with low inference costs.

Quantifying uncertainty in Large Language Models (LLMs) is essential for mitigating hallucinations and enabling risk-aware deployment in safety-critical tasks. However, estimating Epistemic Uncertainty (EU) via Deep Ensembles is computationally prohibitive at the scale of modern models. We propose a framework that leverages small draft models to efficiently estimate token-level EU, bypassing the need for full-scale ensembling. Theoretically grounded in a Bias-Variance Decomposition, our approach approximates EU via Jensen-Shannon divergence among drafts (variance proxy) and KL divergence between the draft mixture and the target (bias proxy). To further ensure accuracy without significant overhead, we introduce Online Stochastic Distillation (OSD) to efficiently approximate target aggregation and the Data-Diverse Drafts (DDD) strategy to enhance draft diversity for better target approximation. Extensive experiments on GSM8K demonstrate that our method reduces the estimation error (RMSE) by up to 37% compared to baselines. Crucially, our approach achieves hallucination detection performance competitive with heavy perturbation-based methods like TokUR while incurring negligible inference costs, offering a practical solution for uncertainty-aware LLM deployment.
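The two proxies named in the abstract can be illustrated for a single token position with a minimal sketch. This is not the paper's implementation: the function names, the uniform weighting of drafts, the direction of the KL term (mixture relative to target), and the smoothing constant `eps` are all assumptions made here for illustration, and the abstract does not say how the two terms are combined, so they are returned separately.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) in nats for two next-token distributions over the same vocabulary.
    eps is a hypothetical smoothing constant to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def mixture(dists):
    """Uniform mixture of the draft distributions (uniform weights are an assumption)."""
    k = len(dists)
    return [sum(col) / k for col in zip(*dists)]

def jsd(dists):
    """Generalized Jensen-Shannon divergence: mean KL from each draft to the mixture."""
    m = mixture(dists)
    return sum(kl(p, m) for p in dists) / len(dists)

def eu_proxies(drafts, target):
    """Return (variance_proxy, bias_proxy) for one token position.
    variance_proxy: disagreement among drafts (JSD).
    bias_proxy: KL(draft mixture || target); the direction is an assumption."""
    return jsd(drafts), kl(mixture(drafts), target)

# Toy example: two drafts that disagree completely over a 2-token vocabulary.
drafts = [[1.0, 0.0], [0.0, 1.0]]
target = [0.9, 0.1]
variance_proxy, bias_proxy = eu_proxies(drafts, target)
```

In this toy case the drafts disagree maximally, so the variance proxy equals log 2 nats, the largest value JSD can take for two distributions. When all drafts agree, the variance proxy is zero and any remaining signal comes from the bias proxy alone.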
