CL · AI · LG · MA · Jul 31, 2025

LENS: Learning Ensemble Confidence from Neural States for Multi-LLM Answer Integration

arXiv:2507.23167v1 · 4.91 citations · h-index: 4

Originality: Highly original

AI Analysis

The work addresses the need for robust, high-performance LLM systems by improving ensemble techniques. It is somewhat incremental, since it builds on existing ensemble learning, but it contributes a novel confidence-estimation approach.

The paper tackles the problem of effectively combining predictions from multiple LLMs. It proposes LENS, a method that learns to estimate each model's confidence from its internal neural states. It reports substantial performance improvements over traditional ensemble methods on multiple-choice and boolean question-answering tasks.

Large Language Models (LLMs) have demonstrated impressive performance across various tasks, with different models excelling in distinct domains and specific abilities. Effectively combining the predictions of multiple LLMs is crucial for enhancing system robustness and performance. However, existing ensemble methods often rely on simple techniques like voting or logits ensembling, which overlook the varying confidence and reliability of models in different contexts. In this work, we propose LENS (Learning ENsemble confidence from Neural States), a novel approach that learns to estimate model confidence by analyzing internal representations. For each LLM, we train a lightweight linear confidence predictor that leverages layer-wise hidden states and normalized probabilities as inputs. This allows for more nuanced weighting of model predictions based on their context-dependent reliability. Our method does not require modifying the model parameters and requires negligible additional computation. Experimental results on multiple-choice and boolean question-answering tasks demonstrate that LENS outperforms traditional ensemble methods by a substantial margin. Our findings suggest that internal representations provide valuable signals for determining model confidence and can be effectively leveraged for ensemble learning.
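The abstract does not spell out the exact feature layout, training objective, or combination rule. The following is therefore only a minimal Python sketch of the general idea, built on several assumptions:

- "Layer-wise hidden states" are taken to be one pooled vector per layer, concatenated with the option probabilities.
- Each per-model linear predictor is a logistic-regression head trained to predict whether that model's top answer is correct.
- Predictions are combined as a confidence-weighted sum of each model's normalized option distribution.

The names `normalize_option_probs`, `build_features`, `LinearConfidencePredictor` and `ensemble_predict` are hypothetical and are not the authors' code.

```python
import math
from typing import List, Sequence


def normalize_option_probs(option_logits: Sequence[float]) -> List[float]:
    """Softmax over the logits of the answer options only (e.g. A-D, or yes/no)."""
    top = max(option_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in option_logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_features(layer_states: Sequence[Sequence[float]],
                   option_probs: Sequence[float]) -> List[float]:
    """Assumption: concatenate one pooled hidden-state vector per layer with the option probabilities."""
    feats: List[float] = []
    for h in layer_states:
        feats.extend(h)
    feats.extend(option_probs)
    return feats


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearConfidencePredictor:
    """Hypothetical linear head: estimates P(this model's top answer is correct | features)."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, x: Sequence[float]) -> float:
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int],
            lr: float = 0.5, epochs: int = 200) -> "LinearConfidencePredictor":
        """Full-batch gradient descent on binary cross-entropy (an assumed training objective)."""
        n = len(xs)
        for _ in range(epochs):
            grad_w = [0.0] * len(self.w)
            grad_b = 0.0
            for x, y in zip(xs, ys):
                err = self.predict(x) - y  # derivative of BCE w.r.t. the logit
                for j, xj in enumerate(x):
                    grad_w[j] += err * xj
                grad_b += err
            self.w = [wi - lr * g / n for wi, g in zip(self.w, grad_w)]
            self.b -= lr * grad_b / n
        return self


def ensemble_predict(per_model_probs: Sequence[Sequence[float]],
                     confidences: Sequence[float]) -> int:
    """Assumed combination rule: confidence-weighted sum of option distributions, then argmax."""
    scores = [0.0] * len(per_model_probs[0])
    for probs, conf in zip(per_model_probs, confidences):
        for i, p in enumerate(probs):
            scores[i] += conf * p
    return max(range(len(scores)), key=scores.__getitem__)


# Toy usage: two hypothetical models answering one 4-option question.
probs_a = normalize_option_probs([2.0, 0.5, 0.1, 0.0])  # model A favours option 0
probs_b = normalize_option_probs([0.0, 0.3, 2.5, 0.1])  # model B favours option 2
print(ensemble_predict([probs_a, probs_b], [0.9, 0.2]))  # prints 0 (trust A)
print(ensemble_predict([probs_a, probs_b], [0.2, 0.9]))  # prints 2 (trust B)

# Train a toy confidence head on 2-D features; label 1 means the model's answer was correct.
head = LinearConfidencePredictor(dim=2).fit(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [1, 1, 0, 0])
```

In this sketch the base models stay frozen, and each head reads only features that a forward pass already produces. The added cost per model is therefore a single dot product, which fits the abstract's claim of negligible extra computation.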
