CL · May 28

Counterfactual Graph for Multi-Agent LLM Calibration

arXiv:2605.30653 · 62.5 · h-index: 3
AI Analysis

This work addresses over-confidence and miscalibration in multi-agent LLM systems, a problem for developers and users who rely on the outputs of those systems.

This paper addresses false consensus in multi-agent LLM systems: communication between agents can induce correlated failures, which makes agreement an unreliable indicator of correctness. The proposed CAGE-CAL framework calibrates confidence by comparing the observed post-communication agent graph with a matched counterfactual no-communication graph. Across five benchmarks it improves reliability discrimination with competitive ECE, and its calibrated confidence improves topology selection over the best fixed-topology strategy.

Multi-agent LLM systems often treat agreement as evidence: when many agents in a panel give the same answer, that answer is assumed to be more reliable. We show that this assumption can fail after agents communicate. Communication can induce correlated failures and false consensus, so the same vote share may reflect reliable agreement in one topology but over-confidence in another. We propose CAGE-CAL, a counterfactual agent-graph calibration framework for multi-agent LLMs. For each query, CAGE-CAL compares an observed post-communication agent graph with a matched counterfactual no-communication graph, capturing both pairwise failure correlations and group-level dependencies. Rather than simply counting how many agents agree, CAGE-CAL estimates the counterfactual shift between observed and no-communication dependence, and calibrates confidence accordingly. Across five benchmarks, CAGE-CAL improves reliability discrimination with competitive ECE, and its calibrated confidence further improves topology selection over the best fixed-topology strategy.
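To make the idea of a counterfactual shift concrete, here is a minimal toy sketch in Python. It is not the CAGE-CAL algorithm, whose details the abstract does not give. It assumes that pairwise answer agreement can stand in for pairwise dependence, that each agent is also queried once without communication, and that vote share is discounted in proportion to the extra agreement that communication induced. All function names and the `penalty` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def agreement_graph(answers):
    """Build a toy agent graph: edge (i, j) has weight 1.0 if agents i and j
    gave the same answer, else 0.0. (Assumption: agreement is used as a
    stand-in for pairwise dependence; the paper's graph may differ.)"""
    return {
        (i, j): float(answers[i] == answers[j])
        for i, j in combinations(range(len(answers)), 2)
    }


def mean_edge_weight(graph):
    """Average edge weight of the graph, or 0.0 if there are no edges."""
    return sum(graph.values()) / len(graph) if graph else 0.0


def vote_share(answers):
    """Return the most common answer and the fraction of agents giving it."""
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / len(answers)


def shift_adjusted_confidence(observed, counterfactual, penalty=1.0):
    """Toy heuristic (hypothetical, not the paper's estimator): discount the
    observed vote share by the increase in pairwise agreement between the
    no-communication run and the post-communication run."""
    answer, share = vote_share(observed)
    shift = (mean_edge_weight(agreement_graph(observed))
             - mean_edge_weight(agreement_graph(counterfactual)))
    # Only agreement that appeared after communication is penalised.
    discount = penalty * max(0.0, shift)
    confidence = min(1.0, max(0.0, share * (1.0 - discount)))
    return answer, share, shift, confidence


if __name__ == "__main__":
    # Same 80% vote share in both panels, but different dependence shifts.
    no_comm = ["A", "B", "C", "B", "A"]        # answers without communication
    after_comm = ["B", "B", "B", "B", "A"]     # answers after communication
    print(shift_adjusted_confidence(after_comm, no_comm))
    print(shift_adjusted_confidence(after_comm, after_comm))
```

In the first call most of the agreement appears only after communication, so the toy confidence drops well below the raw vote share. In the second call the agents already agreed independently, so the vote share is left unchanged. This mirrors the abstract's point that the same vote share can carry different evidential weight.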
