AI · Feb 25

Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents

arXiv:2602.22413v1 · h-index: 1

Originality: Incremental advance

AI Analysis

The paper addresses collective hallucination in LLM decision-making, an AI safety concern, though it is an incremental extension of classical voting theory.

The paper tackles the problem of collective accuracy in heterogeneous agents by allowing them to learn their own reliability and abstain from voting, deriving a non-asymptotic lower bound on group success probability that generalizes the Condorcet Jury Theorem to a confidence-gated setting.

We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical epistemic voting results, such as the Condorcet Jury Theorem (CJT), assume fixed participation, real-world aggregation often benefits from allowing agents to say "I don't know." We propose a probabilistic framework where agents engage in a calibration phase, updating beliefs about their own fixed competence, before facing a final confidence gate that determines whether to vote or abstain. We derive a non-asymptotic lower bound on the group's success probability and prove that this selective participation generalizes the asymptotic guarantees of the CJT to a sequential, confidence-gated setting. Empirically, we validate these bounds via Monte Carlo simulations. While our results are general, we discuss their potential application to AI safety, outlining how this framework can mitigate hallucinations in collective LLM decision-making.
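The calibrate-then-gate idea in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's model or code. It assumes a binary decision, independent agents, competences drawn uniformly from a hypothetical range, a Beta(1, 1) prior for each agent's self-estimate, a fixed gate threshold, and coin-flip tie-breaking. Every name and parameter below is illustrative.

```python
import random


def majority_correct(votes, rng):
    """Return 1 if the majority of votes is correct, else 0.

    `votes` is a list of booleans (True = the agent voted for the correct
    option). Ties, including the case where everyone abstained, are broken
    by a fair coin flip (an assumption of this sketch).
    """
    correct = sum(votes)
    wrong = len(votes) - correct
    if correct > wrong:
        return 1
    if correct < wrong:
        return 0
    return int(rng.random() < 0.5)


def simulate(n_agents=25, calib_rounds=30, gate=0.6, trials=2000, seed=0):
    """Compare full participation with confidence-gated participation.

    Returns (full_accuracy, gated_accuracy), each a fraction of trials.
    """
    rng = random.Random(seed)
    full_hits = 0
    gated_hits = 0
    for _ in range(trials):
        all_votes = []
        gated_votes = []
        for _ in range(n_agents):
            # Hypothetical heterogeneous competence, fixed for this agent.
            p = rng.uniform(0.3, 0.9)
            # Calibration phase: the agent sees feedback on practice tasks.
            hits = sum(rng.random() < p for _ in range(calib_rounds))
            # Posterior mean of competence under an assumed Beta(1, 1) prior.
            estimate = (hits + 1) / (calib_rounds + 2)
            # Final decision: the agent is correct with probability p.
            vote = rng.random() < p
            all_votes.append(vote)
            # Confidence gate: vote only if self-estimate clears the threshold.
            if estimate >= gate:
                gated_votes.append(vote)
        full_hits += majority_correct(all_votes, rng)
        gated_hits += majority_correct(gated_votes, rng)
    return full_hits / trials, gated_hits / trials


if __name__ == "__main__":
    full, gated = simulate()
    print(f"full participation: {full:.3f}")
    print(f"gated participation: {gated:.3f}")
```

Varying `gate` and `calib_rounds` shows the trade-off the abstract points at: a stricter gate keeps more reliable voters but leaves fewer of them, and a short calibration phase makes self-estimates noisier.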
