Stable but Miscalibrated: A Kantian View on Overconfidence from Filters to Large Language Models
This work addresses overconfidence in AI systems such as large language models and offers a novel diagnostic lens, though the contribution is incremental and the supporting tests are small in scale.
The paper tackles overconfidence in reasoning systems by reinterpreting Kant's Critique of Pure Reason as a theory of feedback stability. It formalizes this reading with a composite instability index (H-Risk), which predicts overconfident errors in linear-Gaussian simulations and shows preliminary correlations with miscalibration and hallucination in large language models.
We reinterpret Kant's Critique of Pure Reason as a theory of feedback stability, viewing reason as a regulator that keeps inference within the bounds of possible experience. We formalize this intuition via a composite instability index (H-Risk) combining spectral margin, conditioning, temporal sensitivity, and innovation amplification. In linear-Gaussian simulations, higher H-Risk predicts overconfident errors even under formal stability, revealing a gap between nominal and epistemic stability. Extending to large language models (LLMs), we observe preliminary correlations between internal fragility and miscalibration or hallucination (confabulation), and find that lightweight critique prompts may modestly improve or worsen calibration in small-scale tests. These results suggest a structural bridge between Kantian self-limitation and feedback control, offering a principled lens to diagnose and potentially mitigate overconfidence in reasoning systems.
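To make the idea of a composite instability index concrete, the following is a minimal sketch of how four such ingredients could be combined for a linear observer. It is not the paper's definition of H-Risk: the abstract names the components but not their formulas, so the function name `h_risk_sketch`, the choice of each proxy (spectral radius of the error dynamics, eigenvector conditioning, peak transient growth, gain norm), the squashing function, and the equal weights are all assumptions made for illustration.

```python
import numpy as np


def _squash(x):
    """Map a non-negative value into [0, 1); an illustrative choice only."""
    return 1.0 - 1.0 / (1.0 + x)


def h_risk_sketch(A, C, K, horizon=20, weights=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical composite instability score for a linear observer.

    Assumes predictor-form error dynamics e[k+1] = (A - K C) e[k].
    Every proxy below is an assumption, not the paper's formula.
    """
    A, C, K = (np.asarray(M, dtype=float) for M in (A, C, K))
    F = A - K @ C  # closed-loop estimation-error dynamics

    eigvals, eigvecs = np.linalg.eig(F)
    rho = float(np.max(np.abs(eigvals)))  # spectral radius; margin = 1 - rho

    # Conditioning proxy: ill-conditioned eigenvectors signal non-normality.
    cond_log = float(np.log10(np.linalg.cond(eigvecs)))

    # Temporal-sensitivity proxy: peak spectral norm of F^k over a finite horizon.
    peak, Fk = 0.0, np.eye(F.shape[0])
    for _ in range(horizon):
        Fk = Fk @ F
        peak = max(peak, float(np.linalg.norm(Fk, 2)))

    # Innovation-amplification proxy: how strongly the gain scales innovations.
    gain = float(np.linalg.norm(K, 2))

    terms = {
        "spectral": min(rho, 1.0),
        "conditioning": _squash(max(cond_log, 0.0)),
        "temporal": _squash(peak),
        "innovation": _squash(gain),
    }
    score = sum(w * t for w, t in zip(weights, terms.values()))
    return {"rho": rho, "terms": terms, "score": float(score)}


# Two nominally stable systems (spectral radius below 1 in both cases).
benign = h_risk_sketch(A=0.5 * np.eye(2), C=np.eye(2), K=0.2 * np.eye(2))
fragile = h_risk_sketch(
    A=[[0.95, 5.0], [0.0, 0.95]],  # strongly non-normal dynamics
    C=[[1.0, 0.0]],
    K=[[0.05], [0.0]],
)
print(benign["score"], fragile["score"])
```

Both example systems pass the usual stability test, yet the non-normal one scores markedly higher under this sketch, which is the kind of gap between nominal and epistemic stability the abstract describes. Any real use would need the paper's actual definitions and weights.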