AI · Apr 21

Learning When Not to Decide: A Framework for Overcoming Factual Presumptuousness in AI Adjudication

Mohamed Afane, Emily Robitschek, Derek Ouyang, Daniel E. Ho

arXiv:2604.19895 · 80.2 · h-index: 2

AI Analysis

This work addresses a critical bottleneck in AI systems for legal applications, one that affects millions of unemployment insurance applicants annually, by enabling systems that reliably support human judgment rather than supplant it.

The paper tackles AI presumptuousness in legal adjudication, specifically in unemployment insurance, by introducing a structured prompting framework (SPEC) that achieves 89% overall accuracy and appropriately defers decisions when evidence is insufficient. Standard RAG-based approaches, by comparison, average only 15% accuracy on cases with insufficient information.

A well-known limitation of AI systems is presumptuousness: the tendency of AI systems to provide confident answers when information may be lacking. This challenge is particularly acute in legal applications, where a core task for attorneys, judges, and administrators is to determine whether evidence is sufficient to reach a conclusion. We study this problem in the important setting of unemployment insurance adjudication, which has seen rapid integration of AI systems and where the question of additional fact-finding poses the most significant bottleneck for a system that affects millions of applicants annually. First, through a collaboration with the Colorado Department of Labor and Employment, we secure rare access to official training materials and guidance to design a novel benchmark that systematically varies in information completeness. Second, we evaluate four leading AI platforms and show that standard RAG-based approaches achieve an average of only 15% accuracy when information is insufficient. Third, advanced prompting methods improve accuracy on inconclusive cases but over-correct, withholding decisions even on clear cases. Fourth, we introduce a structured framework requiring explicit identification of missing information before any determination (SPEC, Structured Prompting for Evidence Checklists). SPEC achieves 89% overall accuracy, while appropriately deferring when evidence is insufficient -- demonstrating that presumptuousness in legal AI is systematic but addressable, and that doing so is a necessary step towards systems that reliably support, rather than supplant, human judgment wherever decisions must await sufficient evidence.
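The core idea behind SPEC, identifying missing information before making any determination, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the checklist items or prompt structure, so `REQUIRED_FACTS`, `Determination`, `decide`, and `toy_rule` are all hypothetical names chosen for illustration, and `toy_rule` is a placeholder that does not reflect any actual unemployment insurance rule.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical checklist of facts needed before deciding (an assumption;
# the actual checklist items are not given in the abstract).
REQUIRED_FACTS = ["separation_reason", "last_day_worked", "employer_statement"]


@dataclass
class Determination:
    outcome: str        # "eligible", "ineligible", or "defer"
    missing: list[str]  # checklist items with no supporting evidence


def decide(facts: dict[str, str], rule) -> Determination:
    """Check the evidence checklist first; apply the rule only if it is complete."""
    # Step 1: explicitly list what is missing before considering any outcome.
    missing = [name for name in REQUIRED_FACTS if not facts.get(name)]
    # Step 2: defer rather than guess when any required fact is absent.
    if missing:
        return Determination("defer", missing)
    # Step 3: only now make a determination.
    return Determination(rule(facts), [])


def toy_rule(facts: dict[str, str]) -> str:
    # Placeholder decision rule for illustration only.
    return "ineligible" if facts["separation_reason"] == "voluntary quit" else "eligible"


incomplete = decide({"separation_reason": "layoff"}, toy_rule)
complete = decide(
    {
        "separation_reason": "layoff",
        "last_day_worked": "2024-01-05",
        "employer_statement": "position eliminated",
    },
    toy_rule,
)
print(incomplete)
print(complete)
```

The ordering is the point of the sketch: the missing-information check runs before the rule is ever consulted, so an incomplete record yields a deferral along with the list of facts still needed, while a complete record proceeds to a decision instead of being withheld.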

