CL · IR · LG · Nov 10, 2025

When Sufficient is not Enough: Utilizing the Rashomon Effect for Complete Evidence Extraction

arXiv:2511.07055v1 · 1 citation · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the need for complete evidence extraction in domains such as healthcare, where compliance requires it. It is an incremental advance because it builds on existing feature attribution methods.

The paper tackles a limitation of feature attribution methods: they typically provide only minimal sufficient evidence, which is inadequate for applications that require complete evidence, such as compliance and cataloging. In a case study on a medical dataset, aggregating evidence from several models improves evidence recall from approximately 0.60 to 0.86.

Feature attribution methods typically provide minimal sufficient evidence justifying a model decision. However, in many applications this is inadequate. For compliance and cataloging, the full set of contributing features must be identified: complete evidence. We perform a case study on a medical dataset which contains human-annotated complete evidence. We show that individual models typically recover only subsets of complete evidence and that aggregating evidence from several models improves evidence recall from ~0.60 (single best model) to ~0.86 (ensemble). We analyze the recall-precision trade-off, the role of training with evidence, dynamic ensembles with certainty thresholds, and discuss implications.
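The aggregation idea can be illustrated with a minimal sketch. It assumes that each model's evidence is a set of feature indices (e.g. token positions), that evidence recall and precision are measured against the human-annotated complete evidence, and that models are combined by a simple vote threshold. The helper names `aggregate_evidence` and `recall_precision`, the voting rule, and the toy data are hypothetical illustrations, not the authors' implementation; this page does not describe the paper's exact aggregation scheme.

```python
from collections import Counter


def aggregate_evidence(per_model_evidence, min_votes=1):
    """Keep every feature selected by at least `min_votes` models.

    min_votes=1 is the plain union of all models' evidence
    (hypothetical aggregation rule, for illustration only).
    """
    votes = Counter()
    for evidence in per_model_evidence:
        votes.update(set(evidence))  # at most one vote per model per feature
    return {feature for feature, count in votes.items() if count >= min_votes}


def recall_precision(predicted, gold):
    """Evidence recall and precision against annotated complete evidence."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    recall = hits / len(gold) if gold else 0.0  # 0.0 by convention for empty gold
    precision = hits / len(predicted) if predicted else 0.0  # 0.0 if nothing predicted
    return recall, precision


# Hypothetical toy data (token indices), not taken from the paper's dataset.
gold = {1, 2, 3, 4, 5}
models = [{1, 2, 7}, {2, 3, 4}, {4, 5, 8}]

for i, evidence in enumerate(models):
    print(f"model {i}:", recall_precision(evidence, gold))
print("union:", recall_precision(aggregate_evidence(models), gold))
print("2 votes:", recall_precision(aggregate_evidence(models, min_votes=2), gold))
```

In this toy setup, each model recovers only part of the gold evidence. The union recovers all of it but also admits spurious features, and requiring two votes restores precision at the cost of recall. This is a simple instance of the recall-precision trade-off the abstract mentions.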
