LGJan 14

Resolving Predictive Multiplicity for the Rashomon Set

Parian Haghighat, Hadis Anahideh, Cynthia Rudin

arXiv:2601.09071v12.71 citationsh-index: 5

Originality Incremental advance

AI Analysis

This addresses the problem of inconsistent predictions from equally accurate models for users in high-stakes domains, but it is incremental as it builds on existing concepts of the Rashomon set.

The paper tackles predictive multiplicity, where multiple equally accurate models (the Rashomon set) produce inconsistent predictions, undermining trust in high-stakes applications. The authors propose three approaches—outlier correction, local patching, and pairwise reconciliation—to reduce inconsistency, and experiments show they lower disagreement metrics while maintaining competitive accuracy.

The existence of multiple, equally accurate models for a given predictive task leads to predictive multiplicity, where a ``Rashomon set'' of models achieve similar accuracy but diverges in their individual predictions. This inconsistency undermines trust in high-stakes applications where we want consistent predictions. We propose three approaches to reduce inconsistency among predictions for the members of the Rashomon set. The first approach is \textbf{outlier correction}. An outlier has a label that none of the good models are capable of predicting correctly. Outliers can cause the Rashomon set to have high variance predictions in a local area, so fixing them can lower variance. Our second approach is local patching. In a local region around a test point, models may disagree with each other because some of them are biased. We can detect and fix such biases using a validation set, which also reduces multiplicity. Our third approach is pairwise reconciliation, where we find pairs of models that disagree on a region around the test point. We modify predictions that disagree, making them less biased. These three approaches can be used together or separately, and they each have distinct advantages. The reconciled predictions can then be distilled into a single interpretable model for real-world deployment. In experiments across multiple datasets, our methods reduce disagreement metrics while maintaining competitive accuracy.

View on arXiv PDF

Similar