LG · Oct 24, 2025

Data as a Lever: A Neighbouring Datasets Perspective on Predictive Multiplicity

arXiv:2510.21303v1 · h-index: 4
Originality: Highly original
AI Analysis

This work addresses the critical role of data in predictive multiplicity, offering insights for improving fairness and reliability in machine learning applications, though it builds incrementally on prior multiplicity research.

The paper tackles predictive multiplicity by using a neighbouring datasets framework to examine how a single-data-point difference between training sets affects it. It finds that neighbouring datasets with greater inter-class distribution overlap exhibit lower multiplicity, contrary to conventional expectations. It then extends this framework to active learning and data imputation, proposing novel multiplicity-aware methods for both domains.

Multiplicity -- the existence of distinct models with comparable performance -- has received growing attention in recent years. While prior work has largely emphasized modelling choices, the critical role of data in shaping multiplicity has been comparatively overlooked. In this work, we introduce a neighbouring datasets framework to examine the most granular case: the impact of a single-data-point difference on multiplicity. Our analysis yields a seemingly counterintuitive finding: neighbouring datasets with greater inter-class distribution overlap exhibit lower multiplicity. This reversal of conventional expectations arises from a shared Rashomon parameter, and we substantiate it with rigorous proofs. Building on this foundation, we extend our framework to two practical domains: active learning and data imputation. For each, we establish natural extensions of the neighbouring datasets perspective, conduct the first systematic study of multiplicity in existing algorithms, and finally, propose novel multiplicity-aware methods, namely, multiplicity-aware data acquisition strategies for active learning and multiplicity-aware data imputation techniques.
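The abstract's core objects (a Rashomon set of near-optimal models, a shared Rashomon parameter, and two neighbouring datasets) can be made concrete with a toy computation. The sketch below is a minimal illustration under stated assumptions, not the authors' construction: it assumes a finite class of 1-D threshold classifiers, 0-1 loss, an additive Rashomon parameter `EPSILON` shared by both datasets, neighbouring datasets that differ by replacing a single point, and a disagreement rate (an ambiguity-style measure) as the multiplicity score. All names (`rashomon_set`, `disagreement_rate`, `EPSILON`, `D2`, etc.) are hypothetical. It does not attempt to reproduce the paper's overlap result, which rests on the authors' own definitions and proofs.

```python
# Toy illustration of Rashomon sets and multiplicity on two neighbouring
# datasets. The hypothesis class, loss, EPSILON and the multiplicity measure
# are illustrative assumptions, not the paper's definitions.

# Hypothetical finite hypothesis class: 1-D threshold classifiers (threshold, sign).
THRESHOLDS = [i * 0.5 for i in range(11)]  # 0.0, 0.5, ..., 5.0
MODELS = [(t, s) for s in (1, -1) for t in THRESHOLDS]


def predict(model, x):
    """Predict class 1 if sign * (x - threshold) > 0, else class 0."""
    threshold, sign = model
    return 1 if sign * (x - threshold) > 0 else 0


def empirical_loss(model, data):
    """Empirical 0-1 loss of a model on a list of (x, y) pairs."""
    return sum(predict(model, x) != y for x, y in data) / len(data)


def rashomon_set(models, data, epsilon):
    """Models whose loss is within an additive epsilon of the best loss."""
    losses = {m: empirical_loss(m, data) for m in models}
    best = min(losses.values())
    return [m for m in models if losses[m] <= best + epsilon]


def disagreement_rate(rset, points):
    """Fraction of points on which the Rashomon-set models do not all agree."""
    disputed = sum(len({predict(m, x) for m in rset}) > 1 for x in points)
    return disputed / len(points)


# D and its neighbour D2 differ in exactly one point: (2.2, 0) -> (2.2, 1).
D = [(1.2, 0), (1.8, 0), (2.2, 0), (2.8, 1), (3.2, 1), (3.8, 1)]
D2 = [(1.2, 0), (1.8, 0), (2.2, 1), (2.8, 1), (3.2, 1), (3.8, 1)]
EPSILON = 0.2                    # shared Rashomon parameter (assumed value)
EVAL_POINTS = [x for x, _ in D]  # same inputs for both datasets

if __name__ == "__main__":
    for name, data in (("D", D), ("D'", D2)):
        rset = rashomon_set(MODELS, data, EPSILON)
        print(name, rset, round(disagreement_rate(rset, EVAL_POINTS), 3))
```

Run as a script, it prints each dataset's Rashomon set and disagreement rate. On this toy data both Rashomon sets contain three threshold models and both rates are 1/3: the single replaced point shifts which models count as near-optimal without changing the score. Keeping `EPSILON` fixed across `D` and `D2` is the design choice that mirrors the shared Rashomon parameter mentioned in the abstract.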
