CASHomon Sets: Efficient Rashomon Sets Across Multiple Model Classes and their Hyperparameters
This work addresses the need for interpretable model selection in applied machine learning by enabling exploration of diverse models beyond a single model class. Its contribution is incremental: it extends Rashomon sets to the CASH setting.
The paper tackles the problem of efficiently constructing Rashomon sets across multiple model classes and their hyperparameters (CASHomon sets), which reveal alternative well-performing models for interpretation. It proposes TruVaRImp, an algorithm that reliably identifies members of these sets and matches or outperforms existing baselines on synthetic and real-world datasets.
Rashomon sets are sets of models within one model class that perform nearly as well as a reference model from the same class. They reveal the existence of alternative well-performing models, which may support different interpretations. This enables selecting models that match domain knowledge, hidden constraints, or user preferences. However, efficient construction methods currently exist for only a few model classes. Applied machine learning usually searches over many model classes, and the best class is unknown beforehand. We therefore study Rashomon sets in the combined algorithm selection and hyperparameter optimization (CASH) setting and call them CASHomon sets. We propose TruVaRImp, a model-based active learning algorithm for level set estimation with an implicit threshold, and provide convergence guarantees. On synthetic and real-world datasets, TruVaRImp reliably identifies CASHomon set members and matches or outperforms naive sampling, Bayesian optimization, classical and implicit level set estimation methods, and other baselines. Our analyses of predictive multiplicity and feature-importance variability across model classes question the common practice of interpreting data through a single model class.
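To make the notion of a CASHomon set concrete, the following minimal sketch filters already-evaluated configurations from several model classes by a threshold defined relative to the best observed loss. It is not TruVaRImp, which actively chooses which configurations to evaluate. It only illustrates the membership criterion. The relative tolerance `epsilon`, the function name `cashomon_members`, and all loss values are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List, Tuple

# A configuration is (model class, hyperparameters as sorted (name, value) pairs).
Config = Tuple[str, Tuple[Tuple[str, float], ...]]


def cashomon_members(losses: Dict[Config, float], epsilon: float = 0.05) -> List[Config]:
    """Return configurations whose loss is within a relative tolerance of the best.

    Assumption: membership means loss <= (1 + epsilon) * best observed loss.
    The threshold is "implicit" in the sense that it depends on the optimum,
    which is itself only known through the evaluations.
    """
    if not losses:
        return []
    best = min(losses.values())
    threshold = (1.0 + epsilon) * best
    return sorted(cfg for cfg, loss in losses.items() if loss <= threshold)


# Hypothetical validation losses for configurations from three model classes.
evaluated: Dict[Config, float] = {
    ("tree", (("max_depth", 3.0),)): 0.208,
    ("tree", (("max_depth", 8.0),)): 0.260,
    ("ridge", (("alpha", 1.0),)): 0.200,
    ("ridge", (("alpha", 100.0),)): 0.310,
    ("knn", (("k", 15.0),)): 0.205,
}

members = cashomon_members(evaluated, epsilon=0.05)
# Model classes represented among the near-optimal configurations.
classes = sorted({model_class for model_class, _ in members})
print(classes)
```

In this toy example the near-optimal configurations span more than one model class, which is the situation that motivates looking beyond a single class. Exhaustively evaluating every configuration, as done here, is what a model-based method aims to avoid.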