LG | AI | Jul 11, 2023

Rethinking Distribution Shifts: Empirical Analysis and Inductive Modeling for Tabular Data

arXiv:2307.05284v5 | 12 citations | h-index: 20 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the problem of developing robust algorithms for tabular data by providing empirical insights. Its advance is incremental, consisting mainly of highlighting overlooked implementation factors.

The study analyzed distribution shifts in tabular data and found that $Y|X$-shifts are more common than $X$-shifts, that robust algorithms performed no better than standard methods, and that implementation details such as model choice had a larger impact on performance.

Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. However, methodological development for robust algorithms typically relies on structural assumptions that lack empirical validation. Advocating for an empirically grounded, data-driven approach to algorithm development, we build an empirical testbed comprising natural shifts across 8 tabular datasets, 172 distribution pairs over 45 methods and 90,000 method configurations encompassing empirical risk minimization and distributionally robust optimization (DRO) methods. We find $Y|X$-shifts are most prevalent in our testbed, in stark contrast to the heavy focus on $X$ (covariate)-shifts in the ML literature, and that the performance of robust algorithms is no better than that of vanilla methods. To understand why, we conduct an in-depth empirical analysis of DRO methods and find that overlooked implementation details -- such as the choice of underlying model class (e.g., LightGBM) and hyperparameter selection -- have a bigger impact on performance than the ambiguity set or its radius. We illustrate via case studies how a data-driven, inductive understanding of distribution shifts can provide a new approach to algorithm development.
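The distinction between $X$-shifts and $Y|X$-shifts can be made concrete with a toy example. The sketch below is only an illustration on hand-made discrete data, not the paper's testbed or methodology: all names (`marginal_x`, `conditional_y_given_x`, `describe_shift`, and the three sample lists) are hypothetical. It measures how much the marginal $P(X)$ moves (total variation distance) and how much the conditional $P(Y=1 \mid X)$ moves (largest per-group gap) between a source and a target sample.

```python
from collections import Counter


def marginal_x(samples):
    """Empirical P(X = x) from a list of (x, y) pairs."""
    counts = Counter(x for x, _ in samples)
    n = len(samples)
    return {x: c / n for x, c in counts.items()}


def conditional_y_given_x(samples):
    """Empirical P(Y = 1 | X = x) for binary labels y in {0, 1}."""
    totals, positives = Counter(), Counter()
    for x, y in samples:
        totals[x] += 1
        positives[x] += y
    return {x: positives[x] / totals[x] for x in totals}


def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def describe_shift(source, target):
    """Return (shift in P(X), largest gap in P(Y=1|X)) over shared x values."""
    x_gap = tv_distance(marginal_x(source), marginal_x(target))
    cond_s = conditional_y_given_x(source)
    cond_t = conditional_y_given_x(target)
    shared = set(cond_s) & set(cond_t)
    yx_gap = max(abs(cond_s[x] - cond_t[x]) for x in shared)
    return x_gap, yx_gap


# Hand-made toy samples (assumed for illustration only).
# Source: P(X=a) = 0.5, P(Y=1|a) = 0.75, P(Y=1|b) = 0.25
source = [("a", 1)] * 30 + [("a", 0)] * 10 + [("b", 1)] * 10 + [("b", 0)] * 30
# Pure X-shift: group "a" becomes more frequent, labels behave the same.
x_shifted = [("a", 1)] * 60 + [("a", 0)] * 20 + [("b", 1)] * 5 + [("b", 0)] * 15
# Pure Y|X-shift: same mix of groups, but group "a" now has fewer positives.
yx_shifted = [("a", 1)] * 10 + [("a", 0)] * 30 + [("b", 1)] * 10 + [("b", 0)] * 30

x_case = describe_shift(source, x_shifted)
yx_case = describe_shift(source, yx_shifted)
```

In the first case only the marginal term is nonzero, and in the second only the conditional term is. With continuous or high-dimensional features, neither quantity can be read off by simple counting, so this decomposition is a conceptual aid rather than a practical diagnostic.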

Code Implementations: 1 repo
