LG · Oct 24, 2025

Distributionally Robust Feature Selection

arXiv:2510.21113v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses feature selection for models that must perform robustly across diverse populations when data collection is resource-constrained.

The paper tackles the problem of selecting a limited set of features that lets models perform well across multiple subpopulations, particularly where each feature is costly to collect (for example, survey questions or physical sensors). It validates the approach through experiments on synthetic and real-world datasets.

We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is costly, e.g. requiring adding survey questions or physical sensors, and we must be able to use the selected features to create high-quality downstream models for different populations. Our method frames the problem as a continuous relaxation of traditional variable selection using a noising mechanism, without requiring backpropagation through model training processes. By optimizing over the variance of a Bayes-optimal predictor, we develop a model-agnostic framework that balances overall performance of downstream prediction across populations. We validate our approach through experiments on both synthetic datasets and real-world data.
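
To make the noising idea in the abstract concrete, here is a minimal sketch under assumptions that are not stated in the source: each subpopulation is linear-Gaussian, each feature is observed through independent Gaussian noise, and populations are balanced by taking the worst group's score. In that special case the variance of the Bayes-optimal predictor has a closed form, and a larger variance means a lower mean squared error. The function names (explained_variance, worst_group_score, noise_for_subset) and the toy data are hypothetical and are not the authors' implementation.

```python
import numpy as np


def explained_variance(cov, beta, noise_var):
    """Variance of the Bayes-optimal predictor E[y | Z], where Z = X + noise.

    Assumes (for illustration only) X ~ N(0, cov), y = X @ beta + independent
    noise, and independent Gaussian feature noise with variances noise_var.
    Closed form: beta^T cov (cov + diag(noise_var))^{-1} cov beta.
    """
    cov_beta = cov @ beta
    noisy_cov = cov + np.diag(noise_var)
    return float(cov_beta @ np.linalg.solve(noisy_cov, cov_beta))


def worst_group_score(groups, noise_var):
    """Smallest explained variance over groups (one assumed way to balance them)."""
    return min(explained_variance(cov, beta, noise_var) for cov, beta in groups)


def noise_for_subset(n_features, keep, small=1e-6, large=1e6):
    """Near-zero noise on kept features, very large noise on dropped ones."""
    noise_var = np.full(n_features, large)
    noise_var[list(keep)] = small
    return noise_var


# Toy data: group A depends only on feature 0, group B only on feature 1;
# feature 2 is irrelevant to both.
groups = [
    (np.eye(3), np.array([1.0, 0.0, 0.0])),
    (np.eye(3), np.array([0.0, 1.0, 0.0])),
]

# Budget of two features: compare keeping {0, 1} with keeping {0, 2}.
good = worst_group_score(groups, noise_for_subset(3, [0, 1]))
bad = worst_group_score(groups, noise_for_subset(3, [0, 2]))
print(good, bad)
```

Keeping features 0 and 1 serves both groups, while keeping 0 and 2 leaves group B with almost no signal, so the worst-group score drops sharply. Because the noise variances are continuous, the same objective could in principle be optimized with gradient-based methods rather than by enumerating subsets; this sketch only compares two extreme settings.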
