Safe Distributionally Robust Feature Selection under Covariate Shift
This work addresses unreliable sensor selection in industrial multi-sensor systems under covariate shift and represents an incremental advance in distributionally robust learning.
The paper tackles distributionally robust feature selection for sparse sensing applications in which deployment environments differ from the development environment. It proposes safe-DRFS, which identifies a feature subset guaranteed to contain every subset that may become optimal across a specified range of input distribution shifts, with finite-sample theoretical guarantees against falsely eliminating features.
In practical machine learning, the environments encountered during the model development and deployment phases often differ, especially when a model is used by many users in diverse settings. Learning models that maintain reliable performance across plausible deployment environments is known as distributionally robust (DR) learning. In this work, we study the problem of distributionally robust feature selection (DRFS), with a particular focus on sparse sensing applications motivated by industrial needs. In practical multi-sensor systems, a shared subset of sensors is typically selected prior to deployment based on performance evaluations using many available sensors. At deployment, individual users may further adapt or fine-tune models to their specific environments. When deployment environments differ from those anticipated during development, this strategy can result in systems lacking sensors required for optimal performance. To address this issue, we propose safe-DRFS, a novel approach that extends safe screening from conventional sparse modeling settings to a DR setting under covariate shift. Our method identifies a feature subset that encompasses all subsets that may become optimal across a specified range of input distribution shifts, with finite-sample theoretical guarantees of no false feature elimination.
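The abstract does not say how safe-DRFS bounds the optimal solutions over a continuous range of shifts, so the following is only a rough illustration of the underlying idea, not the authors' method. It assumes that covariate shift is modeled by per-sample importance weights `w_i`, a common choice that the source does not confirm. For each weighted Lasso problem it applies the standard Gap Safe screening rule for the Lasso (Fercoq, Gramfort and Salmon), then takes the union of surviving features over a finite, hypothetical list of candidate weight vectors. All function names (`soft_threshold`, `weighted_lasso_cd`, `gap_safe_keep`, `union_over_weights`) are hypothetical. Unlike safe-DRFS, this union protects only the weight vectors actually enumerated, not a whole range of shifts.

```python
import math
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|.| (shrinks z toward zero by t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X: Matrix, y: Sequence[float], w: Sequence[float],
                      lam: float, n_sweeps: int = 200) -> Tuple[Matrix, List[float], List[float]]:
    """Coordinate descent for 0.5*sum_i w_i*(y_i - x_i.beta)^2 + lam*||beta||_1.

    Rows are rescaled by sqrt(w_i), turning the problem into a plain Lasso.
    Returns the rescaled data (Xt, yt) and the coefficient vector beta.
    """
    n, p = len(X), len(X[0])
    s = [math.sqrt(wi) for wi in w]
    Xt = [[s[i] * X[i][j] for j in range(p)] for i in range(n)]
    yt = [s[i] * y[i] for i in range(n)]
    col_sq = [sum(Xt[i][j] ** 2 for i in range(n)) for j in range(p)]
    beta = [0.0] * p
    r = yt[:]  # running residual yt - Xt @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j added back).
            rho = sum(Xt[i][j] * r[i] for i in range(n)) + col_sq[j] * beta[j]
            delta = soft_threshold(rho, lam) / col_sq[j] - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xt[i][j] * delta
                beta[j] += delta
    return Xt, yt, beta


def gap_safe_keep(Xt: Matrix, yt: Sequence[float], beta: Sequence[float],
                  lam: float, tol: float = 1e-10) -> Set[int]:
    """Gap Safe sphere test: return features that cannot be ruled out.

    Feature j is eliminated only if |x_j.theta| + R*||x_j|| < 1 - tol, where
    theta is a dual-feasible point and R = sqrt(2*gap)/lam. The tolerance
    only guards against round-off and makes the rule more conservative.
    """
    n, p = len(Xt), len(beta)
    r = [yt[i] - sum(Xt[i][j] * beta[j] for j in range(p)) for i in range(n)]
    corr = [sum(Xt[i][j] * r[i] for i in range(n)) for j in range(p)]
    scale = max(lam, max(abs(c) for c in corr))
    theta = [ri / scale for ri in r]  # satisfies |Xt_j . theta| <= 1 for all j
    primal = 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(b) for b in beta)
    dual = 0.5 * sum(v * v for v in yt) - 0.5 * lam ** 2 * sum(
        (theta[i] - yt[i] / lam) ** 2 for i in range(n))
    radius = math.sqrt(2.0 * max(primal - dual, 0.0)) / lam
    keep: Set[int] = set()
    for j in range(p):
        col_norm = math.sqrt(sum(Xt[i][j] ** 2 for i in range(n)))
        if abs(corr[j] / scale) + radius * col_norm >= 1.0 - tol:
            keep.add(j)
    return keep


def union_over_weights(X: Matrix, y: Sequence[float],
                       weight_sets: Sequence[Sequence[float]], lam: float) -> List[int]:
    """Union of screening survivors over a finite list of candidate weights."""
    kept: Set[int] = set()
    for w in weight_sets:
        Xt, yt, beta = weighted_lasso_cd(X, y, w, lam)
        kept |= gap_safe_keep(Xt, yt, beta, lam)
    return sorted(kept)


# Toy data: three orthogonal features; feature 1 matters only if upweighted.
X_demo = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
y_demo = [3.0, 0.25, 0.125, 1.0]
print(union_over_weights(X_demo, y_demo, [[1, 1, 1, 1]], 1.0))  # [0]
print(union_over_weights(X_demo, y_demo, [[1, 1, 1, 1], [1, 16, 1, 1]], 1.0))  # [0, 1]
```

In the toy run, feature 1 is screened out under uniform weights but survives once a candidate shift upweights the sample where it is informative. This is the behavior a distributionally robust selection must preserve: a sensor dropped for the development distribution may still be needed after deployment.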