Evaluation of Active Feature Acquisition Methods for Static Feature Settings
This work addresses the need for reliable performance assessment of active feature acquisition (AFA) agents in domains such as healthcare, where acquiring features is costly, by extending evaluation methods to static feature settings. The contribution is incremental, as it builds on prior work for time-dependent features.
The paper tackles the problem of evaluating active feature acquisition (AFA) agents in static feature settings, where features are time-invariant, by deriving and adapting inverse probability weighting, direct method, and double reinforcement learning estimators within a semi-offline reinforcement learning framework, showing improved data efficiency in synthetic and real-world experiments.
Active feature acquisition (AFA) agents, crucial in domains like healthcare where acquiring features is often costly or harmful, determine the optimal set of features for a subsequent classification task. Because deploying an AFA agent introduces a shift in the missingness distribution, it is vital to assess its expected performance at deployment using retrospective data. In a companion paper, we introduce a semi-offline reinforcement learning (RL) framework for active feature acquisition performance evaluation (AFAPE) in which features are assumed to be time-dependent. Here, we study and extend the AFAPE problem to cover static feature settings, where features are time-invariant, which gives AFA agents more flexibility in deciding the order of acquisitions. In this static feature setting, we derive and adapt new inverse probability weighting (IPW), direct method (DM), and double reinforcement learning (DRL) estimators within the semi-offline RL framework. These estimators can be applied when the missingness in the retrospective dataset follows a missing-at-random (MAR) pattern. They can also be applied to missing-not-at-random (MNAR) patterns in conjunction with appropriate existing missing-data techniques. We illustrate the improved data efficiency offered by the semi-offline RL estimators in synthetic and real-world data experiments under synthetic MAR and MNAR missingness.
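To make the three estimator families concrete, the sketch below shows their generic one-step forms. It is not the semi-offline AFAPE estimators of this work; it is a minimal illustration under hypothetical assumptions: a single always-observed context x, one binary decision a (1 = acquire a costly feature), a retrospective acquisition probability pi_b(a | x) that depends only on x (a MAR-style mechanism), and a known stochastic policy pi_e for the agent being evaluated. All names (`BEHAVIOR`, `TARGET`, `MEAN_R`, `ipw`, `dm`, `dr`) are illustrative, and the doubly robust estimator is the standard augmented-IPW form, used here only as a one-step stand-in for DRL-style estimators.

```python
import random

# Hypothetical toy model (an assumption, not the paper's setting):
# binary context x, binary action a (1 = acquire the costly feature), Bernoulli reward r.
P_X1 = 0.5                      # P(x = 1)
BEHAVIOR = {0: 0.2, 1: 0.7}     # pi_b(a=1 | x): MAR, depends only on the observed x
TARGET = {0: 0.9, 1: 0.9}       # pi_e(a=1 | x): the agent being evaluated
MEAN_R = {(0, 0): 0.5, (0, 1): 0.8, (1, 0): 0.4, (1, 1): 0.9}

def prob(policy, x, a):
    """Probability that `policy` takes action a in context x."""
    p1 = policy[x]
    return p1 if a == 1 else 1.0 - p1

def simulate(n, seed=0):
    """Draw a retrospective dataset of (x, a, r) tuples under the behavior policy."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = int(rng.random() < P_X1)
        a = int(rng.random() < BEHAVIOR[x])
        r = float(rng.random() < MEAN_R[(x, a)])
        data.append((x, a, r))
    return data

def true_value():
    """Exact expected reward of the target policy in this toy model."""
    return sum(
        (P_X1 if x == 1 else 1 - P_X1) * prob(TARGET, x, a) * MEAN_R[(x, a)]
        for x in (0, 1) for a in (0, 1)
    )

def fit_q(data):
    """Outcome model q_hat(x, a): empirical mean reward in each (x, a) cell."""
    sums, counts = {}, {}
    for x, a, r in data:
        sums[(x, a)] = sums.get((x, a), 0.0) + r
        counts[(x, a)] = counts.get((x, a), 0) + 1
    return {k: sums[k] / counts[k] for k in counts}

def ipw(data):
    """Inverse probability weighting: reweight logged rewards by pi_e / pi_b."""
    return sum(prob(TARGET, x, a) / prob(BEHAVIOR, x, a) * r for x, a, r in data) / len(data)

def dm(data, q):
    """Direct method: average the outcome model's prediction under pi_e."""
    return sum(
        sum(prob(TARGET, x, b) * q.get((x, b), 0.0) for b in (0, 1)) for x, _, _ in data
    ) / len(data)

def dr(data, q):
    """Doubly robust: direct method plus an IPW-weighted correction of q's residuals."""
    correction = sum(
        prob(TARGET, x, a) / prob(BEHAVIOR, x, a) * (r - q.get((x, a), 0.0))
        for x, a, r in data
    ) / len(data)
    return dm(data, q) + correction

data = simulate(20_000)
q_hat = fit_q(data)
q_bad = {(x, a): 0.5 for x in (0, 1) for a in (0, 1)}  # deliberately misspecified model
print(f"truth={true_value():.3f} ipw={ipw(data):.3f} dm={dm(data, q_hat):.3f} "
      f"dr={dr(data, q_hat):.3f} dm_bad={dm(data, q_bad):.3f} dr_bad={dr(data, q_bad):.3f}")
```

In this toy model the target policy's true value is 0.81. With the misspecified outcome model `q_bad`, the direct method returns 0.5 regardless of the data, while the doubly robust correction, which relies on the correctly specified behavior probabilities, pulls the estimate back toward the true value. When `q_hat` is the per-cell empirical mean, the correction term sums to zero within each cell, so DR coincides with DM.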