Estimating Conditional Mutual Information for Dynamic Feature Selection
The paper addresses the problem of reducing feature acquisition costs and improving the transparency of predictions in machine learning applications, though its contribution is incremental, consisting of enhancements to existing methods.
The paper tackles dynamic feature selection by prioritizing features according to their conditional mutual information with the response variable. It introduces a discriminative approach to estimating this quantity, along with improvements such as variable per-sample feature budgets and non-uniform feature costs. Experiments show consistent gains over recent methods.
Dynamic feature selection, where we sequentially query features to make accurate predictions with a minimal budget, is a promising paradigm to reduce feature acquisition costs and provide transparency into a model's predictions. The problem is challenging, however, as it requires both predicting with arbitrary feature sets and learning a policy to identify valuable selections. Here, we take an information-theoretic perspective and prioritize features based on their mutual information with the response variable. The main challenge is implementing this policy, and we design a new approach that estimates the mutual information in a discriminative rather than generative fashion. Building on our approach, we then introduce several further improvements: allowing variable feature budgets across samples, enabling non-uniform feature costs, incorporating prior information, and exploring modern architectures to handle partial inputs. Our experiments show that our method provides consistent gains over recent methods across a variety of datasets.
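To make the selection policy concrete, here is a minimal sketch of greedy feature selection by conditional mutual information on a small discrete problem. It is not the paper's method: the paper learns a discriminative estimator, whereas this sketch computes the quantity exactly from a known joint table, which is only feasible for toy problems. The toy distribution and the names `build_toy_joint`, `cmi`, and `greedy_select` are hypothetical, chosen purely for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def build_toy_joint():
    """Return {(x0, x1, x2, y): probability} for a hypothetical toy problem.

    Assumptions (illustrative only): x0 is a fair bit, x1 is a noisy copy of
    x0 (agrees with probability 0.9), x2 is an independent rare bit
    (P(x2 = 1) = 0.05), and the response is y = 2 * x0 + x2.
    """
    joint = {}
    for x0, x1, x2 in product((0, 1), repeat=3):
        p = 0.5 * (0.9 if x1 == x0 else 0.1) * (0.05 if x2 == 1 else 0.95)
        joint[(x0, x1, x2, 2 * x0 + x2)] = p
    return joint


def cmi(joint, observed, candidate):
    """Exact I(Y; X_candidate | observed feature values), in nats.

    `observed` maps feature index -> observed value and must have nonzero
    probability under `joint`.
    """
    table = defaultdict(float)  # unnormalized p(x_candidate, y, observed)
    for (*xs, y), p in joint.items():
        if all(xs[i] == v for i, v in observed.items()):
            table[(xs[candidate], y)] += p
    total = sum(table.values())

    # Marginals of the conditional table p(x_candidate, y | observed).
    px, py = defaultdict(float), defaultdict(float)
    for (xc, y), p in table.items():
        px[xc] += p / total
        py[y] += p / total

    return sum(
        (p / total) * math.log((p / total) / (px[xc] * py[y]))
        for (xc, y), p in table.items()
        if p > 0
    )


def greedy_select(joint, x, budget, num_features=3):
    """Query `budget` features of sample `x`, one at a time.

    At each step, pick the unobserved feature with the largest conditional
    mutual information with y given the values observed so far.
    """
    observed, order = {}, []
    for _ in range(budget):
        candidates = [i for i in range(num_features) if i not in observed]
        best = max(candidates, key=lambda i: cmi(joint, observed, i))
        observed[best] = x[best]  # "acquire" the feature value
        order.append(best)
    return order


joint = build_toy_joint()
print(greedy_select(joint, x=(1, 1, 0), budget=2))
```

In this toy setting the greedy policy selects feature 0 and then feature 2. A static ranking by marginal mutual information would instead put feature 1 second, even though it is redundant once feature 0 is known. Conditioning on the features already observed is what avoids that redundancy.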