LGAIJun 8, 2021

Dynamic Instance-Wise Classification in Correlated Feature Spaces

arXiv:2106.04668v17 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of feature selection in correlated feature spaces for machine learning practitioners, offering a scalable solution that is incremental over existing methods.

The paper tackles the problem of improving classification accuracy and interpretability by dynamically selecting the most informative feature subset for each test instance, using a Bayesian network to model feature dependencies, and demonstrates effectiveness on diverse real-world datasets.

In a typical supervised machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training. However, using a different subset of features that is most informative for each test instance individually may not only improve prediction accuracy, but also the overall interpretability of the model. At the same time, feature selection methods for classification have been known to be the most effective when many features are irrelevant and/or uncorrelated. In fact, feature selection ignoring correlations between features can lead to poor classification performance. In this work, a Bayesian network is utilized to model feature dependencies. Using the dependency network, a new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy. The optimum number of features to acquire and the optimum classification strategy are derived for each test instance. The theoretical properties of the optimum solution are analyzed, and a new algorithm is proposed that takes advantage of these properties to implement a robust and scalable solution for high dimensional settings. The effectiveness, generalizability, and scalability of the proposed method is illustrated on a variety of real-world datasets from diverse application domains.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes