ML · LG · Jun 9, 2016

Variational Information Maximization for Feature Selection

arXiv:1606.02827v1 · 53 citations
Originality: Highly original
AI Analysis

This work gives machine learning practitioners a more flexible and general framework for feature selection, though the contribution is incremental in that it builds on existing information-theoretic approaches.

The authors address the unrealistic assumptions behind existing mutual information approximations for feature selection. They propose a variational framework that yields tractable lower bounds on mutual information and prove it optimal under tree graphical models, given a proper choice of variational distributions. In their experiments, the method strongly outperforms existing information-theoretic approaches.

Feature selection is one of the most fundamental problems in machine learning. An extensive body of work on information-theoretic feature selection exists which is based on maximizing mutual information between subsets of features and class labels. Practical methods are forced to rely on approximations due to the difficulty of estimating mutual information. We demonstrate that approximations made by existing methods are based on unrealistic assumptions. We formulate a more flexible and general class of assumptions based on variational distributions and use them to tractably generate lower bounds for mutual information. These bounds define a novel information-theoretic framework for feature selection, which we prove to be optimal under tree graphical models with proper choice of variational distributions. Our experiments demonstrate that the proposed method strongly outperforms existing information-theoretic feature selection approaches.
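To make the idea of a variational lower bound concrete, here is a minimal sketch, not the authors' implementation. It relies on the standard fact that for any distribution q(y | x), the quantity H(y) + E[log q(y | x)] is a lower bound on the mutual information I(x; y). The sketch assumes discrete features coded as integers in 0..n_values-1, uses a naive Bayes form for q (an illustrative choice), and picks features greedily by forward selection. All function names and parameters (`naive_bayes_log_posterior`, `mi_lower_bound`, `greedy_select`, `alpha`) are hypothetical.

```python
import numpy as np


def naive_bayes_log_posterior(X, y, n_classes, n_values, alpha=1.0):
    """Return log q(y | x_S) for every sample, shape (n_samples, n_classes).

    q is a naive Bayes model fit to (X, y) with Laplace smoothing `alpha`.
    This choice of q is an assumption made for illustration.
    """
    n, d = X.shape
    class_counts = np.bincount(y, minlength=n_classes)
    # Smoothed log prior, repeated for every sample.
    log_prior = np.log((class_counts + alpha) / (n + alpha * n_classes))
    log_post = np.tile(log_prior, (n, 1))
    for j in range(d):
        # counts[c, v] = number of samples with class c and feature value v.
        counts = np.zeros((n_classes, n_values))
        np.add.at(counts, (y, X[:, j]), 1.0)
        row_totals = counts.sum(axis=1, keepdims=True)
        log_cond = np.log((counts + alpha) / (row_totals + alpha * n_values))
        # Add log q(x_j | y = c) for each sample and each candidate class c.
        log_post += log_cond[:, X[:, j]].T
    # Normalize over classes to turn the joint into a posterior.
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    return log_post


def mi_lower_bound(X, y, n_classes, n_values):
    """Plug-in estimate of H(y) + E[log q(y | x_S)], in nats."""
    p_y = np.bincount(y, minlength=n_classes) / len(y)
    nonzero = p_y > 0
    h_y = -np.sum(p_y[nonzero] * np.log(p_y[nonzero]))
    log_post = naive_bayes_log_posterior(X, y, n_classes, n_values)
    return h_y + log_post[np.arange(len(y)), y].mean()


def greedy_select(X, y, k, n_classes, n_values):
    """Forward selection: at each step add the feature that maximizes the bound."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        scores = [
            mi_lower_bound(X[:, selected + [j]], y, n_classes, n_values)
            for j in remaining
        ]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `greedy_select(X, y, k, n_classes, n_values)` on an integer-coded feature matrix returns the indices of the k chosen features in the order they were added. Because the bound holds for any q, a richer variational family can only tighten it; the naive Bayes form is used here only because it is cheap to fit.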

Code Implementations: 1 repo