LG | IT | ML | May 1, 2021

Stochastic Mutual Information Gradient Estimation for Dimensionality Reduction Networks

arXiv:2105.00191v1 | 19 citations
Originality: Incremental advance
AI Analysis

This work addresses feature selection issues in discriminative machine learning, particularly for high-dimensional biological data, presenting an incremental improvement over conventional methods.

The paper tackles the problem of sub-optimal feature selection in supervised dimensionality reduction by introducing an end-to-end neural network training approach based on stochastic mutual information gradient estimation, projecting features to maximize mutual information with class labels without distributional assumptions.
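For reference, the objective described above can be written in standard notation. The symbols are assumptions introduced here for illustration and are not taken from the paper: $X$ is the high-dimensional input, $f_{\theta}$ the dimensionality reduction network with parameters $\theta$, $Y$ its low-dimensional output, and $C$ the class label.

```latex
\theta^{*} = \arg\max_{\theta} \; I(Y; C), \qquad Y = f_{\theta}(X),
\qquad
I(Y; C) = \sum_{c} \int p(y, c) \, \log \frac{p(y, c)}{p(y)\, P(c)} \, dy
```

The second expression is the usual definition of mutual information between a continuous variable and a discrete label. How the paper estimates its gradient stochastically is not specified in this summary.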

Feature ranking and selection is widely used for supervised dimensionality reduction in discriminative machine learning. Nevertheless, there is significant evidence that feature ranking and selection algorithms, whatever criterion they use, can lead to sub-optimal solutions for class separability. In that regard, we introduce emerging information theoretic feature transformation protocols as an end-to-end neural network training approach. We present a training procedure for a dimensionality reduction network (MMINet) based on a stochastic estimate of the mutual information gradient. The network projects high-dimensional features onto an output feature space in which the lower-dimensional representations carry maximum mutual information with their associated class labels. Furthermore, we formulate the training objective so that it can be estimated non-parametrically, with no distributional assumptions. We experimentally evaluate our method on high-dimensional biological data sets, and relate it to conventional feature selection algorithms, which form a special case of our approach.
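To make the idea in the abstract concrete, here is a minimal pure-Python sketch. It is not the authors' MMINet implementation. It assumes a single linear projection instead of a neural network, a Gaussian kernel density estimate with a fixed bandwidth as the non-parametric mutual information estimator, and a finite-difference gradient on random minibatches in place of the paper's stochastic gradient estimator. All function names (`kde_mutual_information`, `project`, `make_data`, `train_projection`) and the toy data are hypothetical.

```python
import math
import random


def kde_mutual_information(ys, labels, bandwidth=0.5):
    """Resubstitution estimate of I(Y; C) in nats for 1-D projections ys.

    p(y) and p(y | c) are Gaussian kernel density estimates, so no
    parametric form is assumed for the class-conditional distributions.
    """
    n = len(ys)
    classes = sorted(set(labels))
    counts = {c: labels.count(c) for c in classes}
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    mi = 0.0
    for yi, ci in zip(ys, labels):
        # Kernel sums around y_i, accumulated separately per class.
        sums = {c: 0.0 for c in classes}
        for yj, cj in zip(ys, labels):
            sums[cj] += norm * math.exp(-0.5 * ((yi - yj) / bandwidth) ** 2)
        p_y = sum(sums.values()) / n          # estimate of p(y_i)
        p_y_given_c = sums[ci] / counts[ci]   # estimate of p(y_i | c_i)
        mi += math.log(p_y_given_c / p_y)
    return mi / n


def project(xs, w):
    """Linear projection of each feature vector in xs onto weight vector w."""
    return [sum(wk * xk for wk, xk in zip(w, x)) for x in xs]


def make_data(n, rng):
    """Toy data: feature 0 separates two classes, feature 1 is pure noise."""
    xs, labels = [], []
    for i in range(n):
        c = i % 2
        informative = rng.gauss(2.0 if c else -2.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        xs.append((informative, noise))
        labels.append(c)
    return xs, labels


def train_projection(xs, labels, w, steps=40, batch=48, lr=0.5,
                     eps=1e-3, rng=None):
    """Stochastic gradient ascent on the minibatch MI estimate.

    The gradient is approximated by central finite differences, an
    assumption made here to keep the sketch dependency-free.
    """
    rng = rng or random.Random(0)
    w = list(w)
    for _ in range(steps):
        idx = rng.sample(range(len(xs)), batch)
        bx = [xs[i] for i in idx]
        bl = [labels[i] for i in idx]
        grad = []
        for k in range(len(w)):
            w_plus = list(w)
            w_plus[k] += eps
            w_minus = list(w)
            w_minus[k] -= eps
            mi_plus = kde_mutual_information(project(bx, w_plus), bl)
            mi_minus = kde_mutual_information(project(bx, w_minus), bl)
            grad.append((mi_plus - mi_minus) / (2.0 * eps))
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
        # Renormalize so the estimate cannot grow just by rescaling w.
        length = math.sqrt(sum(wk * wk for wk in w))
        w = [wk / length for wk in w]
    return w
```

In this sketch a one-hot weight vector such as `(1.0, 0.0)` simply selects a single input feature, which illustrates in a simplified way how feature selection can be read as a special case of a learned projection. Calling `train_projection(xs, labels, (0.3, 0.954))` on data from `make_data` is expected to rotate the projection toward the informative feature.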

Code Implementations: 1 repo
