LG · IT · ML · Jan 25, 2021

Measuring Dependence with Matrix-based Entropy Functional

arXiv:2101.10160v1 · 37 citations
Originality: Incremental advance
AI Analysis

This work provides a generalizable framework for dependence measurement with broad utility in machine learning problems, though it builds on existing ideas and is incremental in nature.

The authors tackle the problem of measuring dependence among multiple variables without estimating the underlying distributions. They propose two matrix-based normalized measures, show that these are differentiable and statistically more powerful than existing methods, and apply them to problems including gene regulatory network inference and the analysis of CNN learning dynamics.

Measuring the dependence of data plays a central role in statistics and machine learning. In this work, we summarize the main idea of existing information-theoretic dependence measures and generalize it into a higher-level perspective via Shearer's inequality. Based on this generalization, we propose two measures, the matrix-based normalized total correlation ($T_\alpha^*$) and the matrix-based normalized dual total correlation ($D_\alpha^*$), to quantify the dependence of multiple variables in spaces of arbitrary dimension, without explicit estimation of the underlying data distributions. We show that our measures are differentiable and statistically more powerful than prevalent ones. We also demonstrate the utility, advantages, and implications of our measures in four machine learning problems: gene regulatory network inference, robust machine learning under covariate shift and non-Gaussian noise, subspace outlier detection, and understanding the learning dynamics of convolutional neural networks (CNNs). Code for our dependence measures is available at: https://bit.ly/AAAI-dependence
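
To make the idea of a matrix-based, distribution-free dependence measure concrete, here is a minimal Python sketch of a normalized total correlation computed from Gram matrices. It is not the authors' released code. It assumes the commonly used matrix-based Rényi $\alpha$-entropy, $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \operatorname{tr}(A^\alpha)$, on a trace-normalized Gaussian Gram matrix, with the joint entropy taken from the normalized Hadamard product of the per-variable matrices. The normalization by $\sum_i S_\alpha(A_i) - \max_i S_\alpha(A_i)$, the kernel width `sigma`, the default `alpha=1.01`, and all function names are assumptions made for illustration.

```python
import numpy as np

def gram_matrix(x, sigma=1.0):
    """Trace-normalized Gaussian Gram matrix for samples x of shape (n, d)."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(len(x), -1)
    sq = np.sum(x**2, axis=1)
    # Pairwise squared Euclidean distances, clipped against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    k = np.exp(-d2 / (2.0 * sigma**2))
    return k / np.trace(k)

def renyi_entropy(a, alpha=1.01):
    """Matrix-based Renyi alpha-entropy (bits) of a PSD matrix with unit trace."""
    eig = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return np.log2(np.sum(eig**alpha)) / (1.0 - alpha)

def joint_entropy(mats, alpha=1.01):
    """Joint entropy from the trace-normalized Hadamard product of the matrices."""
    had = np.ones_like(mats[0])
    for a in mats:
        had = had * a
    return renyi_entropy(had / np.trace(had), alpha)

def total_correlation(variables, sigma=1.0, alpha=1.01, normalize=True):
    """Sum of marginal entropies minus joint entropy, optionally normalized.

    The normalization (sum of marginals minus the largest marginal) is an
    assumed form, chosen here so that the score is scaled by its upper bound.
    """
    mats = [gram_matrix(v, sigma) for v in variables]
    marginals = [renyi_entropy(a, alpha) for a in mats]
    tc = sum(marginals) - joint_entropy(mats, alpha)
    if not normalize:
        return tc
    return tc / (sum(marginals) - max(marginals))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 1))
    y_dependent = x + 0.05 * rng.normal(size=(200, 1))
    y_independent = rng.normal(size=(200, 1))
    print("dependent:  ", total_correlation([x, y_dependent]))
    print("independent:", total_correlation([x, y_independent]))
```

Every step is built from eigenvalues of Gram matrices, so no density is ever estimated, and the whole computation is composed of differentiable matrix operations. In practice the kernel width would be tuned to the data rather than fixed at 1.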

Code Implementations: 1 repo