Equitability Analysis of the Maximal Information Coefficient, with Comparisons
This work addresses the problem of identifying strong associations in high-dimensional data during data exploration. Its contribution is incremental, as it builds on existing MIC theory.
The paper analyzes the equitability of the maximal information coefficient (MIC) and compares it to alternatives such as mutual information estimation and distance correlation, demonstrating that MIC is more equitable than these alternatives across a range of noise models and sample sizes.
A measure of dependence is said to be equitable if it gives similar scores to equally noisy relationships of different types. Equitability is important in data exploration when the goal is to identify a relatively small set of the strongest associations within a dataset, as opposed to finding as many non-zero associations as possible, which are often too numerous to sift through. Thus an equitable statistic, such as the maximal information coefficient (MIC), can be useful for analyzing high-dimensional datasets. Here, we explore both equitability and the properties of MIC, and discuss several aspects of the theory and practice of MIC. We begin by presenting an intuition behind the equitability of MIC through the exploration of the maximization and normalization steps in its definition. We then examine the speed and optimality of the approximation algorithm used to compute MIC, and suggest some directions for improving both. Finally, we demonstrate in a range of noise models and sample sizes that MIC is more equitable than natural alternatives, such as mutual information estimation and distance correlation.
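To make the maximization and normalization steps concrete, the following is a minimal Python sketch, not the approximation algorithm discussed in the paper. It assumes equal-frequency grids only (so it can only underestimate the true maximum over all grids) and a grid-size budget of n^0.6 cells, a commonly used default that is not stated in this abstract. The names `grid_mutual_information`, `naive_mic`, and `alpha` are hypothetical and chosen for illustration.

```python
import numpy as np


def grid_mutual_information(x, y, nx, ny):
    """Mutual information (bits) of x and y on an nx-by-ny equal-frequency grid."""
    n = len(x)
    # Rank each sample, then map ranks to bin indices 0..nx-1 and 0..ny-1.
    # Assumption: equal-frequency bins stand in for the search over all grids.
    x_bins = np.argsort(np.argsort(x)) * nx // n
    y_bins = np.argsort(np.argsort(y)) * ny // n

    # Empirical joint distribution over grid cells.
    joint = np.zeros((nx, ny))
    np.add.at(joint, (x_bins, y_bins), 1.0)
    joint /= n

    # Marginals, shaped so that px * py broadcasts to the joint's shape.
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)

    occupied = joint > 0
    ratio = joint[occupied] / (px * py)[occupied]
    return float(np.sum(joint[occupied] * np.log2(ratio)))


def naive_mic(x, y, alpha=0.6):
    """Sketch of a MIC-style score: normalize per grid size, then maximize.

    alpha is an assumed exponent for the grid-size budget n**alpha.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    budget = len(x) ** alpha
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            mi = grid_mutual_information(x, y, nx, ny)
            # Normalization step: divide by the largest value the mutual
            # information can take on a grid of this size.
            score = mi / np.log2(min(nx, ny))
            # Maximization step: keep the best normalized score so far.
            best = max(best, score)
    return best
```

Because each grid's mutual information is divided by log2(min(nx, ny)), every normalized score lies between 0 and 1, which is what allows scores from grids of different resolutions to be compared before taking the maximum.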