ML, IT, LG | Oct 27, 2021

Data-Driven Representations for Testing Independence: Modeling, Analysis and Connection with Mutual Information Estimation

arXiv:2110.14122v1
Originality: Incremental advance
AI Analysis

This work addresses independence testing, a problem of interest to statisticians and machine learning practitioners. Its contribution is an incremental advance: a data-driven testing method whose design criterion is tied to mutual information estimation.

The paper tackles the problem of testing independence between two continuous random variables by designing a data-driven partition and using an empirical log-likelihood statistic to approximate an oracle test. It shows that this approach connects with mutual information estimation, achieves a strongly consistent distribution-free test under certain conditions, and provides finite-length results and experimental evidence of advantages over non-data-driven strategies.

This work addresses testing the independence of two continuous and finite-dimensional random variables from the design of a data-driven partition. The empirical log-likelihood statistic is adopted to approximate the sufficient statistics of an oracle test against independence (that knows the two hypotheses). It is shown that approximating the sufficient statistics of the oracle test offers a learning criterion for designing a data-driven partition that connects with the problem of mutual information estimation. Applying these ideas in the context of a data-dependent tree-structured partition (TSP), we derive conditions on the TSP's parameters to achieve a strongly consistent distribution-free test of independence over the family of probabilities equipped with a density. Complementing this result, we present finite-length results that show our TSP scheme's capacity to detect the scenario of independence structurally with the data-driven partition as well as new sampling complexity bounds for this detection. Finally, some experimental analyses provide evidence regarding our scheme's advantage for testing independence compared with some strategies that do not use data-driven representations.
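
To make the idea of an empirical log-likelihood statistic over a data-driven partition concrete, here is a minimal Python sketch. It is not the paper's tree-structured partition: as a simpler stand-in it assumes a product partition built from empirical quantiles of each scalar variable, and the names `quantile_bins`, `partition_mi`, and `reject_independence`, as well as the `threshold` parameter, are hypothetical. The statistic computed is the plug-in mutual information between the quantized variables, which is small when the samples look independent.

```python
import numpy as np


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency cells.

    The cell edges are empirical quantiles, so the partition depends on the data.
    """
    interior = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.quantile(values, interior)
    return np.searchsorted(edges, values, side="right")


def partition_mi(x, y, n_bins):
    """Plug-in mutual information (in nats) over the product partition."""
    bx = quantile_bins(x, n_bins)
    by = quantile_bins(y, n_bins)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (bx, by), 1.0)          # count samples per product cell
    joint /= joint.sum()                     # empirical joint distribution
    product = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    mask = joint > 0                         # empty cells contribute zero
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))


def reject_independence(x, y, n_bins, threshold):
    """Reject independence when the statistic exceeds an assumed threshold."""
    return partition_mi(x, y, n_bins) > threshold
```

In this sketch the number of cells and the threshold are free choices made by the caller. The paper instead derives conditions on its partition parameters under which the test is strongly consistent, and those conditions are not reproduced here.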

