ML · LG · Jun 3, 2020

Double Generative Adversarial Networks for Conditional Independence Testing

arXiv:2006.02615v3 · 33 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a foundational statistical problem in machine learning, with potential applications in fields such as drug discovery. Its contribution is incremental: it applies deep learning tools to a classical testing problem.

The authors tackle high-dimensional conditional independence testing by proposing a double generative adversarial networks (GANs) framework that learns the two conditional distributions and combines them into a test statistic. The resulting test controls type-I error asymptotically and has power approaching one, under weaker conditions than existing tests.

In this article, we study the problem of high-dimensional conditional independence testing, a key building block in statistics and machine learning. We propose an inferential procedure based on double generative adversarial networks (GANs). Specifically, we first introduce a double GANs framework to learn two generators of the conditional distributions. We then integrate the two generators to construct a test statistic, which takes the form of the maximum of generalized covariance measures of multiple transformation functions. We also employ data-splitting and cross-fitting to minimize the conditions on the generators to achieve the desired asymptotic properties, and employ multiplier bootstrap to obtain the corresponding $p$-value. We show that the constructed test statistic is doubly robust, and the resulting test both controls type-I error and has the power approaching one asymptotically. Also notably, we establish those theoretical guarantees under much weaker and practically more feasible conditions compared to the existing tests, and our proposal gives a concrete example of how to utilize some state-of-the-art deep learning tools, such as GANs, to help address a classical but challenging statistical problem. We demonstrate the efficacy of our test through both simulations and an application to an anti-cancer drug dataset. A Python implementation of the proposed procedure is available at https://github.com/tianlinxu312/dgcit.
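The abstract's test statistic (a maximum of generalized covariance measures over several transformation functions) and its multiplier-bootstrap $p$-value can be illustrated with a minimal NumPy sketch. This is not the authors' implementation from the repository above. The function name `max_gcm_test`, the random `tanh` transformation family, and the arguments `x_gen` and `y_gen` (samples assumed to come from already-fitted generators of $X \mid Z$ and $Y \mid Z$) are hypothetical choices made for illustration. The sketch also omits GAN training, data-splitting, and cross-fitting.

```python
import numpy as np


def max_gcm_test(x, y, x_gen, y_gen, n_funcs=5, n_boot=1000, seed=0):
    """Illustrative max-of-GCM statistic with a multiplier-bootstrap p-value.

    x, y         : observed samples, shape (n,)
    x_gen, y_gen : samples from (assumed) fitted generators of X|Z and Y|Z,
                   shape (n, M), with row i drawn conditionally on z_i
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]

    # Hypothetical transformation family: t -> tanh(scale * t + shift).
    scales = rng.normal(size=(2, n_funcs))
    shifts = rng.normal(size=(2, n_funcs))

    def residuals(obs, gen, scale, shift):
        # h(obs_i) minus the Monte Carlo estimate of E[h(.) | Z = z_i].
        h_obs = np.tanh(obs[:, None] * scale + shift)                 # (n, n_funcs)
        h_gen = np.tanh(gen[:, :, None] * scale + shift).mean(axis=1)  # (n, n_funcs)
        return h_obs - h_gen

    rx = residuals(x, x_gen, scales[0], shifts[0])
    ry = residuals(y, y_gen, scales[1], shifts[1])

    # Residual products for every pair of transformation functions.
    psi = (rx[:, :, None] * ry[:, None, :]).reshape(n, -1)  # (n, n_funcs**2)
    sigma = psi.std(axis=0, ddof=1)

    # Studentized generalized covariance measures, maximized over pairs.
    stat = np.max(np.abs(np.sqrt(n) * psi.mean(axis=0) / sigma))

    # Multiplier bootstrap: reweight centered products with N(0, 1) multipliers.
    centered = (psi - psi.mean(axis=0)) / sigma
    multipliers = rng.normal(size=(n_boot, n))
    boot = np.max(np.abs(multipliers @ centered) / np.sqrt(n), axis=1)

    p_value = float(np.mean(boot >= stat))
    return float(stat), p_value
```

A quick way to try the sketch is to simulate data with a known conditional law, for example $X = Z + \delta$ and $Y = 2Z + \delta + \varepsilon$ with independent standard normal noise, and to pass draws from the true conditionals in place of GAN output. Because $X$ and $Y$ share the noise term $\delta$, they are conditionally dependent given $Z$, and the returned $p$-value should be small.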

Code Implementations: 1 repo