ME CO ML | Jun 30, 2021

AdaPT-GMM: Powerful and robust covariate-assisted multiple testing

arXiv:2106.15812v11.2 Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the need for more powerful and robust multiple testing procedures in statistics and data science, especially for composite null hypotheses. The contribution appears incremental, since it refines the existing AdaPT method.

The paper tackles covariate-assisted multiple testing with false discovery rate (FDR) control. It proposes AdaPT-GMM, an empirical Bayes method that models the local false discovery rate of each hypothesis as a function of its covariates and p-value. In simulations and real data examples, the method consistently delivers high power relative to competing state-of-the-art methods, including in scenarios where AdaPT is underpowered.

We propose a new empirical Bayes method for covariate-assisted multiple testing with false discovery rate (FDR) control, where we model the local false discovery rate for each hypothesis as a function of both its covariates and p-value. Our method refines the adaptive p-value thresholding (AdaPT) procedure by generalizing its masking scheme to reduce the bias and variance of its false discovery proportion estimator, improving the power when the rejection set is small or some null p-values concentrate near 1. We also introduce a Gaussian mixture model for the conditional distribution of the test statistics given covariates, modeling the mixing proportions with a generic user-specified classifier, which we implement using a two-layer neural network. Like AdaPT, our method provably controls the FDR in finite samples even if the classifier or the Gaussian mixture model is misspecified. We show in extensive simulations and real data examples that our new method, which we call AdaPT-GMM, consistently delivers high power relative to competing state-of-the-art methods. In particular, it performs well in scenarios where AdaPT is underpowered, and is especially well-suited for testing composite null hypotheses, such as whether the effect size exceeds a practical significance threshold.
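
To make the motivation concrete, here is a minimal sketch of the false discovery proportion estimate used by the original AdaPT procedure, simplified to a constant threshold rather than a covariate-dependent one. It is not the generalized masking scheme of AdaPT-GMM, whose details the abstract does not give. The function name `adapt_fdp_estimate` and the example p-values are hypothetical and chosen only for illustration.

```python
def adapt_fdp_estimate(p_values, threshold):
    """Baseline AdaPT-style FDP estimate for a constant threshold below 0.5.

    Illustrative simplification: AdaPT itself uses a threshold that
    depends on each hypothesis's covariates.
    """
    if not 0.0 < threshold < 0.5:
        raise ValueError("threshold must lie in (0, 0.5)")
    # R: candidate rejections, p-values at or below the threshold.
    rejections = sum(1 for p in p_values if p <= threshold)
    # A: p-values in the mirrored region near 1, used to estimate
    # how many of the candidate rejections are false.
    mirrored = sum(1 for p in p_values if p >= 1.0 - threshold)
    return (1 + mirrored) / max(rejections, 1)


# Hypothetical data: four small p-values and one unremarkable one.
small_set = [0.001, 0.002, 0.003, 0.004, 0.9]
# Same signals, plus two null p-values concentrated near 1.
nulls_near_one = small_set[:4] + [0.995, 0.999]

print(adapt_fdp_estimate(small_set, 0.01))       # 0.25
print(adapt_fdp_estimate(nulls_near_one, 0.01))  # 0.75
```

With only four candidate rejections, the added 1 in the numerator keeps the estimate at 0.25 even when no p-value falls in the mirrored region, so a target FDR level such as 0.1 cannot be met. Two null p-values near 1 raise the estimate to 0.75. These are the two situations the abstract names as sources of lost power in AdaPT.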
