ME Jun 15, 2020
The leave-one-covariate-out conditional randomization test
Eugene Katsevich, Aaditya Ramdas
Conditional independence testing is an important problem, yet provably hard without assumptions. One assumption that has recently become popular is "model-X", under which we assume that the joint distribution of the covariates is known but assume nothing about the conditional distribution of the outcome given the covariates. Knockoffs is a popular methodology associated with this framework, but it suffers from two main drawbacks: only one-bit $p$-values are available for inference on each variable, and the method is randomized, with significant variability across runs in practice. The conditional randomization test (CRT) is thought to be the "right" solution under model-X, but is usually viewed as computationally inefficient. This paper proposes a computationally efficient leave-one-covariate-out (LOCO) CRT that addresses both drawbacks of knockoffs. LOCO CRT produces valid $p$-values that can be used to control the familywise error rate, and has nearly zero algorithmic variability. For L1-regularized M-estimators, we develop an even faster variant called L1ME CRT, which reuses computation by leveraging a novel observation about the stability of the cross-validated lasso to the removal of inactive variables. Finally, for multivariate Gaussian covariates, we present a closed-form expression for the LOCO CRT $p$-value, thus completely eliminating resampling in this important special case.
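For orientation, the generic resampling-based CRT that the abstract refers to can be sketched as below. This is a minimal illustration, not the LOCO CRT or L1ME CRT of the paper: it assumes a Gaussian model-X law $X \mid Z \sim N(Z\gamma, \sigma^2)$ with known `gamma` and `sigma`, and it uses a simple residual-correlation statistic chosen only for brevity. The function names `crt_pvalue` and `abs_residual_correlation` are hypothetical.

```python
import numpy as np


def abs_residual_correlation(x, y, Z):
    """Illustrative test statistic (an assumption, not the paper's choice):
    |<residual of y on Z, residual of x on Z>| via least squares."""
    coef_y, *_ = np.linalg.lstsq(Z, y, rcond=None)
    coef_x, *_ = np.linalg.lstsq(Z, x, rcond=None)
    return abs((y - Z @ coef_y) @ (x - Z @ coef_x))


def crt_pvalue(x, y, Z, gamma, sigma, statistic, n_resamples=500, seed=0):
    """Generic CRT p-value, assuming X | Z ~ N(Z @ gamma, sigma^2) is known.

    Resamples x from its conditional law given Z (holding y and Z fixed),
    recomputes the statistic, and compares with the observed value.
    """
    rng = np.random.default_rng(seed)
    t_obs = statistic(x, y, Z)
    cond_mean = Z @ gamma
    exceed = 0
    for _ in range(n_resamples):
        # Draw a null copy of x: same conditional law, independent of y given Z.
        x_tilde = cond_mean + sigma * rng.standard_normal(len(x))
        if statistic(x_tilde, y, Z) >= t_obs:
            exceed += 1
    # The +1 terms make the p-value valid in finite samples.
    return (1 + exceed) / (1 + n_resamples)
```

Each resample recomputes the statistic from scratch, which is the computational cost the abstract alludes to when the statistic involves fitting a model such as the lasso.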
ST May 12, 2020
On the power of conditional independence testing under model-X
Eugene Katsevich, Aaditya Ramdas
For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their successful application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative insights into the role of machine learning and providing evidence in favor of using likelihood-based statistics in practice. Focusing on the conditional randomization test (CRT), we find that its conditional mode of inference allows us to reformulate it as testing a point null hypothesis involving the conditional distribution of X. The Neyman-Pearson lemma then implies that a likelihood-based statistic yields the most powerful CRT against a point alternative. We also obtain a related optimality result for MX knockoffs. Switching to an asymptotic framework with arbitrarily growing covariate dimension, we derive an expression for the limiting power of the CRT against local semiparametric alternatives in terms of the prediction error of the machine learning algorithm on which its test statistic is based. Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error control under the assumption that only the first two moments of X given Z are known, a significant relaxation of the MX assumption.
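The reduction to a point null mentioned above can be sketched as follows. The notation is assumed for illustration and is not taken from the abstract: $f^*$ denotes the known conditional law of $X$ given $Z$, $\bar f$ a hypothesized alternative law of $Y$ given $(X, Z)$, and $i = 1, \dots, n$ indexes i.i.d. samples. Conditionally on $(Y, Z)$, Bayes' rule gives:

```latex
\begin{align*}
H_0 &: \; p(X \mid Y, Z) = \prod_{i=1}^n f^*(X_i \mid Z_i), \\
H_1 &: \; p(X \mid Y, Z) \propto \prod_{i=1}^n f^*(X_i \mid Z_i)\, \bar f(Y_i \mid X_i, Z_i), \\
\frac{p_{H_1}(X \mid Y, Z)}{p_{H_0}(X \mid Y, Z)} &\propto \prod_{i=1}^n \bar f(Y_i \mid X_i, Z_i).
\end{align*}
```

Here the proportionality constants depend only on $(Y, Z)$, which are held fixed, so under these assumptions the likelihood ratio in $X$ is a monotone function of the alternative likelihood of $Y$ given $(X, Z)$. This is the sense in which a likelihood-based statistic is the natural candidate for the Neyman-Pearson argument.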
CV Dec 2, 2014
Covariance estimation using conjugate gradient for 3D classification in Cryo-EM
Joakim Andén, Eugene Katsevich, Amit Singer
Classifying structural variability in noisy projections of biological macromolecules is a central problem in Cryo-EM. In this work, we build on a previous method for estimating the covariance matrix of the three-dimensional structure present in the molecules being imaged. Our proposed method allows for the incorporation of the contrast transfer function and a non-uniform distribution of viewing angles, making it more suitable for real-world data. We evaluate its performance on a synthetic dataset and on an experimental dataset obtained by imaging a 70S ribosome complex.