ME AP ML · May 16, 2014

Selection Bias Correction and Effect Size Estimation under Dependence

Kean Ming Tan, Noah Simon, Daniela Witten

arXiv:1405.4251v2

Originality: Incremental advance

AI Analysis

This paper addresses a critical issue for researchers in genomics and other fields that conduct large-scale studies: it offers a more robust method for effect size estimation when the data are dependent, though it is an incremental improvement over existing frameworks.

The paper tackles the problem of selection bias in effect size estimation for large-scale hypothesis testing, such as in genomics, where naive estimates are inflated by chance. It proposes a new estimator that corrects for bias without assuming independence, showing improved performance in simulations and on two gene expression datasets.

We consider large-scale studies in which it is of interest to test a very large number of hypotheses, and then to estimate the effect sizes corresponding to the rejected hypotheses. For instance, this setting arises in the analysis of gene expression or DNA sequencing data. However, naive estimates of the effect sizes suffer from selection bias, i.e., some of the largest naive estimates are large due to chance alone. Many authors have proposed methods to reduce the effects of selection bias under the assumption that the naive estimates of the effect sizes are independent. Unfortunately, when the effect size estimates are dependent, these existing techniques can have very poor performance, and in practice there will often be dependence. We propose an estimator that adjusts for selection bias under a recently-proposed frequentist framework, without the independence assumption. We study some properties of the proposed estimator, and illustrate that it outperforms past proposals in a simulation study and on two gene expression data sets.
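The selection bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator; it only shows, under assumed settings, that the largest naive estimates overshoot their true effects, and that the size of the overshoot varies more from one data set to the next when the estimates are correlated. The function name `selection_bias`, the equicorrelated Gaussian noise model, and every numeric setting (1000 effects, 50 non-null effects of size 2, top 20 selected) are hypothetical choices made for illustration.

```python
import numpy as np


def selection_bias(n_effects=1000, top_k=20, rho=0.0, n_reps=200, seed=0):
    """Simulate the bias of the top_k largest naive effect size estimates.

    Returns the mean and standard deviation, across replicates, of the
    average (naive estimate - true effect) among the selected estimates.
    All settings are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)

    # Assumed truth: 50 non-null effects of size 2, the rest exactly zero.
    mu = np.zeros(n_effects)
    mu[:50] = 2.0

    biases = []
    for _ in range(n_reps):
        # Equicorrelated noise: unit variance, pairwise correlation rho.
        shared = rng.standard_normal()
        own = rng.standard_normal(n_effects)
        noise = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own

        naive = mu + noise                  # naive effect size estimates
        top = np.argsort(naive)[-top_k:]    # indices of the top_k largest
        biases.append(np.mean(naive[top] - mu[top]))

    return float(np.mean(biases)), float(np.std(biases))


if __name__ == "__main__":
    for rho in (0.0, 0.5):
        mean_bias, sd_bias = selection_bias(rho=rho)
        print(f"rho={rho}: mean bias={mean_bias:.2f}, sd across data sets={sd_bias:.2f}")
```

In this toy setup the mean bias is positive because selection favors estimates whose noise happened to be large. With `rho > 0` the shared noise component shifts all estimates together, so the bias in any single data set is less predictable, which is one way to see why a correction derived under independence may perform poorly under dependence.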

