ML · ME · Nov 15, 2013

On Estimating Many Means, Selection Bias, and the Bootstrap

arXiv:1311.3709v1 · 10 citations
Originality: Incremental advance
AI Analysis

This paper addresses selection bias in effect-size estimates, a concern for researchers in fields such as genomics that rely on high-throughput data. It is an incremental advance because it builds on existing frequentist and empirical Bayes ideas.

The paper tackles the problem of selection bias in estimating extreme effect-sizes from large-scale hypothesis testing, showing that an oracle estimator reduces bias compared to naive methods and proposing a resampling-based approximation that performs well in simulations.

With recent advances in high-throughput technology, researchers often find themselves running a large number of hypothesis tests (thousands or more) and estimating a large number of effect-sizes. Generally there is particular interest in those effects estimated to be most extreme. Unfortunately, naive estimates of these effect-sizes (even after potentially accounting for multiplicity in a testing procedure) can be severely biased. In this manuscript we explore this bias from a frequentist perspective: we give a formal definition, and show that an oracle estimator using this bias dominates the naive maximum likelihood estimate. We give a resampling estimator to approximate this oracle, and show that it works well on simulated data. We also connect this to ideas in empirical Bayes.
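To make the selection bias and the idea of a resampling correction concrete, here is a minimal sketch. It is not the authors' procedure: it assumes independent Gaussian estimates with a known standard deviation `sigma`, uses a simple parametric bootstrap that treats the observed estimates as if they were the true means, and the function names `rank_bias_bootstrap` and `corrected_estimates` are hypothetical.

```python
import random


def rank_bias_bootstrap(x, sigma=1.0, n_boot=200, seed=0):
    """Estimate, for each rank, how far the estimate at that rank tends to
    overshoot its own mean. Assumption: x_j ~ N(mu_j, sigma^2), independent."""
    rng = random.Random(seed)
    n = len(x)
    bias = [0.0] * n
    for _ in range(n_boot):
        # Treat the observed x as the "truth" and redraw noisy estimates.
        x_star = [xj + rng.gauss(0.0, sigma) for xj in x]
        # Indices sorted from largest to smallest resampled estimate.
        order = sorted(range(n), key=lambda j: x_star[j], reverse=True)
        for rank, j in enumerate(order):
            # Overshoot of the rank-th largest draw relative to its own "mean".
            bias[rank] += (x_star[j] - x[j]) / n_boot
    return bias


def corrected_estimates(x, sigma=1.0, n_boot=200, seed=0):
    """Subtract the estimated rank-specific bias from each naive estimate."""
    bias = rank_bias_bootstrap(x, sigma, n_boot, seed)
    order = sorted(range(len(x)), key=lambda j: x[j], reverse=True)
    adjusted = list(x)
    for rank, j in enumerate(order):
        adjusted[j] = x[j] - bias[rank]
    return adjusted


if __name__ == "__main__":
    rng = random.Random(1)
    true_means = [0.0] * 1000  # every true effect is zero
    x = [m + rng.gauss(0.0, 1.0) for m in true_means]
    adjusted = corrected_estimates(x)
    top = max(range(len(x)), key=lambda j: x[j])
    print("naive largest estimate:   ", x[top])
    print("adjusted largest estimate:", adjusted[top])
```

In this simulated setting every true effect is zero, so the largest naive estimate is pure selection artifact. The adjustment shrinks the extreme estimates toward the bulk, which is the qualitative behavior the abstract describes.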

