ME · AI · QM · ML · May 17

Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels

arXiv:2605.17559 · 21.9
Predicted impact: top 51% in ME · last 90 days
Originality: Highly original
AI Analysis

This work provides a principled, unified approach to FDR control in structured hypothesis testing, benefiting researchers in genomics, neuroscience, and other fields where hypotheses have spatial or relational structure.

The paper introduces a framework for controlling False Discovery Rate (FDR) in structured hypothesis spaces using reproducing kernels, unifying continuous domains, graphs, and hierarchies under a single algorithm. The method is validated on spatial data and gene expression tasks, demonstrating FDR control and improved discovery power.

Large-scale hypothesis testing is central to modern science, where controlling the False Discovery Rate (FDR) has become the standard approach to managing false positives across many simultaneous tests. Hypotheses rarely exist in isolation; they often exhibit structure through proximity, connectivity, or hierarchy. This structure represents both a challenge and an opportunity: while classical methods treat these dependencies as obstacles requiring conservative correction, leveraging them can substantially increase discovery power. Here, we reframe structured FDR control as a regularized learning problem. By optimizing within a suitable Reproducing Kernel Hilbert Space (RKHS), we introduce a framework that unifies continuous domains, graphs, and hierarchies under a single algorithm through kernel choice alone. This formulation enables smooth solutions in place of the piecewise-constant fits of prior methods, principled likelihood-based hyperparameter selection rather than heuristic tuning, and inference at unobserved locations, which in turn supports sample-efficient experimental design. Building on this estimator, we provide two decision rules, both of which we prove control the FDR. We validate our method on two sources: spatial locations derived from high-dimensional real-world datasets, and a differential gene expression task utilizing protein-protein interaction graphs.
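
For context on the classical, structure-agnostic baseline the abstract contrasts against, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is not the paper's kernel-based method: it ignores any spatial or graph structure and is known to control the FDR under independence or certain forms of positive dependence. The function name `benjamini_hochberg` and its parameters are illustrative choices, not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical BH step-up procedure (illustrative baseline only).

    Returns a list of booleans, True where the hypothesis is rejected.
    This treats hypotheses as unstructured; it is NOT the RKHS-based
    method described in the abstract.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values purely for demonstration.
    demo = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(demo, alpha=0.05))
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, provided a larger-ranked p-value passes. Structured approaches such as the one described above aim to gain power over this baseline by sharing information between neighbouring hypotheses.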

