MLLGMEJul 31, 2020

Deep Direct Likelihood Knockoffs

arXiv:2007.15835v123 citations
AI Analysis

This addresses the need for reliable feature discovery in scientific applications where follow-up experiments are costly, offering an incremental improvement over existing knockoff methods.

The paper tackles the problem of feature selection with controlled false discovery rates in scientific domains, developing Deep Direct Likelihood Knockoffs (DDLK) to generate valid knockoffs by directly minimizing KL divergence from the swap property, achieving higher power than baselines while controlling FDR on synthetic and real benchmarks including a COVID-19 dataset.

Predictive modeling often uses black box machine learning methods, such as deep neural networks, to achieve state-of-the-art performance. In scientific domains, the scientist often wishes to discover which features are actually important for making the predictions. These discoveries may lead to costly follow-up experiments and as such it is important that the error rate on discoveries is not too high. Model-X knockoffs enable important features to be discovered with control of the FDR. However, knockoffs require rich generative models capable of accurately modeling the knockoff features while ensuring they obey the so-called "swap" property. We develop Deep Direct Likelihood Knockoffs (DDLK), which directly minimizes the KL divergence implied by the knockoff swap property. DDLK consists of two stages: it first maximizes the explicit likelihood of the features, then minimizes the KL divergence between the joint distribution of features and knockoffs and any swap between them. To ensure that the generated knockoffs are valid under any possible swap, DDLK uses the Gumbel-Softmax trick to optimize the knockoff generator under the worst-case swap. We find DDLK has higher power than baselines while controlling the false discovery rate on a variety of synthetic and real benchmarks including a task involving a large dataset from one of the epicenters of COVID-19.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes