LG · AI · Mar 7

ConfHit: Conformal Generative Design with Oracle Free Guarantees

arXiv:2603.07371v1 · 1 citation
Predicted impact top 31% in LG · last 90 days
Originality Highly original
AI Analysis

This work addresses the critical need for reliable guarantees in deep generative models for scientific discovery, particularly in drug discovery, by providing a distribution-free framework for certifying and refining candidate sets.

This paper introduces ConfHit, a framework for generative design that provides guarantees on the presence of desired properties in generated candidates, even under budget constraints, lack of oracle access, and distribution shift. It allows for certifying whether a batch contains at least one hit with a specified confidence and refining the generation to a compact set without weakening this guarantee. ConfHit consistently delivers valid coverage guarantees across various molecule design tasks and methods.

The success of deep generative models in scientific discovery requires not only the ability to generate novel candidates but also reliable guarantees that these candidates indeed satisfy desired properties. Recent conformal-prediction methods offer a path to such guarantees, but their application to generative modeling in drug discovery is limited by budget constraints, lack of oracle access, and distribution shift. To this end, we introduce ConfHit, a distribution-free framework that provides validity guarantees under these conditions. ConfHit formalizes two central questions: (i) Certification: whether a generated batch can be guaranteed to contain at least one hit with a user-specified confidence level, and (ii) Design: whether the generation can be refined to a compact set without weakening this guarantee. ConfHit leverages weighted exchangeability between historical and generated samples to eliminate the need for an experimental oracle, constructs a multiple-sample density-ratio-weighted conformal p-value to quantify statistical confidence in hits, and proposes a nested testing procedure to certify and refine candidate sets of multiple generated samples while maintaining statistical guarantees. Across representative generative molecule design tasks and a broad range of methods, ConfHit consistently delivers valid coverage guarantees at multiple confidence levels while maintaining compact certified sets, establishing a principled and reliable framework for generative modeling.
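
To make the density-ratio-weighted conformal p-value mentioned in the abstract more concrete, here is a minimal sketch of the standard single-sample weighted conformal p-value. It is not ConfHit's multiple-sample construction or its nested testing procedure, which the abstract does not specify. The function name, the score convention (larger score means stronger evidence of a hit), and the assumption that density-ratio estimates are already available are all illustrative choices, not taken from the paper.

```python
from typing import Sequence


def weighted_conformal_pvalue(
    cal_scores: Sequence[float],
    cal_weights: Sequence[float],
    test_score: float,
    test_weight: float,
) -> float:
    """Single-sample weighted conformal p-value (hypothetical helper).

    cal_scores:  scores of historical samples known to satisfy the null
                 (assumed here: known non-hits).
    cal_weights: density-ratio estimates w(x_i) for those samples
                 (assumed to be supplied by some external estimator).
    test_score:  score of one generated sample; larger means stronger
                 evidence against the null.
    test_weight: density-ratio estimate w(x_test) for the generated sample.
    """
    if len(cal_scores) != len(cal_weights):
        raise ValueError("cal_scores and cal_weights must have the same length")
    if test_weight <= 0 or any(w < 0 for w in cal_weights):
        raise ValueError("weights must be non-negative and test_weight positive")

    # Weighted mass of calibration points at least as extreme as the test point.
    extreme_mass = sum(
        w for s, w in zip(cal_scores, cal_weights) if s >= test_score
    )
    total_mass = sum(cal_weights) + test_weight
    # The test point counts itself, which keeps the p-value strictly positive.
    return (extreme_mass + test_weight) / total_mass


# With uniform weights this reduces to the usual conformal p-value
# (1 + #{s_i >= s_test}) / (n + 1).
p_uniform = weighted_conformal_pvalue([0.1, 0.4, 0.7, 0.9], [1.0] * 4, 0.8, 1.0)

# Non-uniform weights shift mass toward calibration points that look
# more like the generated distribution.
p_weighted = weighted_conformal_pvalue(
    [0.1, 0.4, 0.7, 0.9], [1.0, 1.0, 1.0, 3.0], 0.8, 2.0
)
```

A small p-value would be read as evidence that the generated sample is a hit. That reading relies on the weights being accurate density ratios between the generated and historical distributions; with estimated weights the guarantee is only approximate.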

