Online Conformal Selection with Accept-to-Reject Changes
This work addresses a limitation in online conformal selection for applications like drug discovery where decisions must be irreversible, offering a method with theoretical guarantees and improved performance.
The paper tackles the problem of online candidate selection with irreversible decisions, such as in drug discovery, by proposing OCS-ARC, a method that extends conformal selection to allow reconsideration of non-selected candidates while ensuring irreversible selection. It theoretically guarantees false discovery rate control at or below the nominal level and shows significant improvement in selection power over baselines in experiments.
Selecting a subset of promising candidates from a large pool is crucial across various scientific and real-world applications. Conformal selection offers a distribution-free and model-agnostic framework for candidate selection with uncertainty quantification. While effective in offline settings, its application to online scenarios, where data arrives sequentially, poses challenges. Notably, conformal selection permits the deselection of previously selected candidates, which is incompatible with applications requiring irreversible selection decisions. This limitation is particularly evident in resource-intensive sequential processes, such as drug discovery, where advancing a compound to subsequent stages renders reversal impractical. To address this issue, we extend conformal selection to an online Accept-to-Reject Changes (ARC) procedure: non-selected data points can be reconsidered for selection later, and once a candidate is selected, the decision is irreversible. Specifically, we propose a novel conformal selection method, Online Conformal Selection with Accept-to-Reject Changes (dubbed OCS-ARC), which incorporates online Benjamini-Hochberg procedure into the candidate selection process. We provide theoretical guarantees that OCS-ARC controls the false discovery rate (FDR) at or below the nominal level at any timestep under both i.i.d. and exchangeable data assumptions. Additionally, we theoretically show that our approach naturally extends to multivariate response settings. Extensive experiments on synthetic and real-world datasets demonstrate that OCS-ARC significantly improves selection power over the baseline while maintaining valid FDR control across all examined timesteps.