Optimal Decision Making in High-Throughput Virtual Screening Pipelines
This work addresses a domain-specific problem in computational chemistry and materials science, offering incremental improvements in resource allocation for high-throughput screening.
The paper addresses the challenge of efficiently screening molecular candidates in drug discovery and materials design. It proposes a framework for optimizing virtual screening pipelines built from multi-fidelity models, reports significant acceleration without accuracy degradation, and enables adaptive trade-offs between accuracy and efficiency.
The need for efficient computational screening of molecular candidates that possess desired properties arises frequently in scientific and engineering problems, including drug discovery and materials design. However, the large size of the candidate search space and the substantial computational cost of high-fidelity property prediction models make screening challenging in practice. In this work, we propose a general framework for constructing and optimizing a high-throughput virtual screening (HTVS) pipeline that consists of multi-fidelity models. The central idea is to optimally allocate computational resources to models of varying cost and accuracy so as to maximize the return on computational investment (ROCI). Using both simulated and real data, we demonstrate that the proposed optimal HTVS framework can significantly accelerate screening with virtually no degradation in accuracy. Furthermore, it enables an adaptive operational strategy for HTVS, in which one can trade accuracy for efficiency.
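To make the resource-allocation idea concrete, the following toy sketch models a two-stage screening cascade in which a cheap, noisy surrogate filters candidates before an expensive, exact model evaluates the survivors. It is an illustration under stated assumptions, not the framework proposed in the paper: the Gaussian candidate model, the per-candidate costs (1 and 100), the grid search over thresholds, and all function names (`run_pipeline`, `make_candidates`, `sweep_cheap_threshold`, `cheapest_setting`) are hypothetical choices made only for this example.

```python
import random


def run_pipeline(candidates, stages, thresholds):
    """Pass candidates through a cascade of (score_fn, cost_per_candidate) stages.

    A candidate advances only if its score meets the stage threshold.
    Returns the survivors of the last stage and the total cost incurred.
    """
    total_cost = 0.0
    survivors = list(candidates)
    for (score_fn, cost), threshold in zip(stages, thresholds):
        total_cost += cost * len(survivors)  # every survivor is evaluated here
        survivors = [c for c in survivors if score_fn(c) >= threshold]
    return survivors, total_cost


def make_candidates(n, noise, seed=0):
    """Synthetic candidates as (true_value, cheap_estimate) pairs (assumed model)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        true_value = rng.gauss(0.0, 1.0)
        out.append((true_value, true_value + rng.gauss(0.0, noise)))
    return out


# Hypothetical stages: the cheap surrogate costs 1 unit, the exact model 100.
CHEAP_STAGE = (lambda c: c[1], 1.0)
EXACT_STAGE = (lambda c: c[0], 100.0)


def sweep_cheap_threshold(candidates, target, cheap_thresholds):
    """For each cheap-stage threshold, return (threshold, recall, cost_ratio).

    Recall and cost are measured relative to running only the exact model
    on every candidate.
    """
    baseline_hits, baseline_cost = run_pipeline(candidates, [EXACT_STAGE], [target])
    rows = []
    for t in cheap_thresholds:
        hits, cost = run_pipeline(
            candidates, [CHEAP_STAGE, EXACT_STAGE], [t, target]
        )
        rows.append((t, len(hits) / len(baseline_hits), cost / baseline_cost))
    return rows


def cheapest_setting(rows, min_recall):
    """Pick the lowest-cost threshold whose recall is at least min_recall."""
    feasible = [r for r in rows if r[1] >= min_recall]
    return min(feasible, key=lambda r: r[2])


if __name__ == "__main__":
    cands = make_candidates(10_000, noise=0.5)
    grid = [float("-inf"), 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
    for t, recall, cost_ratio in sweep_cheap_threshold(cands, 1.5, grid):
        print(f"threshold={t:>5}  recall={recall:.3f}  cost_ratio={cost_ratio:.3f}")
    print("cheapest with recall >= 0.95:", cheapest_setting(
        sweep_cheap_threshold(cands, 1.5, grid), 0.95))
```

Raising the cheap-stage threshold lowers the cost but can discard true hits, which is the accuracy-for-efficiency trade-off the abstract describes. A brute-force grid over one threshold is used here only for readability; it stands in for, and should not be read as, the optimization procedure developed in the paper.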