QComp: A QSAR-Based Data Completion Framework for Drug Discovery
This work addresses the sparsity of experimental data in drug discovery and the difficulty of integrating new measurements into existing QSAR predictions, offering incremental improvements for researchers in pharmaceutical development.
The paper tackles the challenge of integrating evolving experimental data in drug discovery by developing QComp, a QSAR-based data completion framework. QComp improves prediction accuracy across tasks and shows promise for guiding the sequence of experiments by quantifying the reduction in statistical uncertainty for specific endpoints.
In drug discovery, in vitro and in vivo experiments reveal biochemical activities related to the efficacy and toxicity of compounds. The experimental data accumulate into massive, ever-evolving, and sparse datasets. Quantitative Structure-Activity Relationship (QSAR) models, which predict biochemical activities using only the structural information of compounds, face challenges in integrating the evolving experimental data as studies progress. We develop QSAR-Complete (QComp), a data completion framework to address this issue. Based on pre-existing QSAR models, QComp utilizes the correlation inherent in experimental data to enhance prediction accuracy across various tasks. Moreover, QComp emerges as a promising tool for guiding the optimal sequence of experiments by quantifying the reduction in statistical uncertainty for specific endpoints, thereby aiding in rational decision-making throughout the drug discovery process.
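The abstract does not spell out the statistical model behind QComp, so the following is only a minimal sketch of the idea under stated assumptions. It assumes that the errors of the pre-existing QSAR predictions across endpoints are jointly Gaussian with a known covariance matrix, and it uses standard Gaussian conditioning to update predictions for unmeasured endpoints from the measured ones. The function names `complete` and `variance_reduction`, the covariance values, and the endpoint values are hypothetical and chosen for illustration.

```python
import numpy as np

def complete(qsar_pred, observed, cov):
    """Update QSAR predictions using measured endpoints (Gaussian conditioning).

    qsar_pred: (d,) QSAR predictions, used as the prior mean
    observed:  (d,) measured values, np.nan where no experiment was run
    cov:       (d, d) covariance of QSAR errors across endpoints (assumed known)
    Returns the posterior mean (d,) and posterior variance (d,).
    """
    qsar_pred = np.asarray(qsar_pred, dtype=float)
    observed = np.asarray(observed, dtype=float)
    cov = np.asarray(cov, dtype=float)

    obs = ~np.isnan(observed)   # endpoints with experimental data
    mis = ~obs                  # endpoints still to be predicted

    mean = qsar_pred.copy()
    var = np.diag(cov).copy()
    mean[obs] = observed[obs]   # measured endpoints are taken as exact here
    var[obs] = 0.0

    if obs.any() and mis.any():
        s_oo = cov[np.ix_(obs, obs)]
        s_mo = cov[np.ix_(mis, obs)]
        # gain = s_mo @ inv(s_oo); s_oo is symmetric, so solve and transpose
        gain = np.linalg.solve(s_oo, s_mo.T).T
        mean[mis] = qsar_pred[mis] + gain @ (observed[obs] - qsar_pred[obs])
        var[mis] = np.diag(cov[np.ix_(mis, mis)]) - np.einsum("ij,ij->i", gain, s_mo)
    return mean, var

def variance_reduction(cov, target, candidate):
    """Variance removed from `target` by measuring `candidate` alone."""
    cov = np.asarray(cov, dtype=float)
    return cov[target, candidate] ** 2 / cov[candidate, candidate]

# Hypothetical covariance for three endpoints (illustrative values only).
cov = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.2],
                [0.3, 0.2, 1.0]])

# Endpoint 1 has been measured; endpoints 0 and 2 have not.
mean, var = complete([5.0, 6.0, 7.0], [np.nan, 6.5, np.nan], cov)

# Rank candidate experiments by how much they would reduce uncertainty in endpoint 0.
ranking = sorted([1, 2], key=lambda c: variance_reduction(cov, 0, c), reverse=True)
```

In this toy setting, the measured endpoint pulls the strongly correlated endpoint 0 further from its QSAR prediction than the weakly correlated endpoint 2, and the candidate experiment with the larger variance reduction would be run first. A real system would have to estimate the covariance from sparse data and account for measurement noise, neither of which is handled here.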