Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding
This work addresses data overlap sparsity for drug discovery researchers and enables more efficient generation of ligand PK data, though the contribution is incremental: it applies diffusion models to a specific domain problem.
The paper tackles data sparsity in drug pharmacokinetic (PK) datasets by proposing Imagand, a SMILES-to-PK diffusion model. The synthetic PK data it generates closely resembles real data distributions and improves downstream task performance.
Artificial intelligence (AI) is increasingly used in every stage of drug development. One challenge facing drug discovery AI is that drug pharmacokinetic (PK) datasets are usually collected independently of one another and share few molecules, creating data overlap sparsity. This sparsity makes data curation difficult for researchers looking to answer research questions in polypharmacy, drug combination research, and high-throughput screening. We propose Imagand, a novel SMILES-to-Pharmacokinetic (S2PK) diffusion model capable of generating an array of PK target properties conditioned on SMILES inputs. We show that Imagand-generated synthetic PK data closely resembles the univariate and bivariate distributions of real data and improves performance on downstream tasks. Imagand is a promising solution for data overlap sparsity and allows researchers to efficiently generate ligand PK data for drug discovery research. Code is available at https://github.com/bing1100/Imagand.
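To make data overlap sparsity concrete, the following is a minimal sketch and is not taken from the Imagand code base. It assumes three hypothetical PK datasets, each measuring one property for a different set of molecules, and merges them on SMILES. The dataset names, the values, and the helper functions `merge_by_smiles`, `sparsity`, and `complete_rows` are all illustrative assumptions.

```python
# Hypothetical toy datasets: each maps a SMILES string to one measured PK property.
# All values are invented for illustration and do not come from the paper.
clearance = {"CCO": 1.2, "c1ccccc1": 0.8, "CC(=O)O": 2.1}
half_life = {"CCO": 3.5, "CCN": 4.0}
bioavailability = {"c1ccccc1": 0.6, "CCN": 0.9, "CCCC": 0.4}

datasets = {
    "clearance": clearance,
    "half_life": half_life,
    "bioavailability": bioavailability,
}


def merge_by_smiles(datasets):
    """Outer-join the datasets on SMILES; unmeasured properties become None."""
    all_smiles = sorted(set().union(*(d.keys() for d in datasets.values())))
    return {
        smiles: {name: d.get(smiles) for name, d in datasets.items()}
        for smiles in all_smiles
    }


def sparsity(table):
    """Fraction of (molecule, property) cells with no measurement."""
    cells = [value for row in table.values() for value in row.values()]
    return sum(value is None for value in cells) / len(cells)


def complete_rows(table):
    """Molecules for which every property was measured."""
    return [
        smiles
        for smiles, row in table.items()
        if all(value is not None for value in row.values())
    ]


merged = merge_by_smiles(datasets)
print(f"molecules: {len(merged)}")
print(f"missing fraction: {sparsity(merged):.2f}")
print(f"molecules with all properties: {complete_rows(merged)}")
```

In this toy setup, no molecule has all three properties measured even though each individual dataset is fully populated. A model that generates PK properties conditioned on SMILES, as the abstract describes, would be one way to fill the empty cells of such a merged table.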