William McCorkindale

11.7 | CHEM-PH | Jun 28, 2020 | Code

Data-Driven Discovery of Molecular Photoswitches with Multioutput Gaussian Processes

Ryan-Rhys Griffiths, Jake L. Greenfield, Aditya R. Thawani et al.

Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states, whilst red-shifting the absorption bands overall serves to limit material damage due to UV exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design, however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models and operationally outperforms time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, and all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset
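To make the multitask idea in the abstract above concrete, here is a minimal NumPy sketch of one common multioutput Gaussian process construction, the intrinsic coregionalization model (ICM). It is an illustration under stated assumptions, not the authors' implementation: the RBF kernel, the 1D toy inputs, the task covariance matrix `B`, and the function name `icm_predict` are all hypothetical choices made for this example. The point it shows is that a sparsely labeled task can borrow strength from a densely labeled, correlated task.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def icm_predict(x_train, t_train, y_train, x_test, t_test, B, noise=1e-2):
    """Posterior mean of a zero-mean ICM multioutput GP (hypothetical helper).

    The kernel between (x, task t) and (x', task t') is B[t, t'] * rbf(x, x').
    """
    K = B[np.ix_(t_train, t_train)] * rbf(x_train, x_train)
    K += noise * np.eye(len(y_train))
    K_star = B[np.ix_(t_test, t_train)] * rbf(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)

# Toy data (assumed): task 0 is densely labeled, task 1 has only three labels.
x0 = np.linspace(0.0, 6.0, 20)[:, None]
x1 = np.array([[0.0], [3.0], [6.0]])
x_all = np.vstack([x0, x1])
t_all = np.concatenate([np.zeros(len(x0), dtype=int), np.ones(len(x1), dtype=int)])
y_all = np.concatenate([np.sin(x0).ravel(), 0.9 * np.sin(x1).ravel()])

# Query task 1 at a point far from its own three labels.
x_test = np.array([[1.5]])
t_test = np.array([1])
truth = 0.9 * np.sin(1.5)

B_multi = np.array([[1.0, 0.9], [0.9, 1.0]])  # assumed strong task correlation
B_single = np.eye(2)                          # no sharing: reduces to single-task GPs

mt_err = abs(icm_predict(x_all, t_all, y_all, x_test, t_test, B_multi)[0] - truth)
st_err = abs(icm_predict(x_all, t_all, y_all, x_test, t_test, B_single)[0] - truth)
print(f"multitask error: {mt_err:.3f}, single-task error: {st_err:.3f}")
```

Setting `B` to the identity removes all cross-task terms, so the same function doubles as the single-task baseline. In practice the entries of `B` and the kernel hyperparameters would be learned from data rather than fixed by hand as they are here.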

1.2 | QM | Oct 24, 2020 | Code

Investigating 3D Atomic Environments for Enhanced QSAR

William McCorkindale, Carl Poelking, Alpha A. Lee

Predicting bioactivity and physical properties of molecules is a longstanding challenge in drug design. Most approaches use molecular descriptors based on a 2D representation of molecules as a graph of atoms and bonds, abstracting away the molecular shape. A difficulty in accounting for 3D shape is in designing molecular descriptors that can precisely capture molecular shape while remaining invariant to rotations and translations. We describe a novel alignment-free 3D QSAR method using Smooth Overlap of Atomic Positions (SOAP), a well-established formalism developed for interpolating potential energy surfaces. We show that this approach rigorously describes local 3D atomic environments to compare molecular shapes in a principled manner. This method performs competitively with traditional fingerprint-based approaches as well as state-of-the-art graph neural networks on pIC$_{50}$ ligand-binding prediction in both random and scaffold split scenarios. We illustrate the utility of SOAP descriptors by showing that their inclusion in an ensemble of diverse representations statistically improves performance, demonstrating that incorporating 3D atomic environments could lead to enhanced QSAR for cheminformatics.
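The invariance requirement mentioned in the abstract above can be illustrated with a much simpler stand-in than SOAP. The sketch below is not SOAP and not the authors' code: it builds a Gaussian-smeared histogram of interatomic distances, which is unchanged by rotations and translations because it depends only on distances. The function names, the grid of distance centers, the smearing width, and the approximate methane-like coordinates are all assumptions chosen for illustration.

```python
import numpy as np

def radial_descriptor(coords, centers=None, width=0.3):
    """Gaussian-smeared histogram of all pairwise interatomic distances."""
    if centers is None:
        centers = np.linspace(0.5, 5.0, 16)  # assumed distance grid
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    d = dist[np.triu_indices(len(coords), k=1)]  # each pair counted once
    return np.exp(-0.5 * ((d[:, None] - centers[None, :]) / width) ** 2).sum(0)

def random_rotation(rng):
    """Random proper rotation matrix (determinant +1) from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1  # flip one axis to turn a reflection into a rotation
    return q

# Approximate methane-like geometry (illustrative coordinates only).
coords = np.array([
    [0.0, 0.0, 0.0],
    [0.629, 0.629, 0.629],
    [-0.629, -0.629, 0.629],
    [-0.629, 0.629, -0.629],
    [0.629, -0.629, -0.629],
])

rng = np.random.default_rng(0)
R = random_rotation(rng)
shift = rng.normal(size=3)

original = radial_descriptor(coords)
moved = radial_descriptor(coords @ R.T + shift)   # rotated and translated copy
stretched = radial_descriptor(1.3 * coords)       # genuinely different shape

print("invariant to rigid motion:", np.allclose(original, moved))
print("sensitive to shape change:", not np.allclose(original, stretched))
```

A global distance histogram like this discards angular information and per-atom locality, which is part of what a formalism such as SOAP is designed to retain while keeping the same invariances.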
