Hayley Weir

h-index: 11

Papers: 3

Citations: 22

Novelty: 40%

AI Score: 43

Ranked #55,620 of 194,257 authors (top 29%) · #27 in BM (top 25%)

3 Papers

11.4 · LG · Apr 17

Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design

Shriram Chennakesavalu, Kirill Shmilovich, Hayley Weir et al. · MIT

Large Language Models (LLMs) have the potential to accelerate small-molecule drug design due to their ability to reason about information from diverse sources and formats. However, their practical utility remains unclear due to the lack of benchmarks that reflect real-world scenarios. In this work, we introduce a suite of chemically grounded tasks spanning molecular property prediction, molecular representation transformations, and molecular design. Importantly, we formulate these tasks as reinforcement learning (RL) environments, enabling a unified approach to evaluation and post-training. Across three model families, we find that frontier models are increasingly proficient at chemical tasks, but that there is significant room for improvement, especially in low-data experimental settings. Critically, we show that RL-based post-training can substantially improve performance: a smaller model post-trained on our environments becomes competitive with state-of-the-art frontier models despite starting from a significantly weaker base model. This suggests a practical route toward employing LLMs in drug discovery: by combining carefully designed evaluation tasks with targeted post-training, we can both elucidate and close critical capability gaps.
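The abstract frames each chemistry task as an RL environment, so the same object can score a model during evaluation and supply rewards during post-training. A minimal sketch of that idea for a single-turn property-prediction task follows. The class name, prompt wording, answer parsing, and tolerance-based reward are all hypothetical assumptions for illustration, not the authors' environments or reward definitions.

```python
import re
from typing import Optional, Tuple

# Matches integers, decimals, and scientific notation such as -1.2e-3.
_NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


class PropertyPredictionEnv:
    """Hypothetical single-turn environment: prompt -> completion -> reward.

    Illustrative only; not the paper's implementation.
    """

    def __init__(self, smiles: str, target: float, tolerance: float) -> None:
        self.smiles = smiles        # molecule the task is about
        self.target = target        # reference value the answer is scored against
        self.tolerance = tolerance  # assumed acceptance window for a correct answer

    def reset(self) -> str:
        # The observation is simply the task prompt shown to the language model.
        return f"Predict the property value for the molecule {self.smiles}. End with a number."

    @staticmethod
    def parse_answer(completion: str) -> Optional[float]:
        # Treat the last number in the completion as the model's final answer.
        matches = _NUMBER.findall(completion)
        return float(matches[-1]) if matches else None

    def step(self, completion: str) -> Tuple[float, bool]:
        # Single-turn episode: score the answer, and the episode always ends.
        prediction = self.parse_answer(completion)
        if prediction is None:
            return 0.0, True  # unparseable answers earn no reward
        reward = 1.0 if abs(prediction - self.target) <= self.tolerance else 0.0
        return reward, True
```

For example, `PropertyPredictionEnv("CCO", target=0.5, tolerance=0.25)` (the target is a placeholder, not a measured property) returns a reward of 1.0 for the completion "My estimate is 0.6" and 0.0 for "Roughly 3". A binary, automatically checkable reward like this is one common choice because it can be used unchanged as an evaluation metric and as an RL training signal; the paper's actual scoring may differ.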

3.3 · BM · May 30, 2023 · Code

Accurate and Efficient Structural Ensemble Generation of Macrocyclic Peptides using Internal Coordinate Diffusion

Colin A. Grambow, Hayley Weir, Nathaniel L. Diamant et al.

Macrocyclic peptides are an emerging therapeutic modality, yet accurately sampling their diverse 3D ensembles remains computationally challenging because of their conformational flexibility and geometric constraints. Here, we introduce RINGER, a diffusion-based transformer model that uses a redundant internal coordinate representation to generate three-dimensional conformational ensembles of macrocyclic peptides from their 2D representations. RINGER provides fast backbone and side-chain sampling while respecting key structural invariances of cyclic peptides. Through extensive benchmarking and analysis against gold-standard conformer ensembles of cyclic peptides generated with metadynamics, we demonstrate that RINGER generates both high-quality and diverse geometries at a fraction of the computational cost. Our work lays the foundation for improved sampling of cyclic geometries and the development of geometric learning methods for peptides.
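To make "redundant internal coordinate representation" concrete, here is a minimal sketch, assuming only a ring of backbone atoms given as Cartesian coordinates, of reading bond lengths, bond angles, and torsions around a macrocycle. The function names are hypothetical, and which coordinates RINGER actually diffuses over, and how it restores ring closure, are not shown here. Around a ring of n atoms this yields 3n values for only 3n - 6 internal degrees of freedom, which is one sense in which such a representation is redundant.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def _norm(a: Vec3) -> float:
    return math.sqrt(_dot(a, a))


def bond_length(p0: Vec3, p1: Vec3) -> float:
    return _norm(_sub(p1, p0))


def bond_angle(p0: Vec3, p1: Vec3, p2: Vec3) -> float:
    # Angle at p1 (radians), clamped to guard against rounding outside [-1, 1].
    u, v = _sub(p0, p1), _sub(p2, p1)
    cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def dihedral(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    # Signed torsion about the p1-p2 bond in (-pi, pi], via atan2 of projections.
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b1 = _scale(b1, 1.0 / _norm(b1))
    b2 = _sub(p3, p2)
    v = _sub(b0, _scale(b1, _dot(b0, b1)))  # component of b0 perpendicular to b1
    w = _sub(b2, _scale(b1, _dot(b2, b1)))  # component of b2 perpendicular to b1
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


def ring_internal_coordinates(
        ring: Sequence[Vec3]) -> Tuple[List[float], List[float], List[float]]:
    # n lengths + n angles + n torsions = 3n values for 3n - 6 degrees of freedom.
    n = len(ring)
    if n < 4:
        raise ValueError("need at least 4 ring atoms to define torsions")
    lengths = [bond_length(ring[i], ring[(i + 1) % n]) for i in range(n)]
    angles = [bond_angle(ring[i - 1], ring[i], ring[(i + 1) % n]) for i in range(n)]
    torsions = [dihedral(ring[i - 1], ring[i], ring[(i + 1) % n], ring[(i + 2) % n])
                for i in range(n)]
    return lengths, angles, torsions
```

As a sanity check, a planar regular hexagon with unit circumradius gives six bond lengths of 1.0, six angles of 2π/3, and six torsions of 0.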

4.3 · BM · May 14, 2023 · Code

CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning

Colin A. Grambow, Hayley Weir, Christian N. Cunningham et al.

Computational and machine learning approaches to model the conformational landscape of macrocyclic peptides have the potential to enable rational design and optimization. However, accurate, fast, and scalable methods for modeling macrocycle geometries remain elusive. Recent deep learning approaches have significantly accelerated protein structure prediction and the generation of small-molecule conformational ensembles, yet similar progress has not been made for macrocyclic peptides due to their unique properties. Here, we introduce CREMP, a resource generated for the rapid development and evaluation of machine learning models for macrocyclic peptides. CREMP contains 36,198 unique macrocyclic peptides and their high-quality structural ensembles generated using the Conformer-Rotamer Ensemble Sampling Tool (CREST). Altogether, this new dataset contains nearly 31.3 million unique macrocycle geometries, each annotated with energies derived from semi-empirical extended tight-binding (xTB) calculations. Additionally, we include 3,258 macrocycles with reported passive permeability data to couple conformational ensembles to experiment. We anticipate that this dataset will enable the development of machine learning models that can improve peptide design and optimization for novel therapeutics.
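Because each geometry in an ensemble like CREMP's carries an energy, a natural downstream step is to weight conformers by their Boltzmann populations before averaging a per-conformer property. The sketch below assumes relative energies in kcal/mol and a temperature of 298.15 K. Whether the dataset ships precomputed weights, and in what units its energies are stored, is not stated in the abstract, so treat the function names and conventions as hypothetical.

```python
import math
from typing import List, Sequence

# Boltzmann constant in kcal/(mol*K); energies below are assumed to be in kcal/mol.
K_B_KCAL = 0.0019872041


def boltzmann_weights(energies: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Normalized Boltzmann populations over the listed conformers."""
    kt = K_B_KCAL * temperature
    e_min = min(energies)  # shift by the minimum energy for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(factors)  # partition function restricted to this ensemble
    return [f / z for f in factors]


def ensemble_average(values: Sequence[float], energies: Sequence[float],
                     temperature: float = 298.15) -> float:
    """Boltzmann-weighted average of a per-conformer property."""
    if len(values) != len(energies):
        raise ValueError("values and energies must have the same length")
    weights = boltzmann_weights(energies, temperature)
    return sum(w * v for w, v in zip(weights, values))
```

If energies are reported in Hartree, as xTB outputs commonly are, multiply by roughly 627.5 to convert to kcal/mol first. Subtracting the minimum energy leaves the normalized weights unchanged but avoids overflow or underflow in `math.exp` for large absolute energies.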