LG · Feb 18

Retrieval-Augmented Foundation Models for Matched Molecular Pair Transformations to Recapitulate Medicinal Chemistry Intuition

arXiv:2602.16684v1 · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses the challenge of generating realistic molecular analogs for medicinal chemists, although it appears incremental because it builds on existing matched molecular pair approaches.

The paper tackles the problem of generating diverse and controllable molecular analogs by training a foundation model on large-scale matched molecular pair transformations, achieving improved diversity, novelty, and controllability in experiments on chemical corpora and patent datasets.
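In a matched molecular pair, two molecules share a constant part and differ only in one variable fragment, so a "variable-to-variable" view keeps only the swapped fragments. The sketch below shows one way raw MMP records could be reduced to such training pairs. It is a minimal illustration under stated assumptions, not the paper's pipeline: the `MMP` record layout, the `[*:1]` attachment-point convention, and the helper names are hypothetical, and both edit directions are counted as separate pairs.

```python
from collections import defaultdict
from typing import NamedTuple


class MMP(NamedTuple):
    """Hypothetical MMP record: shared constant part plus two variable fragments."""
    constant: str  # e.g. "[*:1]c1ccc(O)cc1"
    var_a: str     # variable fragment in the first molecule
    var_b: str     # variable fragment in the second molecule


def variable_to_variable_pairs(mmps: list[MMP]) -> dict[tuple[str, str], int]:
    """Count (input variable -> output variable) transformations.

    The constant part is dropped so only the local edit remains. Each edit is
    recorded in both directions, and repeats seen under different constants
    increase the count instead of producing duplicate rows.
    """
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for mmp in mmps:
        if mmp.var_a == mmp.var_b:
            continue  # identical fragments are not a transformation
        counts[(mmp.var_a, mmp.var_b)] += 1
        counts[(mmp.var_b, mmp.var_a)] += 1
    return dict(counts)


def targets_for(pairs: dict[tuple[str, str], int], source: str) -> list[tuple[str, int]]:
    """Observed output variables for one input variable, most frequent first."""
    hits = [(dst, n) for (src, dst), n in pairs.items() if src == source]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: a Cl -> F swap seen under two different constants.
mmps = [
    MMP("[*:1]c1ccc(O)cc1", "[*:1]Cl", "[*:1]F"),
    MMP("[*:1]c1ccncc1", "[*:1]Cl", "[*:1]F"),
    MMP("[*:1]c1ccncc1", "[*:1]Cl", "[*:1]C#N"),
]
pairs = variable_to_variable_pairs(mmps)
print(targets_for(pairs, "[*:1]Cl"))
```

Aggregating over constants is what lets a frequency signal emerge: the Cl to F edit ranks above Cl to C#N because it recurs across two different scaffolds.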

Matched molecular pairs (MMPs) capture the local chemical edits that medicinal chemists routinely use to design analogs, but existing ML approaches either operate at the whole-molecule level with limited edit controllability or learn MMP-style edits from restricted settings and small models. We propose a variable-to-variable formulation of analog generation and train a foundation model on large-scale MMP transformations (MMPTs) to generate diverse variables conditioned on an input variable. To enable practical control, we develop prompting mechanisms that let users specify preferred transformation patterns during generation. We further introduce MMPT-RAG, a retrieval-augmented framework that uses external reference analogs as contextual guidance to steer generation and generalize from project-specific series. Experiments on general chemical corpora and patent-specific datasets demonstrate improved diversity, novelty, and controllability, and show that our method recovers realistic analog structures in practical discovery scenarios.
