LG | AI | CL | BM | May 29, 2025

Large Language Models for Controllable Multi-property Multi-objective Molecule Optimization

arXiv:2505.23987v1 | 5 citations | h-index: 5 | Has Code | EMNLP
Originality: Incremental advance
AI Analysis

The work addresses a practical limitation in pharmaceutical drug design, though the advance is incremental because it builds on existing LLM methods.

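The paper tackles molecule optimization in which selected molecular properties must be improved to pharmaceutically relevant levels while others that already meet such levels are maintained. It introduces the C-MuMOInstruct instruction-tuning dataset and the GeLLMO-Cs model series, which achieve up to 126% higher success rate than strong baselines across 5 in-distribution and 5 out-of-distribution tasks.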

In real-world drug design, molecule optimization requires selectively improving multiple molecular properties up to pharmaceutically relevant levels, while maintaining others that already meet such criteria. However, existing computational approaches and instruction-tuned LLMs fail to capture such nuanced property-specific objectives, limiting their practical applicability. To address this, we introduce C-MuMOInstruct, the first instruction-tuning dataset focused on multi-property optimization with explicit, property-specific objectives. Leveraging C-MuMOInstruct, we develop GeLLMO-Cs, a series of instruction-tuned LLMs that can perform targeted property-specific optimization. Our experiments across 5 in-distribution and 5 out-of-distribution tasks show that GeLLMO-Cs consistently outperform strong baselines, achieving up to 126% higher success rate. Notably, GeLLMO-Cs exhibit impressive 0-shot generalization to novel optimization tasks and unseen instructions. This offers a step toward a foundational LLM to support realistic, diverse optimizations with property-specific objectives. C-MuMOInstruct and code are accessible through https://github.com/ninglab/GeLLMO-C.
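The property-specific objective described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Objective` class, the helper functions, the property names, and all threshold values are hypothetical and chosen only for illustration. The sketch splits the objectives into those the input molecule still has to reach and those it already satisfies, then checks whether an optimized molecule meets every threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """A hypothetical property-specific objective: reach or keep a threshold."""
    prop: str
    threshold: float
    higher_is_better: bool = True


def meets(value: float, obj: Objective) -> bool:
    """Return True if a property value satisfies the objective's threshold."""
    if obj.higher_is_better:
        return value >= obj.threshold
    return value <= obj.threshold


def split_objectives(before: dict, objectives: list) -> tuple:
    """Split objectives into properties to improve and properties to maintain."""
    to_improve = [o.prop for o in objectives if not meets(before[o.prop], o)]
    to_maintain = [o.prop for o in objectives if meets(before[o.prop], o)]
    return to_improve, to_maintain


def is_success(after: dict, objectives: list) -> bool:
    """An optimization counts as successful here if every threshold holds afterwards."""
    return all(meets(after[o.prop], o) for o in objectives)


# Illustrative values only (not taken from the paper).
objectives = [Objective("plogp", 1.5), Objective("qed", 0.9)]
before = {"plogp": 0.5, "qed": 0.95}
good_after = {"plogp": 1.8, "qed": 0.92}
bad_after = {"plogp": 2.0, "qed": 0.7}

to_improve, to_maintain = split_objectives(before, objectives)
```

Under these assumed values, `plogp` lands in the improve set and `qed` in the maintain set. `bad_after` fails because it raises `plogp` at the cost of dropping `qed` below its threshold, which is the kind of trade-off the abstract says existing approaches do not control.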

Code Implementations: 1 repo
