LG · Mar 12

Generalist Large Language Models for Molecular Property Prediction: Distilling Knowledge from Specialist Models

Khiem Le, Sreejata Dey, Marcos Martínez Galindo, Vanessa Lopez, Ting Hua, Nitesh V. Chawla, Hoang Thanh Lam

arXiv:2603.12344 · 77.8

Predicted impact: top 17% in LG · last 90 days

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making LLMs practically useful for drug discovery by enhancing their performance in molecular property prediction, representing an incremental advancement toward generalist models in this domain.

The paper aims to improve Large Language Models (LLMs) on Molecular Property Prediction (MPP). It proposes TreeKD, a knowledge distillation method that transfers knowledge from tree-based specialist models into LLMs. On 22 ADMET properties from the TDC benchmark, TreeKD substantially improves LLM performance and narrows the gap with state-of-the-art specialist models.

Molecular Property Prediction (MPP) is a central task in drug discovery. While Large Language Models (LLMs) show promise as generalist models for MPP, their current performance remains below the threshold for practical adoption. We propose TreeKD, a novel knowledge distillation method that transfers complementary knowledge from tree-based specialist models into LLMs. Our approach trains specialist decision trees on functional group features, then verbalizes their learned predictive rules as natural language to enable rule-augmented context learning. This enables LLMs to leverage structural insights that are difficult to extract from SMILES strings alone. We further introduce rule-consistency, a test-time scaling technique inspired by bagging that ensembles predictions across diverse rules from a Random Forest. Experiments on 22 ADMET properties from the TDC benchmark demonstrate that TreeKD substantially improves LLM performance, narrowing the gap with SOTA specialist models and advancing toward practical generalist models for molecular property prediction.
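The pipeline described in the abstract (tree rules verbalized as text, then an ensemble over several rules) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy trees are written by hand instead of being trained, the feature names, rule wording, and prompt template are hypothetical, and `ask_llm` is a placeholder for whatever model call would be used. Majority voting is assumed as the aggregation step, which fits classification tasks only.

```python
from collections import Counter

# Hypothetical hand-written trees over functional-group counts.
# In practice these would come from a trained Random Forest.
TOY_FOREST = [
    {"feature": "hydroxyl", "threshold": 0,
     "left": {"label": "negative"},
     "right": {"feature": "aromatic_ring", "threshold": 1,
               "left": {"label": "positive"},
               "right": {"label": "negative"}}},
    {"feature": "carboxyl", "threshold": 0,
     "left": {"label": "positive"},
     "right": {"label": "negative"}},
    {"feature": "aromatic_ring", "threshold": 0,
     "left": {"label": "negative"},
     "right": {"label": "positive"}},
]


def verbalize_path(tree, features):
    """Follow one molecule down a tree; return (rule_text, leaf_label)."""
    conditions = []
    node = tree
    while "label" not in node:
        name, thr = node["feature"], node["threshold"]
        if features.get(name, 0) <= thr:  # missing groups count as 0
            conditions.append(f"the count of {name} groups is at most {thr}")
            node = node["left"]
        else:
            conditions.append(f"the count of {name} groups is greater than {thr}")
            node = node["right"]
    rule = "If " + " and ".join(conditions) + f", then the tree predicts {node['label']}."
    return rule, node["label"]


def build_prompt(smiles, rule):
    """Hypothetical prompt template that places the rule in the context."""
    return (f"Molecule (SMILES): {smiles}\n"
            f"Rule from a specialist tree: {rule}\n"
            "Predict the property label.")


def rule_consistency(smiles, features, forest, ask_llm):
    """Query once per tree rule and return the majority answer."""
    answers = []
    for tree in forest:
        rule, _ = verbalize_path(tree, features)
        answers.append(ask_llm(build_prompt(smiles, rule)))
    return Counter(answers).most_common(1)[0][0]
```

To try it without a model, pass any function that maps a prompt string to a label as `ask_llm`. For a regression property, the vote would presumably be replaced by an average of the numeric answers.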
