SLIM: Sparse Latent Steering for Interpretable and Property-Directed LLM-Based Molecular Editing

Mingxu Zhang, Yuhan Li, Lujundong Li, Dazhong Shen, Hui Xiong, Ying Sun

arXiv:2605.10831

AI Analysis

For researchers using LLMs for molecular editing, SLIM provides a method to improve property control and interpretability without retraining, addressing a key bottleneck in property-directed molecular design.

SLIM introduces a plug-and-play framework that uses a sparse autoencoder with learnable importance gates to decompose LLM hidden states into property-aligned features, enabling property-directed molecular editing without modifying model parameters. On the MolEditRL benchmark, it achieves consistent improvements across four architectures and eight properties, with gains of up to 42.4 points.

Large language models possess strong chemical reasoning capabilities, making them effective molecular editors. However, property-relevant information is implicitly entangled across their dense hidden states, providing no explicit handle for property control: a substantial fraction of edits fail to improve or even degrade target properties. To address these issues, we propose SLIM (Sparse Latent Interpretable Molecular editing), a plug-and-play framework that decomposes the editor's hidden states into sparse, property-aligned features via a Sparse Autoencoder with learnable importance gates. Steering in this sparse feature space precisely activates property-relevant dimensions, improving editing success rate without modifying model parameters. The same sparse basis further supports interpretable analysis of editing behavior. Experiments on the MolEditRL benchmark across four model architectures and eight molecular properties show consistent gains over baselines, with improvements of up to 42.4 points.
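The abstract does not give SLIM's equations, so the following is only a minimal sketch of the general idea it describes: encode a hidden state into sparse features, scale each feature by a gate, and steer by nudging one feature before mapping the change back to the hidden state. Everything here is an assumption made for illustration: the class name `GatedSparseAutoencoder`, the function `steer`, the sigmoid form of the gate, the ReLU encoder, the additive steering rule, and the toy weights. None of it is taken from the paper.

```python
import math


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GatedSparseAutoencoder:
    """Toy sparse autoencoder with one gate per feature (hypothetical design)."""

    def __init__(self, w_enc, b_enc, w_dec, gate_logits):
        self.w_enc = w_enc              # one encoder row per feature, each of length d
        self.b_enc = b_enc              # one encoder bias per feature
        self.w_dec = w_dec              # one decoder direction per feature, each of length d
        self.gate_logits = gate_logits  # would be learned; fixed here for illustration

    def encode(self, h):
        # Feature k = sigmoid(gate_k) * ReLU(w_enc[k] . h + b_enc[k])
        z = []
        for row, b, g in zip(self.w_enc, self.b_enc, self.gate_logits):
            pre = sum(w * x for w, x in zip(row, h)) + b
            z.append(sigmoid(g) * relu(pre))
        return z

    def decode(self, z):
        # Reconstruction = sum over features of z[k] * w_dec[k]
        d = len(self.w_dec[0])
        return [sum(z[k] * self.w_dec[k][j] for k in range(len(z))) for j in range(d)]


def steer(sae, h, feature, alpha):
    """Add alpha to one sparse feature and apply the decoded change to h.

    Only the difference between the steered and unsteered reconstructions is
    added, so whatever the autoencoder fails to reconstruct is left untouched.
    """
    z = sae.encode(h)
    z_steered = list(z)
    z_steered[feature] += alpha
    base = sae.decode(z)
    moved = sae.decode(z_steered)
    return [x + (m - b) for x, m, b in zip(h, moved, base)]


# Toy example: a 2-dimensional hidden state and 3 sparse features.
sae = GatedSparseAutoencoder(
    w_enc=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    b_enc=[0.0, 0.0, 0.0],
    w_dec=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    gate_logits=[0.0, 0.0, -20.0],  # a very negative logit nearly switches a feature off
)
h = [1.0, -2.0]
print(sae.encode(h))
print(steer(sae, h, feature=1, alpha=2.0))
```

In this sketch the model weights are never touched: steering only shifts the hidden state along the decoder direction of the chosen feature, which is one simple way to read "steering in this sparse feature space". With a real editor, the hidden state would come from an intermediate layer of the LLM and the autoencoder weights and gates would be trained rather than hand-set.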
