LG · AI · Feb 2

Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding

Zihao Jing, Qiuhao Zeng, Ruiyi Fang, Yan Sun, Boyu Wang, Pingzhao Hu

arXiv:2602.02742v1 · 2.72 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient, generalizable multimodal molecular understanding for scientific discovery, though it is an incremental advance because it builds on existing graph-LLM bridge methods.

The paper tackles the problem of aligning molecular graphs with Large Language Models (LLMs) for better molecular understanding. It introduces EDT-Former, which achieves state-of-the-art results on benchmarks such as MoleculeQA and MoleculeNet without tuning the LLM backbone (apart from its embedding layer).

Molecular understanding is central to advancing areas such as scientific discovery, yet Large Language Models (LLMs) struggle to understand molecular graphs effectively. Existing graph-LLM bridges often adapt the Q-Former-style connector, which was originally designed for vision tasks and uses fixed-length static tokens. These designs overlook stereochemistry and substructural context and typically require costly LLM-backbone fine-tuning, limiting efficiency and generalization. We introduce EDT-Former, an Entropy-guided Dynamic Token Transformer that generates tokens aligned with informative molecular patches, thereby preserving both local and global structural features for molecular graph understanding. Beyond prior approaches, EDT-Former enables alignment between frozen graph encoders and LLMs without tuning the LLM backbone (excluding the embedding layer), resulting in computationally efficient fine-tuning, and achieves state-of-the-art results on MoleculeQA, Molecule-oriented Mol-Instructions, and property prediction benchmarks (TDC, MoleculeNet), underscoring its effectiveness for scalable and generalizable multimodal molecular understanding.
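The abstract does not say how the entropy signal is computed or how patches are formed, so the following is only a hypothetical illustration, not EDT-Former's actual algorithm. It sketches one generic way an entropy signal can produce a molecule-dependent number of tokens, in contrast to a Q-Former's fixed-length query set. Assumptions (all illustrative): each atom already has a probability distribution from some upstream model, atoms whose Shannon entropy reaches a threshold seed patches, remaining atoms join the nearest seed by breadth-first search, and each patch is mean-pooled into one token. The names `entropy_patches`, `pool_tokens`, and `threshold` are hypothetical.

```python
import math
from collections import deque
from typing import Dict, List, Sequence

# HYPOTHETICAL sketch: entropy-seeded patching of a molecular graph.
# This is NOT EDT-Former's published algorithm; the per-atom distributions,
# the threshold rule, and mean pooling are illustrative assumptions.


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_patches(adjacency: Dict[int, List[int]],
                    node_probs: Dict[int, Sequence[float]],
                    threshold: float) -> List[List[int]]:
    """Split atoms into patches; the number of patches depends on the molecule."""
    entropies = {n: shannon_entropy(p) for n, p in node_probs.items()}
    # Atoms whose entropy reaches the threshold each seed their own patch.
    seeds = sorted(n for n, h in entropies.items() if h >= threshold)
    if not seeds:
        # Fallback: the single most uncertain atom seeds one patch.
        seeds = [max(entropies, key=entropies.get)]

    owner: Dict[int, int] = {}

    def flood(starts: List[int]) -> None:
        # Multi-source BFS: each unassigned atom joins the patch of the
        # first seed whose frontier reaches it.
        for s in starts:
            owner[s] = s
        queue = deque(starts)
        while queue:
            node = queue.popleft()
            for nb in adjacency.get(node, []):
                if nb not in owner:
                    owner[nb] = owner[node]
                    queue.append(nb)

    flood(seeds)
    # Atoms in fragments without any seed (e.g. a counter-ion) get their own patch.
    for node in sorted(entropies):
        if node not in owner:
            seeds.append(node)
            flood([node])

    patches: Dict[int, List[int]] = {s: [] for s in seeds}
    for node in sorted(owner):
        patches[owner[node]].append(node)
    return [patches[s] for s in seeds]


def pool_tokens(patches: List[List[int]],
                node_embeddings: Dict[int, Sequence[float]]) -> List[List[float]]:
    """Mean-pool atom embeddings inside each patch into one token vector."""
    tokens = []
    for patch in patches:
        dim = len(node_embeddings[patch[0]])
        tokens.append([sum(node_embeddings[n][d] for n in patch) / len(patch)
                       for d in range(dim)])
    return tokens


if __name__ == "__main__":
    # Toy 5-atom chain 0-1-2-3-4 with made-up per-atom distributions.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    probs = {0: [0.25] * 4, 1: [1.0], 2: [0.9, 0.1], 3: [0.25] * 4, 4: [1.0]}
    print(entropy_patches(chain, probs, threshold=1.0))  # [[0, 1], [2, 3, 4]] -> two tokens
    flat = {n: [1.0] for n in chain}
    print(entropy_patches(chain, flat, threshold=1.0))  # [[0, 1, 2, 3, 4]] -> one token
```

Under these assumptions, the same toy chain yields two tokens when two atoms are uncertain and a single token when none are, which is the kind of molecule-dependent token count the abstract contrasts with fixed-length static tokens. Mean pooling is chosen only for simplicity; a learned attention pooling would be an equally plausible stand-in.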

