AI · CL · CV · May 17

ChemVA: Advancing Large Language Models on Chemical Reaction Diagrams Understanding

arXiv:2605.17214 · 78.7
Predicted impact: top 37% in AI · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the need for LLMs to understand chemical reaction diagrams, a bottleneck in scientific AI, and reports substantial performance improvements.

LLMs struggle to interpret chemical reaction diagrams due to visual and semantic bottlenecks. The proposed ChemVA framework achieves 92.0% structural recognition accuracy and yields a consistent ~20 percentage point performance gain across 9 LLMs, enabling open-weight models to rival proprietary SOTA systems.

While Large Language Models (LLMs) have revolutionized scientific text processing, they exhibit a significant capability gap when interpreting chemical reaction diagrams. We identify two fundamental bottlenecks restricting current systems: a Visual Deficit, where generic vision encoders struggle to resolve the strict topological connectivity of dense molecular graphs, and a Semantic Disconnect, where standard linear strings, such as SMILES, fail to effectively activate the model's latent chemical reasoning. To bridge these gaps, we propose the Chemical Visual Activation (ChemVA) framework, which employs a Visual Anchor mechanism to ground functional groups via hybrid-granularity detection, followed by a semantic alignment approach that translates visual features into entity names to maximize knowledge activation in LLMs. We evaluate our approach on OCRD-Bench, a newly constructed dataset featuring dense visual-semantic contexts and comprehensive reaction coverage to evaluate the full spectrum from recognition to reasoning. Extensive experiments on OCRD-Bench demonstrate that ChemVA achieves 92.0% structural recognition accuracy. By bridging visual and semantic bottlenecks, our framework delivers a consistent performance gain of approximately 20 percentage points across 9 diverse LLMs, enabling open-weight models to rival proprietary SOTA systems in complex chemical reasoning tasks.
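
To make the Semantic Disconnect concrete, here is a minimal sketch of the difference between handing an LLM linear SMILES strings and handing it entity names. It is not the authors' implementation: the `Detection` record, the `build_prompt` helper, and the prompt layout are all hypothetical, and the detections are hard-coded rather than produced by any visual model.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One hypothetical species recovered from a reaction diagram."""
    role: str    # "reactant" or "product" (illustrative roles only)
    smiles: str  # linear string form of the structure
    name: str    # entity name the structure is translated to


def build_prompt(detections, question, use_names=True):
    """Assemble a plain-text prompt from detected species.

    With use_names=True each species is described by its entity name;
    with use_names=False the raw SMILES string is used instead.
    """
    lines = []
    for d in detections:
        label = d.name if use_names else d.smiles
        lines.append(f"{d.role}: {label}")
    return "\n".join(lines) + "\n\nQuestion: " + question


# Hard-coded example (an assumption, not data from the paper):
# acid-catalysed esterification of acetic acid with ethanol.
detections = [
    Detection("reactant", "CC(=O)O", "acetic acid"),
    Detection("reactant", "CCO", "ethanol"),
    Detection("product", "CC(=O)OCC", "ethyl acetate"),
]

question = "What type of reaction is shown?"
name_prompt = build_prompt(detections, question, use_names=True)
smiles_prompt = build_prompt(detections, question, use_names=False)
```

Printing `name_prompt` and `smiles_prompt` side by side shows the same reaction in the two text forms the abstract contrasts. Whether the name-based form actually elicits better reasoning is the paper's empirical claim, not something this sketch demonstrates.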

