MACReD: A Multi-Agent Collaborative Reasoning Framework for Reaction Diagram Parsing
For researchers in chemistry and document analysis, this work provides a robust method to automatically extract reaction information from complex diagrams, addressing a key bottleneck in chemical literature mining.
MACReD is a multi-agent framework for parsing chemical reaction diagrams. It achieves state-of-the-art F1 scores of 75.2% (hard match) and 84.6% (soft match) on the RxnScribe benchmark, outperforming the previous best by 6.1 and 4.6 percentage points, respectively.
Parsing chemical reaction diagrams from scientific literature is challenging due to heterogeneous layouts, intertwined visual elements, and the difficulty of integrating recognition and reasoning. Existing vision-language models (VLMs) advance multimodal understanding but still fail on complex diagrams, struggling to maintain spatial coherence and to integrate multidimensional information during reasoning. To address these issues, we propose MACReD, a hierarchical multi-agent framework that coordinates specialized agents for molecular perception, arrow understanding, text extraction, and reaction reconstruction within a unified VLM-guided architecture. The planning and perception layers use flexible, fine-grained detection to handle visual complexity, while the reasoning layer uses a multigraph fusion mechanism to integrate heterogeneous cues and enforce chemically consistent global reasoning. Experiments on the RxnScribe benchmark show that MACReD achieves state-of-the-art performance, with F1 scores of 75.2% and 84.6% under the hard and soft match criteria, outperforming the RxnScribe baseline, which obtains 69.1% and 80.0%, respectively. These results demonstrate the robustness of MACReD across diverse diagram layouts, including multi-step and tree-structured reactions.
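To make the idea of fusing heterogeneous cues in a multigraph more concrete, the following is a minimal illustrative sketch and not the MACReD implementation. It assumes that each agent proposes scored edges linking a detected element to an arrow with a role, that parallel edges from different agents are fused by averaging their scores, and that edges below a threshold are discarded. The names `CueEdge` and `fuse_cues`, the cue labels, the element identifiers, and the threshold of 0.5 are all hypothetical, and no chemical consistency check is modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CueEdge:
    """One edge proposed by one agent (hypothetical structure)."""
    arrow: str    # id of the reaction arrow the element attaches to
    element: str  # id of a detected molecule or text block
    role: str     # "reactant", "product", or "condition"
    cue: str      # which agent proposed the edge, e.g. "arrow" or "layout"
    score: float  # agent confidence in [0, 1]


def fuse_cues(edges, threshold=0.5):
    """Average parallel edges and group the surviving ones by arrow."""
    # Parallel edges (same arrow, element, role) are what make this a multigraph.
    scores = defaultdict(list)
    for e in edges:
        scores[(e.arrow, e.element, e.role)].append(e.score)

    reactions = defaultdict(
        lambda: {"reactant": [], "product": [], "condition": []}
    )
    for (arrow, element, role), vals in scores.items():
        # Assumed fusion rule: keep an edge if its mean score passes the threshold.
        if sum(vals) / len(vals) >= threshold:
            reactions[arrow][role].append(element)

    # Sort element ids so the output is deterministic.
    return {
        arrow: {role: sorted(items) for role, items in roles.items()}
        for arrow, roles in reactions.items()
    }


edges = [
    CueEdge("arrow1", "mol_A", "reactant", "arrow", 0.9),
    CueEdge("arrow1", "mol_A", "reactant", "layout", 0.7),
    CueEdge("arrow1", "mol_B", "product", "arrow", 0.8),
    CueEdge("arrow1", "txt_1", "condition", "text", 0.6),
    CueEdge("arrow1", "mol_C", "reactant", "layout", 0.3),
]
result = fuse_cues(edges)
print(result)
```

In this toy input, `mol_A` is supported by two cues and is kept as a reactant, while `mol_C` is supported only by a weak layout cue and is dropped. Averaging is only one possible fusion rule; the abstract does not specify which rule the reasoning layer actually uses.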