AI · Feb 27

MMKG-RDS: Reasoning Data Synthesis via Deep Mining of Multimodal Knowledge Graphs

Lun Zhan, Feng Xiong, Huanyong Liu, Feng Zhang, Yuhui Yin

arXiv:2602.23632v12.4 · h-index: 5 · Has Code

Originality: Incremental advance

AI Analysis

The paper addresses limitations of data synthesis for domain-specific reasoning tasks and offers a flexible framework applicable across multiple domains.

The paper tackles the problem of synthesizing high-quality training data to enhance domain models' reasoning abilities. It proposes MMKG-RDS, a framework built on multimodal knowledge graphs; fine-tuning Qwen3 models on its synthesized samples improves reasoning accuracy by 9.2%.

Synthesizing high-quality training data is crucial for enhancing domain models' reasoning abilities. Existing methods face limitations in long-tail knowledge coverage, effectiveness verification, and interpretability. Knowledge-graph-based approaches still fall short in functionality, granularity, customizability, and evaluation. To address these issues, we propose MMKG-RDS, a flexible framework for reasoning data synthesis that leverages multimodal knowledge graphs. It supports fine-grained knowledge extraction, customizable path sampling, and multidimensional data quality scoring. We validate MMKG-RDS with the MMKG-RDS-Bench dataset, covering five domains, 17 task types, and 14,950 samples. Experimental results show that fine-tuning Qwen3 models (0.6B/8B/32B) on a small number of synthesized samples improves reasoning accuracy by 9.2%. The framework also generates distinct data that challenges existing models on tasks involving tables and formulas, which makes it useful for constructing complex benchmarks. The dataset and code are available at https://github.com/360AILAB-NLP/MMKG-RDS.
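To make "customizable path sampling" over a multimodal knowledge graph more concrete, here is a minimal sketch. It is not the MMKG-RDS implementation: the toy graph, the node and relation names, and the functions `sample_path` and `sample_paths` are all hypothetical. It assumes that a sampled path is a chain of (head, relation, tail) triples and that "customization" means constraints such as a hop count and a required node modality.

```python
import random

# Toy multimodal knowledge graph, invented purely for illustration.
# NODES maps each node to its modality; EDGES maps a node to (relation, target) pairs.
NODES = {
    "ohms_law": "formula",
    "resistance": "text",
    "resistor_table": "table",
    "voltage": "text",
    "current": "text",
}
EDGES = {
    "ohms_law": [("relates", "voltage"), ("relates", "current"), ("relates", "resistance")],
    "resistance": [("listed_in", "resistor_table")],
    "voltage": [("defined_by", "ohms_law")],
    "current": [("defined_by", "ohms_law")],
    "resistor_table": [],
}


def sample_path(start, hops, rng, require_modality=None):
    """Random walk of exactly `hops` edges without revisiting nodes.

    Returns a list of (head, relation, tail) triples, or None if the walk
    dead-ends early or never touches a node of the required modality.
    """
    path, visited, node = [], {start}, start
    for _ in range(hops):
        choices = [(rel, tgt) for rel, tgt in EDGES[node] if tgt not in visited]
        if not choices:
            return None  # dead end before reaching the requested length
        rel, nxt = rng.choice(choices)
        path.append((node, rel, nxt))
        visited.add(nxt)
        node = nxt
    if require_modality and not any(NODES[n] == require_modality for n in visited):
        return None
    return path


def sample_paths(n, hops, seed=0, require_modality=None, max_attempts=1000):
    """Collect up to `n` distinct paths that satisfy the constraints."""
    rng = random.Random(seed)  # seeded for reproducibility
    starts = sorted(NODES)
    found = []
    for _ in range(max_attempts):
        if len(found) >= n:
            break
        path = sample_path(rng.choice(starts), hops, rng, require_modality)
        if path is not None and path not in found:
            found.append(path)
    return found


if __name__ == "__main__":
    # Two-hop paths that must pass through a table node.
    for p in sample_paths(n=1, hops=2, require_modality="table"):
        print(p)
```

In a pipeline of the kind the abstract describes, each sampled path would presumably be handed to a generator that turns it into a reasoning question, and the result would then be scored for quality; both of those steps are outside this sketch.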
