AI · May 19, 2025

AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database

Rong Bian, Yu Geng, Zijian Yang, Bing Cheng

arXiv:2505.13406v1 · 5 citations · h-index: 1 · CI

Originality: Incremental advance

AI Analysis

This work automates math KG construction for researchers and educators. The advance is incremental, since it builds on existing KG and LLM methods.

The paper tackles the challenge of constructing a mathematical knowledge graph (KG) from natural language. It proposes AutoMathKG, which automatically integrates diverse sources such as ProofWiki and arXiv, uses LLMs for data augmentation, and uses a vector database for similar-entity search. The authors report reachability query results superior to five baselines and robust mathematical reasoning capability.

A mathematical knowledge graph (KG) presents knowledge within the field of mathematics in a structured manner. Constructing a math KG from natural language is an essential but challenging task. Existing works have two major limitations: first, they are constrained by corpus completeness, often discarding or manually supplementing incomplete knowledge; second, they typically fail to fully automate the integration of diverse knowledge sources. This paper proposes AutoMathKG, a high-quality, wide-coverage, and multi-dimensional math KG capable of automatic updates. AutoMathKG regards mathematics as a vast directed graph composed of Definition, Theorem, and Problem entities, with their reference relationships as edges. It integrates knowledge from ProofWiki, textbooks, arXiv papers, and TheoremQA, enhancing entities and relationships with large language models (LLMs) via in-context learning for data augmentation. To search for similar entities, MathVD, a vector database, is built through two designed embedding strategies using SBERT. Two mechanisms are proposed for automatic updates. For the knowledge completion mechanism, Math LLM is developed to interact with AutoMathKG, providing missing proofs or solutions. For the knowledge fusion mechanism, MathVD is used to retrieve similar entities, and an LLM is used to determine whether to merge with a candidate or add as a new entity. A wide range of experiments demonstrates the advanced performance and broad applicability of the AutoMathKG system, including superior reachability query results in MathVD compared to five baselines and robust mathematical reasoning capability in Math LLM.
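
The graph model and the fusion step described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the names `Entity`, `MathKG`, `embed`, `cosine`, `fuse`, and `build_demo_kg` are hypothetical, the bag-of-words `embed` is only a stand-in for an SBERT embedding, and the fixed similarity `threshold` stands in for the LLM's merge-or-add decision.

```python
import math
import re
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # "Definition", "Theorem", or "Problem"
    text: str
    refs: set = field(default_factory=set)  # names of entities this one references


class MathKG:
    """Directed graph: entities are nodes, reference relationships are edges."""

    def __init__(self):
        self.entities = {}

    def add(self, entity):
        self.entities[entity.name] = entity

    def reachable(self, source, target):
        """Breadth-first search along reference edges from source to target."""
        seen, queue = {source}, deque([source])
        while queue:
            current = queue.popleft()
            if current == target:
                return True
            for nxt in self.entities[current].refs:
                if nxt in self.entities and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for an SBERT embedding)."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        counts[token] = counts.get(token, 0) + 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuse(kg, candidate, threshold=0.8):
    """Merge candidate into its most similar same-kind entity, or add it as new.

    Assumption: the threshold replaces the LLM judgment described in the abstract.
    Returns the name of the entity that now holds the candidate's knowledge.
    """
    candidate_vec = embed(candidate.text)
    best, best_score = None, 0.0
    for entity in kg.entities.values():
        if entity.kind != candidate.kind:
            continue
        score = cosine(candidate_vec, embed(entity.text))
        if score > best_score:
            best, best_score = entity, score
    if best is not None and best_score >= threshold:
        best.refs |= candidate.refs  # merge: keep the union of references
        return best.name
    kg.add(candidate)
    return candidate.name


def build_demo_kg():
    """Small hand-written example graph (hypothetical data)."""
    kg = MathKG()
    kg.add(Entity("Group", "Definition",
                  "A group is a set with an associative binary operation, "
                  "an identity element, and inverses."))
    kg.add(Entity("Subgroup", "Definition",
                  "A subgroup is a subset of a group that is itself a group "
                  "under the same operation.", {"Group"}))
    kg.add(Entity("Lagrange's Theorem", "Theorem",
                  "The order of a subgroup of a finite group divides the "
                  "order of the group.", {"Subgroup"}))
    return kg
```

With the demo graph, `kg.reachable("Lagrange's Theorem", "Group")` follows the references Theorem → Subgroup → Group, while the reverse query finds no path because edges are directed. Calling `fuse` with a theorem whose text duplicates an existing one merges it into that entity, and a theorem with unrelated text is added as a new node. A bag-of-words similarity is much cruder than a sentence embedding, so the threshold here is only meaningful for this toy data.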
