CL, AI · May 29, 2025

AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora

arXiv:2505.23628
AI Analysis

This work addresses scalable knowledge graph construction without manual schema design, a challenge relevant to AI and data science applications, and represents a new paradigm rather than an incremental improvement.

The authors tackled the dependence of knowledge graph construction on predefined schemas by developing AutoSchemaKG, a framework that uses large language models to autonomously extract knowledge triples and induce schemas from text. The result is ATLAS, a family of knowledge graphs with more than 900 million nodes and 5.9 billion edges built from over 50 million documents. The induced schemas reach 92% semantic alignment with human-crafted schemas, and the graphs improve performance on multi-hop QA tasks and LLM factuality.

We present AutoSchemaKG, a framework for fully autonomous knowledge graph construction that eliminates the need for predefined schemas. Our system leverages large language models to simultaneously extract knowledge triples and induce comprehensive schemas directly from text, modeling both entities and events while employing conceptualization to organize instances into semantic categories. Processing over 50 million documents, we construct ATLAS (Automated Triple Linking And Schema induction), a family of knowledge graphs with 900+ million nodes and 5.9 billion edges. This approach outperforms state-of-the-art baselines on multi-hop QA tasks and enhances LLM factuality. Notably, our schema induction achieves 92% semantic alignment with human-crafted schemas with zero manual intervention, demonstrating that billion-scale knowledge graphs with dynamically induced schemas can effectively complement parametric knowledge in large language models.
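The abstract describes two coupled steps: extracting instance-level triples over entities and events, then conceptualizing those instances into semantic categories so that a schema emerges without manual design. The sketch below illustrates one simple way concept-level schema patterns could be derived from instance-level triples. It is not the paper's method: the LLM calls are replaced by a fixed lookup table, and `Triple`, `induce_schema`, `toy_conceptualize`, the `min_support` threshold, and the toy data are all hypothetical names chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical types for illustration; not the AutoSchemaKG API.
Pattern = Tuple[str, str, str]  # (head concept, relation, tail concept)


@dataclass(frozen=True)
class Triple:
    """An instance-level triple, as an LLM extraction step might emit it."""
    head: str
    relation: str
    tail: str


def induce_schema(
    triples: Iterable[Triple],
    conceptualize: Callable[[str], List[str]],
    min_support: int = 1,
) -> Dict[Pattern, int]:
    """Lift instance triples to concept-level patterns and keep frequent ones.

    `conceptualize` stands in for an LLM call that maps a node (entity or
    event) to one or more semantic categories.
    """
    counts: Counter = Counter()
    for t in triples:
        for head_concept in conceptualize(t.head):
            for tail_concept in conceptualize(t.tail):
                counts[(head_concept, t.relation, tail_concept)] += 1
    # Keep only patterns supported by at least `min_support` triples.
    return {p: n for p, n in counts.items() if n >= min_support}


# Toy concept table replacing the LLM conceptualization step (hypothetical).
CONCEPTS = {
    "Marie Curie": ["scientist", "person"],
    "Pierre Curie": ["scientist", "person"],
    "Nobel Prize in Physics": ["award"],
    "discovery of radium": ["discovery event"],
}


def toy_conceptualize(node: str) -> List[str]:
    # Unknown nodes fall back to a generic category.
    return CONCEPTS.get(node, ["thing"])


triples = [
    Triple("Marie Curie", "won", "Nobel Prize in Physics"),
    Triple("Pierre Curie", "won", "Nobel Prize in Physics"),
    Triple("Marie Curie", "took part in", "discovery of radium"),  # entity-event edge
]

schema = induce_schema(triples, toy_conceptualize, min_support=2)
print(sorted(schema))
# [('person', 'won', 'award'), ('scientist', 'won', 'award')]
```

Raising `min_support` trades coverage for fewer, better-supported schema elements; the entity-event pattern above is dropped because only one triple supports it. How AutoSchemaKG actually selects and organizes its induced schema may differ from this frequency filter.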

Code Implementations: 1 repo