AI, CL · Nov 3, 2024

Ontology Population using LLMs

Sanaz Saki Norouzi, Adrita Barua, Antrea Christou, Nikita Gautam, Andrew Eells, Pascal Hitzler, Cogan Shimizu

arXiv:2411.01612v1

Originality: Incremental advance

AI Analysis

This work addresses the costly and challenging task of knowledge graph population for data integration and representation. The advance is incremental, since it builds on existing LLM capabilities through prompt engineering.

The study tackled the problem of populating knowledge graphs from unstructured text by using Large Language Models (LLMs), achieving approximately 90% triple extraction accuracy when guided by a modular ontology.

Knowledge graphs (KGs) are increasingly utilized for data integration, representation, and visualization. While KG population is critical, it is often costly, especially when data must be extracted from unstructured natural-language text, which presents challenges such as ambiguity and complex interpretations. Large Language Models (LLMs) offer promising capabilities for such tasks, excelling in natural language understanding and content generation. However, their tendency to "hallucinate" can produce inaccurate outputs. Despite these limitations, LLMs offer rapid and scalable processing of natural language data, and with prompt engineering and fine-tuning, they can approximate human-level performance in extracting and structuring data for KGs. This study investigates LLM effectiveness for KG population, focusing on the Enslaved.org Hub Ontology. We report that, compared to the ground truth, LLMs can extract approximately 90% of triples when provided with a modular ontology as guidance in the prompts.
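
To make the evaluation idea in the abstract concrete, here is a minimal sketch of how an ontology-guided prompt might be assembled and how the share of ground-truth triples recovered could be measured. The abstract does not give the paper's prompt wording or its triple-matching criteria, so everything here is an assumption for illustration: the function names build_prompt and triple_recall, the prompt text, and the case- and whitespace-insensitive exact matching are all hypothetical.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def build_prompt(ontology_module: str, text: str) -> str:
    """Assemble a prompt that pairs an ontology module with the source text.

    The wording is hypothetical; the paper's actual prompts are not shown
    in the abstract.
    """
    return (
        "Extract (subject, predicate, object) triples from the text below.\n"
        "Use only the classes and properties of this ontology module:\n"
        f"{ontology_module}\n\n"
        f"Text:\n{text}\n"
    )


def normalize(triple: Triple) -> Triple:
    """Lowercase each part and collapse whitespace (an assumed matching rule)."""
    s, p, o = (" ".join(part.lower().split()) for part in triple)
    return (s, p, o)


def triple_recall(extracted: Iterable[Triple], ground_truth: Iterable[Triple]) -> float:
    """Return the fraction of ground-truth triples present in the extraction."""
    gold = {normalize(t) for t in ground_truth}
    if not gold:
        return 0.0  # nothing to recover; avoid division by zero
    found = {normalize(t) for t in extracted}
    return len(gold & found) / len(gold)
```

Under this sketch, a result like the reported figure would correspond to triple_recall returning roughly 0.9 over the evaluation set. A looser matching rule (for example, semantic equivalence judged by a human) would give a different number, so the matching rule is a design choice that has to be stated alongside any such score.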
