CL | AI | Feb 7, 2025

Enhancing Knowledge Graph Construction: Evaluating with Emphasis on Hallucination, Omission, and Graph Similarity Metrics

arXiv:2502.05239v1 | 4.92 citations | h-index: 2 | KGSWC

Originality: Synthesis-oriented

AI Analysis

This work provides incremental improvements in evaluation metrics for knowledge graph construction, which is important for researchers and practitioners in natural language processing and knowledge representation.

This paper addresses the evaluation of knowledge graph construction from text, with a focus on hallucination and omission. It introduces an enhanced evaluation framework that uses BERTScore for graph similarity and treats a score of 95% or higher as a graph match. The results show that fine-tuning the Mistral model improves construction accuracy and reduces exact hallucination and omission, but the fine-tuned models perform worse in generalization tasks on the KELM-sub dataset.

Recent advancements in large language models have demonstrated significant potential in the automated construction of knowledge graphs from unstructured text. This paper builds upon our previous work [16], which evaluated various models using metrics like precision, recall, F1 score, triple matching, and graph matching, and introduces a refined approach to address the critical issues of hallucination and omission. We propose an enhanced evaluation framework incorporating BERTScore for graph similarity, setting a practical threshold of 95% for graph matching. Our experiments focus on the Mistral model, comparing its original and fine-tuned versions in zero-shot and few-shot settings. We further extend our experiments using examples from the KELM-sub training dataset, illustrating that the fine-tuned model significantly improves knowledge graph construction accuracy while reducing exact hallucination and omission. However, our findings also reveal that the fine-tuned models perform worse in generalization tasks on the KELM-sub dataset. This study underscores the importance of comprehensive evaluation metrics in advancing the state of the art in knowledge graph construction from textual data.
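The metrics named in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: triples are compared after lowercasing, "exact hallucination" is read as predicted triples absent from the reference, "exact omission" as reference triples absent from the prediction, and each graph is linearized into one string before the similarity check. The `token_f1` function is a simple bag-of-words stand-in, not BERTScore, so the block runs without downloading a model. All function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    # Assumption: case and surrounding whitespace are ignored when matching.
    return tuple(part.strip().lower() for part in triple)


def triple_metrics(predicted: Iterable[Triple], reference: Iterable[Triple]) -> dict:
    """Exact-match precision, recall, F1, plus hallucinated and omitted triples."""
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in reference}
    matched = pred & ref
    precision = len(matched) / len(pred) if pred else 0.0
    recall = len(matched) / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucinated": pred - ref,  # predicted but not in the reference
        "omitted": ref - pred,       # in the reference but not predicted
    }


def token_f1(candidate: str, reference: str) -> float:
    """Stand-in similarity (bag-of-words F1). This is NOT BERTScore."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def linearize(triples: Iterable[Triple]) -> str:
    # Assumption: a graph is flattened into one string of sorted triples.
    return " ".join(" ".join(t) for t in sorted(normalize(t) for t in triples))


def graphs_match(
    predicted: Iterable[Triple],
    reference: Iterable[Triple],
    similarity: Callable[[str, str], float] = token_f1,
    threshold: float = 0.95,  # the 95% graph-matching threshold from the abstract
) -> bool:
    return similarity(linearize(predicted), linearize(reference)) >= threshold


# Toy example data (invented for illustration, not from any dataset).
reference = [
    ("Alan Turing", "born in", "London"),
    ("Alan Turing", "field", "computer science"),
]
predicted = [
    ("alan turing", "born in", "london"),
    ("Alan Turing", "died in", "Paris"),
]

metrics = triple_metrics(predicted, reference)
is_match = graphs_match(predicted, reference)
```

On this toy pair, one of two predicted triples is correct, one is hallucinated, and one reference triple is omitted, so the graphs fall well below the 95% threshold. To approximate the paper's setup more closely, the `similarity` argument could be replaced by a wrapper around an actual BERTScore implementation; how the authors feed graphs to BERTScore is not stated in this summary.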
