CL, AI | Aug 14, 2025

Rule2Text: A Framework for Generating and Evaluating Natural Language Explanations of Knowledge Graph Rules

arXiv:2508.10971v1 | 2 citations | h-index: 2 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the interpretability of mined rules for knowledge graph users. It is an incremental advance because it applies existing LLM techniques to a specific domain.

The authors tackled the problem of interpreting complex logical rules mined from knowledge graphs by developing Rule2Text, a framework that uses large language models to generate natural language explanations. Their results show significant improvements in explanation quality after fine-tuning, with particularly strong gains on domain-specific datasets.

Knowledge graphs (KGs) can be enhanced through rule mining; however, the resulting logical rules are often difficult for humans to interpret due to their inherent complexity and the idiosyncratic labeling conventions of individual KGs. This work presents Rule2Text, a comprehensive framework that leverages large language models (LLMs) to generate natural language explanations for mined logical rules, thereby improving KG accessibility and usability. We conduct extensive experiments using multiple datasets, including Freebase variants (FB-CVT-REV, FB+CVT-REV, and FB15k-237) as well as the ogbl-biokg dataset, with rules mined using AMIE 3.5.1. We systematically evaluate several LLMs across a comprehensive range of prompting strategies, including zero-shot, few-shot, variable type incorporation, and Chain-of-Thought reasoning. To systematically assess models' performance, we conduct a human evaluation of generated explanations on correctness and clarity. To address evaluation scalability, we develop and validate an LLM-as-a-judge framework that demonstrates strong agreement with human evaluators. Leveraging the best-performing model (Gemini 2.0 Flash), LLM judge, and human-in-the-loop feedback, we construct high-quality ground truth datasets, which we use to fine-tune the open-source Zephyr model. Our results demonstrate significant improvements in explanation quality after fine-tuning, with particularly strong gains in the domain-specific dataset. Additionally, we integrate a type inference module to support KGs lacking explicit type information. All code and data are publicly available at https://github.com/idirlab/KGRule2NL.
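The prompting strategies named in the abstract (zero-shot, few-shot, and variable type incorporation) can be illustrated with a small prompt builder. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Atom` and `Rule` classes, the `build_prompt` function, the instruction wording, and the example relations are all hypothetical, and the actual prompts and rule format are in the linked repository.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """One atom of a Horn rule, written relation(subject, object)."""
    subject: str
    relation: str
    obj: str

    def __str__(self) -> str:
        return f"{self.relation}({self.subject}, {self.obj})"


@dataclass
class Rule:
    """A Horn rule: a conjunction of body atoms implying a head atom."""
    body: list
    head: Atom
    # Hypothetical variable -> type mapping, used only by the
    # "variable type incorporation" prompt variant.
    var_types: dict = field(default_factory=dict)

    def __str__(self) -> str:
        return " AND ".join(str(a) for a in self.body) + " => " + str(self.head)


# Assumed instruction text; not taken from the paper.
INSTRUCTION = "Explain the following knowledge graph rule in one plain English sentence."


def build_prompt(rule, few_shot=(), use_types=False):
    """Assemble a prompt: instruction, optional solved examples, then the target rule."""
    parts = [INSTRUCTION]
    # Few-shot: prepend (rule, explanation) pairs as worked examples.
    for example_rule, explanation in few_shot:
        parts.append(f"Rule: {example_rule}\nExplanation: {explanation}")
    target = f"Rule: {rule}"
    # Variable types: append a line describing what each variable stands for.
    if use_types and rule.var_types:
        types = ", ".join(f"{v} is a {t}" for v, t in sorted(rule.var_types.items()))
        target += f"\nVariable types: {types}"
    parts.append(target + "\nExplanation:")
    return "\n\n".join(parts)


# Illustrative rule with made-up relation names.
rule = Rule(
    body=[Atom("?a", "bornIn", "?b"), Atom("?b", "locatedIn", "?c")],
    head=Atom("?a", "nationality", "?c"),
    var_types={"?a": "person", "?b": "city", "?c": "country"},
)
example = (
    "marriedTo(?x, ?y) => marriedTo(?y, ?x)",
    "If one person is married to another, the second is married to the first.",
)

zero_shot = build_prompt(rule)
typed = build_prompt(rule, use_types=True)
few_shot = build_prompt(rule, few_shot=[example], use_types=True)
```

Each variant differs only in what is placed before or alongside the target rule, so the same rule can be sent to several models under each strategy and the resulting explanations compared on correctness and clarity.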
