Contextualization Distillation from Large Language Model for Knowledge Graph Completion
This work offers researchers and practitioners a plug-and-play method for enhancing knowledge graph completion (KGC) models. It improves accuracy and explainability, though the contribution is incremental because it builds on existing KGC frameworks.
The paper tackles the limitations of static and noisy textual corpora in knowledge graph completion (KGC) by introducing a Contextualization Distillation strategy. The strategy uses large language models to enrich triplets with context and trains smaller models on the result through auxiliary tasks, yielding consistent performance improvements across diverse datasets and KGC techniques.
While textual information significantly enhances the performance of pre-trained language models (PLMs) in knowledge graph completion (KGC), the static and noisy nature of existing corpora collected from Wikipedia articles or synset definitions often limits the potential of PLM-based KGC models. To surmount these challenges, we introduce the Contextualization Distillation strategy, a versatile plug-and-play approach compatible with both discriminative and generative KGC frameworks. Our method begins by instructing large language models (LLMs) to transform compact, structural triplets into context-rich segments. Subsequently, we introduce two tailored auxiliary tasks, reconstruction and contextualization, allowing smaller KGC models to assimilate insights from these enriched triplets. Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach, revealing consistent performance enhancements irrespective of underlying pipelines or architectures. Moreover, our analysis makes our method more explainable and provides insight into the selection of generating paths, as well as the choice of suitable distillation tasks. All code and data for this work will be released at https://github.com/David-Li0406/Contextulization-Distillation.
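The two stages described above (prompting an LLM to contextualize a triplet, then building auxiliary training examples) can be illustrated with a minimal sketch. It is not the released implementation. The abstract does not specify prompt wording or task formats, so the `Triplet` class, the prompt text, the word-masking form of reconstruction, and the triplet-to-text form of contextualization are all assumptions made for illustration. The LLM call itself is omitted; `context` stands in for whatever passage the LLM returns.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """A knowledge graph fact (hypothetical container, not from the paper's code)."""
    head: str
    relation: str
    tail: str


def build_contextualization_prompt(t: Triplet) -> str:
    # Assumed prompt wording: asks an LLM to expand the triplet into a passage.
    return (
        f"Given the triplet ({t.head}, {t.relation}, {t.tail}), "
        "write a short descriptive paragraph that expresses this fact in context."
    )


def make_reconstruction_example(context: str, mask_ratio: float = 0.15,
                                mask_token: str = "<mask>", seed: int = 0) -> dict:
    # Assumed form of the reconstruction task: mask random words in the
    # LLM-generated passage; the model is trained to recover the original.
    rng = random.Random(seed)
    words = context.split()
    corrupted = [mask_token if rng.random() < mask_ratio else w for w in words]
    return {"input": " ".join(corrupted), "target": context}


def make_contextualization_example(t: Triplet, context: str) -> dict:
    # Assumed form of the contextualization task: the model receives the
    # linearized triplet and is trained to generate the passage.
    return {"input": f"{t.head} | {t.relation} | {t.tail}", "target": context}


if __name__ == "__main__":
    triplet = Triplet("Marie Curie", "award received", "Nobel Prize in Physics")
    print(build_contextualization_prompt(triplet))
    # Placeholder passage standing in for an LLM response.
    context = "Marie Curie received the Nobel Prize in Physics in 1903."
    print(make_reconstruction_example(context))
    print(make_contextualization_example(triplet, context))
```

In this sketch either example dictionary would be added to the smaller KGC model's training data alongside its usual completion objective; how the losses are combined is left open here because the abstract does not state it.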