Scalable and interpretable rule-based link prediction for large heterogeneous knowledge graphs
This work makes interpretable, rule-based link prediction substantially more scalable for researchers and practitioners working with large biomedical knowledge graphs, addressing a key limitation of existing rule-based methods.
This paper addresses two obstacles to rule-based link prediction on large heterogeneous knowledge graphs: slow inference and the difficulty of aggregating predictions made by multiple rules. The authors introduce SAFRAN, a rule application framework that aggregates rules with a scalable clustering algorithm. SAFRAN achieves state-of-the-art results for interpretable link prediction on FB15K-237 and OpenBioLink and increases inference speeds by up to two orders of magnitude.
Neural embedding-based machine learning models have shown promise for predicting novel links in biomedical knowledge graphs. Unfortunately, their practical utility is diminished by their lack of interpretability. Recently, the fully interpretable, rule-based algorithm AnyBURL yielded highly competitive results on many general-purpose link prediction benchmarks. However, its applicability to large-scale prediction tasks on complex biomedical knowledge bases is limited by long inference times and difficulties with aggregating predictions made by multiple rules. We improve upon AnyBURL by introducing the SAFRAN rule application framework, which aggregates rules through a scalable clustering algorithm. SAFRAN yields new state-of-the-art results for fully interpretable link prediction on the established general-purpose benchmark FB15K-237 and the large-scale biomedical benchmark OpenBioLink. Furthermore, it exceeds the results of multiple established embedding-based algorithms on FB15K-237 and narrows the gap between rule-based and embedding-based algorithms on OpenBioLink. We also show that SAFRAN increases inference speeds by up to two orders of magnitude.
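To make the aggregation problem concrete, the following is a minimal illustrative sketch of how predictions from several rules might be combined once the rules have been grouped into clusters. It is not taken from the SAFRAN implementation. The function name `aggregate_confidences`, the rule and cluster identifiers, and the specific choice of taking the maximum within a cluster and a noisy-or across clusters are assumptions made for illustration. The clustering step itself is not shown, and its output is simply assumed as input.

```python
from collections import defaultdict
from math import prod


def aggregate_confidences(rule_confidences, rule_clusters):
    """Combine the confidences of all rules that predict the same candidate.

    rule_confidences: dict mapping rule id -> confidence in [0, 1]
    rule_clusters: dict mapping rule id -> cluster id (hypothetical output
        of a clustering step; rules in one cluster are treated as redundant)
    """
    # Within a cluster, keep only the strongest rule (max aggregation),
    # so that redundant rules do not inflate the final score.
    best_per_cluster = defaultdict(float)
    for rule_id, confidence in rule_confidences.items():
        cluster_id = rule_clusters[rule_id]
        best_per_cluster[cluster_id] = max(best_per_cluster[cluster_id], confidence)

    # Across clusters, treat the surviving rules as independent evidence
    # and combine them with a noisy-or.
    return 1.0 - prod(1.0 - c for c in best_per_cluster.values())


# Hypothetical example: r1 and r2 are redundant, r3 is independent.
confidences = {"r1": 0.8, "r2": 0.7, "r3": 0.5}
clusters = {"r1": 0, "r2": 0, "r3": 1}
score = aggregate_confidences(confidences, clusters)
```

In this example the redundant rule r2 is ignored, so the score is 1 - (1 - 0.8)(1 - 0.5) = 0.9. If every rule were placed in its own cluster, the same function would reduce to a plain noisy-or over all three rules (0.97). Placing all rules in one cluster would reduce it to max aggregation (0.8).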