AI LG Sep 11, 2023

UniKG: A Benchmark and Universal Embedding for Large-Scale Knowledge Graphs

Yide Qiu, Shaoxiang Ling, Tong Zhang, Bo Huang, Zhen Cui

arXiv:2309.05269v12.1 h-index: 17 Has Code

Originality: Incremental advance

AI Analysis

This work provides a large-scale benchmark for knowledge mining and heterogeneous graph representation learning, addressing a gap in the field, though its method is incremental in that it extends homogeneous graph methods to heterogeneous graphs.

The authors tackled the lack of large-scale heterogeneous graph datasets and effective learning methods by constructing UniKG, a benchmark with over 77 million entities and 2000 association types, and introduced a plug-and-play anisotropy propagation module that enables efficient information propagation and multi-attribute association mining.

Irregular real-world data are usually organized as heterogeneous graphs (HGs) consisting of multiple types of nodes and edges. To extract useful knowledge from real-world data, both large-scale encyclopedic HG datasets and corresponding effective learning methods are crucial, but neither has been well investigated. In this paper, we construct a large-scale HG benchmark dataset named UniKG from Wikidata to facilitate knowledge mining and heterogeneous graph representation learning. Overall, UniKG contains more than 77 million multi-attribute entities and 2000 diverse association types, which significantly surpasses the scale of existing HG datasets. To perform effective learning on the large-scale UniKG, two key measures are taken: (i) a semantic alignment strategy for multi-attribute entities, which projects the feature descriptions of multi-attribute nodes into a common embedding space to facilitate node aggregation in a large receptive field; and (ii) a novel plug-and-play anisotropy propagation module (APM) that learns effective multi-hop anisotropy propagation kernels and extends methods for large-scale homogeneous graphs to heterogeneous graphs. These two strategies enable efficient information propagation among a tremendous number of multi-attribute entities and, meanwhile, adaptively mine multi-attribute associations through multi-hop aggregation in large-scale HGs. We set up a node classification task on our UniKG dataset and evaluate multiple baseline methods, which are constructed by embedding our APM into large-scale homogeneous graph learning methods. Our UniKG dataset and the baseline code have been released at https://github.com/Yide-Qiu/UniKG.
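
The two measures named in the abstract can be illustrated with a toy sketch. This is not the paper's APM, which learns its propagation kernels. The sketch assumes fixed, hand-picked scalar weights per relation type and hand-picked projection matrices. All names (`project`, `propagate`, `relation_weight`, the node and relation labels) are hypothetical and chosen only for illustration. It shows entities with different feature sizes being projected into one shared space, followed by multi-hop aggregation in which the weight of a message depends on the relation it travels along.

```python
from collections import defaultdict

def project(features, weights):
    """Map a raw feature vector into the shared space with a linear projection."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def propagate(embeddings, edges, relation_weight, hops):
    """Return per-hop embeddings [H0, H1, ..., H_hops].

    edges: (src, relation, dst) triples; messages flow from src to dst.
    relation_weight: hypothetical fixed scalar per relation type (an
    assumption of this sketch; a learned kernel would replace it).
    """
    dim = len(next(iter(embeddings.values())))
    history = [embeddings]
    current = embeddings
    for _ in range(hops):
        sums = {node: [0.0] * dim for node in current}
        norm = defaultdict(float)
        for src, rel, dst in edges:
            w = relation_weight[rel]
            norm[dst] += w
            for i, value in enumerate(current[src]):
                sums[dst][i] += w * value
        # Relation-weighted mean of incoming messages; nodes without
        # incoming edges keep their previous embedding.
        current = {
            node: [s / norm[node] for s in sums[node]] if norm[node] > 0
            else list(current[node])
            for node in current
        }
        history.append(current)
    return history

# Two attribute types with different raw feature sizes (toy values).
W_TEXT = [[1, 0, 0], [0, 1, 0]]   # 3-dim text features -> 2-dim shared space
W_NUM = [[1, 0], [0, 1]]          # 2-dim numeric features -> 2-dim shared space

embeddings = {
    "a": project([1, 2, 3], W_TEXT),
    "b": project([3, 4], W_NUM),
    "c": project([0, 0, 5], W_TEXT),
}
edges = [("a", "cites", "c"), ("b", "authored", "c"), ("c", "cites", "a")]
relation_weight = {"cites": 1.0, "authored": 3.0}

history = propagate(embeddings, edges, relation_weight, hops=2)
for hop, state in enumerate(history):
    print(hop, state)
```

Keeping every hop's output, rather than only the last one, mirrors how scalable homogeneous-graph methods often precompute multi-hop features once and train a simple classifier on top of them. Whether the released UniKG baselines do exactly this should be checked against the repository.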
