IR, LG | Feb 1, 2019

CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

arXiv:1902.00172v1 | 112 citations
Originality: Incremental advance
AI Analysis

This work addresses redundant and ambiguous facts in Open KBs used by natural language processing applications. It is an incremental improvement over approaches based on manual feature engineering.

The paper tackles the problem of canonicalizing Open Knowledge Bases (Open KBs) by proposing CESI, a method that uses learned embeddings and side information to cluster noun and relation phrases, which reduces redundancy and ambiguity in stored facts.

Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually defined feature spaces. Manual feature engineering is expensive and often sub-optimal. In order to overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI) - a novel approach which performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness.
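
To make "canonicalization over embeddings" concrete, here is a minimal sketch of grouping noun phrases by embedding similarity. It is not the paper's implementation: the abstract does not name a clustering algorithm, so the single-link, threshold-based grouping, the `canonicalize` helper, the `threshold` value, and the hand-made three-dimensional vectors are all assumptions for illustration. In CESI the vectors would be learned from the Open KB together with side information.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def canonicalize(embeddings, threshold=0.9):
    """Group phrases whose embeddings are similar (single-link, union-find).

    `embeddings` maps phrase -> vector. Two phrases end up in the same
    cluster if a chain of pairs with cosine similarity >= threshold links them.
    """
    phrases = list(embeddings)
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, p in enumerate(phrases):
        for q in phrases[i + 1:]:
            if cosine(embeddings[p], embeddings[q]) >= threshold:
                parent[find(q)] = find(p)

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy vectors, chosen by hand so that aliases lie close together.
toy_embeddings = {
    "Barack Obama": [0.9, 0.1, 0.0],
    "Obama": [0.85, 0.15, 0.05],
    "NYC": [0.0, 0.2, 0.9],
    "New York City": [0.05, 0.1, 0.95],
}

clusters = canonicalize(toy_embeddings)
print(clusters)
```

With these toy vectors the two aliases of each entity fall into the same cluster, and each cluster can then be represented by a single canonical phrase. The threshold controls how aggressively phrases are merged and would need tuning on real data.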

Code Implementations: 1 repo