CL · Nov 13, 2022

mOKB6: A Multilingual Open Knowledge Base Completion Benchmark

arXiv:2211.06959v2224 citations · h-index: 44
Originality: Synthesis-oriented
AI Analysis

This work addresses the scarcity of multilingual data for open knowledge base completion. The contribution is incremental: it extends existing methods to new languages.

The authors tackled the lack of multilingual resources for open knowledge base completion by creating mOKB6, the first multilingual dataset for the task, covering six languages. They found that combining languages through a shared embedding space and translations of facts improves performance, though current multilingual models struggle to remember facts seen in languages of different scripts.

Automated completion of open knowledge bases (Open KBs), which are constructed from triples of the form (subject phrase, relation phrase, object phrase) obtained via open information extraction (Open IE) systems, is useful for discovering novel facts that may not be directly present in the text. However, research in Open KB completion (Open KBC) has so far been limited to resource-rich languages like English. Using the latest advances in multilingual Open IE, we construct the first multilingual Open KBC dataset, called mOKB6, containing facts from Wikipedia in six languages (including English). Improving the previous Open KB construction pipeline by performing multilingual coreference resolution and keeping only entity-linked triples, we create a dense Open KB. We experiment with several models for the task and observe a consistent benefit from combining languages with the help of a shared embedding space as well as translations of facts. We also observe that current multilingual models struggle to remember facts seen in languages of different scripts.
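To make the triple format and the entity-linking filter concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the `OpenTriple` class, the `keep_entity_linked` and `tail_query` helpers, the entity IDs, and the toy facts are all hypothetical, and the sketch assumes that "entity-linked" means both the subject and the object phrase have been linked to an entity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OpenTriple:
    """One Open KB fact: (subject phrase, relation phrase, object phrase)."""
    subject: str
    relation: str
    obj: str
    lang: str                             # language code of the source text
    subject_entity: Optional[str] = None  # hypothetical entity-link ID
    object_entity: Optional[str] = None   # hypothetical entity-link ID


def keep_entity_linked(triples: List[OpenTriple]) -> List[OpenTriple]:
    """Keep triples whose subject and object are both linked (assumed criterion)."""
    return [
        t for t in triples
        if t.subject_entity is not None and t.object_entity is not None
    ]


def tail_query(t: OpenTriple) -> Tuple[Tuple[str, str], str]:
    """Turn a fact into a completion query (subject, relation, ?) and its answer."""
    return (t.subject, t.relation), t.obj


# Toy facts for illustration only; they are not taken from mOKB6.
TRIPLES = [
    OpenTriple("Barack Obama", "was born in", "Honolulu", "en", "E1", "E2"),
    OpenTriple("he", "visited", "the city", "en"),  # unlinked, so dropped
    OpenTriple("Barack Obama", "nació en", "Honolulu", "es", "E1", "E2"),
]

DENSE = keep_entity_linked(TRIPLES)
QUERIES = [tail_query(t) for t in DENSE]
```

In this toy setup, the English and Spanish facts share the same hypothetical entity IDs, which is the kind of overlap a model could exploit when languages are combined.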

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
