CL · May 17, 2019

ERNIE: Enhanced Language Representation with Informative Entities

Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu

arXiv:1905.07129v3 · 1561 citations · Has Code

AI Analysis

This paper addresses the need for richer language representations in NLP by drawing on external structured knowledge. It proposes a new way to integrate that knowledge into pre-training rather than an incremental change to existing models.

Existing pre-trained language models rarely incorporate knowledge graphs (KGs), which limits their language understanding. To address this, the authors developed ERNIE, a model trained on both large-scale textual corpora and KGs. ERNIE achieves significant improvements on knowledge-driven tasks while remaining comparable with BERT on other common NLP tasks.

Neural language representation models such as BERT, pre-trained on large-scale corpora, capture rich semantic patterns from plain text and can be fine-tuned to consistently improve performance on various NLP tasks. However, existing pre-trained language models rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better language understanding. We argue that informative entities in KGs can enhance language representation with external knowledge. In this paper, we utilize both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE), which can take full advantage of lexical, syntactic, and knowledge information simultaneously. The experimental results demonstrate that ERNIE achieves significant improvements on various knowledge-driven tasks while remaining comparable with the state-of-the-art model BERT on other common NLP tasks. The source code of this paper can be obtained from https://github.com/thunlp/ERNIE.
