ERNIE: Enhanced Representation through Knowledge Integration
This work addresses the problem of enhancing language models with entity-level and phrase-level knowledge to improve performance on Chinese NLP tasks, representing an incremental improvement over existing methods such as BERT.
The paper tackles improving language representation by integrating knowledge through entity-level and phrase-level masking strategies, resulting in ERNIE achieving state-of-the-art results on five Chinese NLP tasks and demonstrating enhanced knowledge inference capacity.
We present a novel knowledge-enhanced language representation model called ERNIE (Enhanced Representation through kNowledge IntEgration). Inspired by the masking strategy of BERT, ERNIE is designed to learn language representations enhanced by knowledge masking strategies, which include entity-level masking and phrase-level masking. The entity-level strategy masks entities, which are usually composed of multiple words. The phrase-level strategy masks a whole phrase, which is composed of several words standing together as a conceptual unit. Experimental results show that ERNIE outperforms other baseline methods, achieving new state-of-the-art results on five Chinese natural language processing tasks: natural language inference, semantic similarity, named entity recognition, sentiment analysis, and question answering. We also demonstrate that ERNIE has a more powerful knowledge inference capacity on a cloze test.
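
The difference between masking individual tokens and masking whole entities or phrases can be illustrated with a minimal sketch. It assumes that entity and phrase boundaries are already available as index spans, since the abstract does not say how they are obtained. The function name `mask_spans`, the `mask_prob` parameter, the example sentence, and its hand-annotated spans are all hypothetical and are not taken from the authors' implementation.

```python
import random

MASK = "[MASK]"


def mask_spans(tokens, spans, mask_prob=0.15, rng=None):
    """Mask whole spans as single units (all tokens in a span, or none).

    tokens: list of token strings.
    spans: list of (start, end) half-open index pairs, assumed non-overlapping.
    mask_prob: hypothetical per-span masking probability.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    for start, end in spans:
        # One random draw per span, so a multi-word unit is never split.
        if rng.random() < mask_prob:
            for i in range(start, end):
                masked[i] = MASK
    return masked


# Illustrative sentence with hand-annotated spans (assumptions, not model output).
tokens = ["Harry", "Potter", "is", "a", "series", "of", "fantasy",
          "novels", "written", "by", "J.", "K.", "Rowling"]
entity_spans = [(0, 2), (10, 13)]   # "Harry Potter", "J. K. Rowling"
phrase_spans = [(3, 6)]             # "a series of"

# mask_prob=1.0 forces every span to be masked, which makes the effect visible.
entity_masked = mask_spans(tokens, entity_spans, mask_prob=1.0)
phrase_masked = mask_spans(tokens, phrase_spans, mask_prob=1.0)
print(" ".join(entity_masked))
print(" ".join(phrase_masked))
```

Because the random draw is made once per span rather than once per token, the model can never see part of an entity while predicting the rest. It has to recover the whole unit from the surrounding context, which is the intuition behind the knowledge masking strategies described in the abstract.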