CL Apr 24, 2020

Contextualized Representations Using Textual Encyclopedic Knowledge

Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova

arXiv:2004.12006v23.131 citations

Originality: Incremental advance

AI Analysis

The paper addresses factual reasoning in question answering. The advance is incremental because it builds on existing BERT-style encoders rather than introducing a new architecture.

The paper improves reading comprehension by encoding input texts jointly with dynamically retrieved textual encyclopedic knowledge. It reports F1 gains of 1.6 to 3.1 on TriviaQA over comparable RoBERTa models, and out-of-domain gains of 1.1 to 4.2 F1 on the MRQA datasets BioASQ, TextbookQA, and DuoRC.

We present a method to represent input texts by contextualizing them jointly with dynamically retrieved textual encyclopedic background knowledge from multiple documents. We apply our method to reading comprehension tasks by encoding questions and passages together with background sentences about the entities they mention. We show that integrating background knowledge from text is effective for tasks focusing on factual reasoning and allows direct reuse of powerful pretrained BERT-style encoders. Moreover, knowledge integration can be further improved with suitable pretraining via a self-supervised masked language model objective over words in background-augmented input text. On TriviaQA, our approach obtains improvements of 1.6 to 3.1 F1 over comparable RoBERTa models which do not integrate background knowledge dynamically. On MRQA, a large collection of diverse QA datasets, we see consistent gains in-domain along with large improvements out-of-domain on BioASQ (2.1 to 4.2 F1), TextbookQA (1.6 to 2.0 F1), and DuoRC (1.1 to 2.0 F1).
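The input construction described in the abstract, a question and passage encoded together with background sentences about the entities they mention, can be illustrated with a minimal sketch. This is not the authors' implementation. The toy knowledge base `TOY_KB`, the exact-match entity lookup, the whitespace token count, and the `[SEP]` separator are all assumptions made for illustration. The paper retrieves background dynamically from multiple documents and feeds the result to a BERT-style encoder, neither of which is modeled here.

```python
from typing import Dict, List

# Hypothetical toy "encyclopedia": entity name -> background sentences.
# A stand-in for dynamically retrieved encyclopedic text.
TOY_KB: Dict[str, List[str]] = {
    "Marie Curie": ["Marie Curie was a physicist and chemist."],
    "Warsaw": ["Warsaw is the capital of Poland."],
}

SEP = "[SEP]"  # assumed separator string, not a real tokenizer's special token


def find_entities(text: str, kb: Dict[str, List[str]]) -> List[str]:
    """Naive entity linking: exact, case-sensitive substring match on KB keys."""
    return [name for name in kb if name in text]


def build_augmented_input(
    question: str,
    passage: str,
    kb: Dict[str, List[str]],
    max_tokens: int = 64,
) -> str:
    """Concatenate question, passage, and background sentences under a budget.

    Tokens are approximated by whitespace-separated words (an assumption);
    a real system would count subword tokens from the encoder's tokenizer.
    """
    core = f"{question} {SEP} {passage}"
    budget = max_tokens - len(core.split())
    background: List[str] = []
    for entity in find_entities(f"{question} {passage}", kb):
        for sentence in kb[entity]:
            cost = len(sentence.split()) + 1  # +1 for the separator
            if cost > budget:
                continue  # skip sentences that do not fit
            background.append(sentence)
            budget -= cost
    return f" {SEP} ".join([core] + background)


if __name__ == "__main__":
    print(
        build_augmented_input(
            "Where was Marie Curie born?", "She was born in Warsaw.", TOY_KB
        )
    )
```

The budget check reflects the fixed maximum input length of BERT-style encoders: background text competes with the question and passage for space, so the sketch keeps the original text intact and adds background only while room remains.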
