Deeper Text Understanding for IR with Contextual Neural Language Modeling
This work addresses the need for deeper text understanding in IR systems, particularly for natural-language queries, although it is incremental in that it applies an existing model (BERT) to a new domain.
The paper tackles the limited text understanding of neural information retrieval (IR) models by using BERT to produce contextual representations of queries and documents. This yields large improvements over bag-of-words models on natural-language queries. It also yields an enhanced pre-trained model that helps search tasks with limited training data.
Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but little work has explored understanding the text content of a query or a document. This paper studies leveraging a recently proposed contextual neural language model, BERT, to provide deeper text understanding for IR. Experimental results demonstrate that the contextual text representations from BERT are more effective than traditional word embeddings. Compared to bag-of-words retrieval models, the contextual language model can better leverage language structures, bringing large improvements on queries written in natural language. Combining the text understanding ability with search knowledge leads to an enhanced pre-trained BERT model that can benefit related search tasks where training data are limited.
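The contrast between bag-of-words models and a contextual model can be made concrete with a small sketch. `bow_score` is a hypothetical toy scorer (not BM25 or any model from the paper): because it only counts terms, two queries with the same words in a different order receive the same score. `build_pair_input` assumes a common way of feeding a query-document pair to BERT, the sentence-pair layout `[CLS] query [SEP] document [SEP]` with segment ids 0 and 1. Whitespace tokens stand in for WordPiece, and 512 is BERT-base's default maximum sequence length. The model that would read these tokens and output a relevance score is omitted.

```python
import math
from collections import Counter


def bow_score(query: str, doc: str) -> float:
    """Hypothetical toy bag-of-words scorer: sum of log(1 + tf) over query terms.
    Word order in the query is ignored entirely."""
    doc_tf = Counter(doc.lower().split())
    return sum(math.log1p(doc_tf[t]) for t in query.lower().split())


def build_pair_input(query_tokens, doc_tokens, max_len=512):
    """Pack a query-document pair in BERT's sentence-pair layout:
    [CLS] query [SEP] document [SEP]. Segment id 0 covers [CLS], the query
    and the first [SEP]; segment id 1 covers the document and the final [SEP].
    The document is truncated so the whole sequence fits in max_len."""
    budget = max_len - len(query_tokens) - 3  # room left after [CLS] and two [SEP]
    if budget < 0:
        raise ValueError("query is too long for max_len")
    doc_tokens = doc_tokens[:budget]
    tokens = ["[CLS]"] + list(query_tokens) + ["[SEP]"] + doc_tokens + ["[SEP]"]
    segment_ids = [0] * (len(query_tokens) + 2) + [1] * (len(doc_tokens) + 1)
    return tokens, segment_ids


if __name__ == "__main__":
    doc = "cheap flights from boston to denver"
    q1 = "flights from boston to denver"
    q2 = "flights from denver to boston"
    # Bag-of-words: identical scores, the direction of travel is lost.
    print(bow_score(q1, doc) == bow_score(q2, doc))
    # Pair input: token order is preserved, so the two queries differ.
    print(build_pair_input(q1.split(), doc.split())[0] ==
          build_pair_input(q2.split(), doc.split())[0])
```

The design point is that the pair input keeps the query's word order and places query and document in one sequence. A contextual model can therefore attend across both. A bag-of-words scorer, by construction, cannot distinguish the two queries above.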