A Syntax Aware BERT for Identifying Well-Formed Queries in a Curriculum Framework
This work addresses the task of assessing query formulation quality for natural language processing applications and reports improved accuracy over prior work.
The paper tackles the problem of identifying well-formed queries by proposing a BERT-based model enhanced with parts-of-speech information and trained using curriculum learning techniques, achieving an accuracy of 83.93% and outperforming the previous state-of-the-art at 75.0%.
A well-formed query is defined as a query that is formulated in the manner of an inquiry, with correct interrogatives, spelling and grammar. While identifying well-formed queries is an important task, few works have attempted to address it. In this paper we apply a transformer-based language model, Bidirectional Encoder Representations from Transformers (BERT), to this task. Inspired by earlier works, we further augment BERT with parts-of-speech information. We also train the model in multiple curriculum settings to improve performance, experimenting with two curriculum learning techniques: Baby Steps and One Pass. The proposed architecture performs well on the task. The best approach achieves an accuracy of 83.93%, outperforming the previous state-of-the-art of 75.0% and approaching the approximate human upper bound of 88.4%.
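To make the difference between the two curriculum settings concrete, the following is a minimal sketch of how the training data could be scheduled under each. It assumes the common formulation in which One Pass visits each difficulty bucket once, in order from easiest to hardest, while Baby Steps grows the training set cumulatively by adding the next bucket to those already seen. The difficulty criterion used here (query length in tokens), the helper names `make_buckets`, `one_pass_schedule` and `baby_steps_schedule`, and the toy queries are illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_buckets(
    examples: Sequence[T],
    difficulty: Callable[[T], float],
    n_buckets: int,
) -> List[List[T]]:
    """Sort examples from easiest to hardest and split them into
    n_buckets contiguous buckets of nearly equal size."""
    if n_buckets < 1:
        raise ValueError("n_buckets must be at least 1")
    ordered = sorted(examples, key=difficulty)
    base, extra = divmod(len(ordered), n_buckets)
    buckets: List[List[T]] = []
    start = 0
    for i in range(n_buckets):
        # The first `extra` buckets take one additional example each.
        size = base + (1 if i < extra else 0)
        buckets.append(ordered[start:start + size])
        start += size
    return buckets


def one_pass_schedule(buckets: Sequence[Sequence[T]]) -> Iterator[List[T]]:
    """One Pass: each stage trains on a single bucket, easiest first;
    earlier buckets are not revisited."""
    for bucket in buckets:
        yield list(bucket)


def baby_steps_schedule(buckets: Sequence[Sequence[T]]) -> Iterator[List[T]]:
    """Baby Steps: each stage trains on the union of all buckets seen so far."""
    seen: List[T] = []
    for bucket in buckets:
        seen = seen + list(bucket)
        yield list(seen)


# Toy queries; difficulty is ASSUMED to be the token count (hypothetical criterion).
queries = [
    "what is bert",
    "bert",
    "how does curriculum learning help a model",
    "define pos tag",
    "who proposed the baby steps curriculum",
    "population of paris",
]
buckets = make_buckets(queries, lambda q: len(q.split()), n_buckets=3)

one_pass_sizes = [len(stage) for stage in one_pass_schedule(buckets)]
baby_steps_sizes = [len(stage) for stage in baby_steps_schedule(buckets)]
print(one_pass_sizes)    # [2, 2, 2]
print(baby_steps_sizes)  # [2, 4, 6]
```

In an actual training loop, each yielded stage would be passed to the model's fine-tuning step for a fixed number of epochs or until convergence. The sketch covers only the data scheduling, not the BERT model or the parts-of-speech features.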