CL, Jul 7, 2022

Active Learning and Multi-label Classification for Ellipsis and Coreference Detection in Conversational Question-Answering

Quentin Brabant, Lina Maria Rojas-Barahona, Claire Gardent

arXiv:2207.03145v10.31 citations, h-index: 30

Originality Synthesis-oriented

AI Analysis

This work addresses a specific challenge in natural language processing for conversational AI, but it is incremental as it applies existing methods like DistilBERT and active learning to a niche task.

The paper tackles the problem of automatically detecting ellipsis and coreference in conversational question answering, with the aim of making human-machine dialogue more fluent, and reports improved classifier performance on a manually labeled dataset.

Ellipsis and coreference are common linguistic phenomena in human conversation. Although these phenomena help make human-machine conversations more fluent and natural, only a few dialogue corpora explicitly indicate which turns contain ellipses and/or coreferences. In this paper, we address the task of automatically detecting ellipsis and coreference in conversational question answering. We propose a multi-label classifier based on DistilBERT. Multi-label classification and active learning are employed to compensate for the limited amount of labeled data. We show that these methods greatly enhance the classifier's performance at detecting these phenomena on a manually labeled dataset.
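
To make the two ideas in the abstract concrete, here is a minimal sketch of a multi-label decision and of one active-learning selection step. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which acquisition strategy is used, so uncertainty sampling is assumed here, and the logits are made-up values standing in for the outputs of a DistilBERT classification head. All function names are hypothetical.

```python
import math

# The two phenomena named in the abstract; a turn may carry both, one, or neither.
LABELS = ("ellipsis", "coreference")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Multi-label decision: each label gets its own independent sigmoid."""
    return {label: sigmoid(z) >= threshold for label, z in zip(LABELS, logits)}


def uncertainty(logits):
    """Assumed acquisition score: highest when any label's probability is near 0.5."""
    return max(1.0 - 2.0 * abs(sigmoid(z) - 0.5) for z in logits)


def select_for_annotation(pool, k):
    """Return the ids of the k most uncertain unlabeled turns."""
    ranked = sorted(pool, key=lambda item: uncertainty(item[1]), reverse=True)
    return [turn_id for turn_id, _ in ranked[:k]]


# Hypothetical unlabeled pool: (turn id, made-up logits for the two labels).
pool = [
    ("turn-1", (4.0, -4.0)),   # confident on both labels
    ("turn-2", (0.1, 3.0)),    # unsure about ellipsis
    ("turn-3", (-2.0, -0.2)),  # unsure about coreference
]

to_annotate = select_for_annotation(pool, k=2)
```

In a full loop, the selected turns would be labeled by hand, added to the training set, and the classifier retrained before the next selection round.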
