An Empirical Evaluation of Various Deep Learning Architectures for Bi-Sequence Classification Tasks
This work addresses the need for effective architectures for bi-sequence classification in NLP applications, but its contribution is incremental: it focuses on empirical evaluation rather than introducing new methods.
The paper tackles the limited understanding of which deep learning architectures work best for bi-sequence classification by empirically evaluating 19 architectures on NLP problems such as debating and question answering, and it establishes the first deep learning baselines for three argumentation mining tasks.
Several tasks in argumentation mining and debating, question answering, and natural language inference involve classifying a sequence in the context of another sequence (referred to as bi-sequence classification). For several single-sequence classification tasks, current state-of-the-art approaches are based on recurrent and convolutional neural networks. For bi-sequence classification problems, on the other hand, there is little understanding of which deep learning architecture works best. In this paper, we attempt to understand this category of problems through an extensive empirical evaluation of 19 different deep learning architectures (focusing specifically on different ways of handling context) on various natural language processing problems, such as debating, textual entailment, and question answering. Following the empirical evaluation, we offer our insights and conclusions regarding the architectures considered. We also establish the first deep learning baselines for three argumentation mining tasks.
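The abstract does not list the 19 architectures, so the following is only a minimal sketch of what "handling context" can mean in a bi-sequence classifier. It assumes toy random embeddings and a tiny elementwise recurrence in place of trained RNN or CNN encoders, and it contrasts two common strategies. The first encodes the context and target independently and combines the vectors as [u; v; |u - v|; u * v], a combination used in some sentence-pair models. The second is conditional encoding, where the target encoder starts from the context's final hidden state. Neither strategy is claimed to be one of the paper's architectures; `ToyBiSequenceEncoder`, `decay`, `DIM`, and `classify` are hypothetical names introduced for illustration.

```python
import math
import random
from typing import Dict, List, Optional

DIM = 4  # hypothetical embedding / hidden size


class ToyBiSequenceEncoder:
    """Illustrative only: random embeddings and a tiny elementwise recurrence
    stand in for trained RNN/CNN encoders (hypothetical, not the paper's models)."""

    def __init__(self, seed: int = 0, decay: float = 0.5):
        self.rng = random.Random(seed)
        self.table: Dict[str, List[float]] = {}
        self.decay = decay  # hypothetical recurrence weight

    def embed(self, token: str) -> List[float]:
        # Lazily assign a seeded random vector to each unseen token.
        if token not in self.table:
            self.table[token] = [self.rng.uniform(-1.0, 1.0) for _ in range(DIM)]
        return self.table[token]

    def encode(self, tokens: List[str], h0: Optional[List[float]] = None) -> List[float]:
        # Order-sensitive recurrence: h_t = tanh(decay * h_{t-1} + x_t), elementwise.
        h = list(h0) if h0 is not None else [0.0] * DIM
        for tok in tokens:
            x = self.embed(tok)
            h = [math.tanh(self.decay * hi + xi) for hi, xi in zip(h, x)]
        return h

    def independent_features(self, context: List[str], target: List[str]) -> List[float]:
        # Strategy A: encode each sequence on its own, then combine the two vectors.
        u = self.encode(context)
        v = self.encode(target)
        return u + v + [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]

    def conditional_features(self, context: List[str], target: List[str]) -> List[float]:
        # Strategy B: conditional encoding; the target encoder is initialised
        # with the context's final hidden state.
        return self.encode(target, h0=self.encode(context))


def classify(features: List[float], weights: List[float], bias: float = 0.0) -> float:
    # Logistic output for a binary label (e.g. "supports" vs. "does not support").
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

For example, with `ctx = "we should ban smoking".split()` and `tgt = "smoking harms bystanders".split()`, `independent_features(ctx, tgt)` yields a vector of length `4 * DIM`, while `conditional_features(ctx, tgt)` yields one of length `DIM` whose value depends on the context. The design trade-off this illustrates is that independent encoding lets each sequence be represented without reference to the other, whereas conditional encoding makes the target representation itself context-aware.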