CL · Sep 3, 2021

Contextualized Embeddings based Convolutional Neural Networks for Duplicate Question Identification

arXiv:2109.01560v2
AI Analysis

This paper addresses the problem of efficiently identifying duplicate questions on large-scale Question-Answering forums, offering incremental improvements in architecture and setup.

The paper tackles duplicate question identification by proposing a novel architecture combining a Bidirectional Transformer Encoder with Convolutional Neural Networks, achieving state-of-the-art performance on the Quora Question Pairs dataset and showing that the Matched-Aggregation setup outperforms the Siamese setup.

Question Paraphrase Identification (QPI) is a critical task for large-scale Question-Answering forums. The purpose of QPI is to determine whether a given pair of questions is semantically identical. Previous approaches to this task have yielded promising results, but have often relied on complex recurrence mechanisms that are computationally expensive and time-consuming. In this paper, we propose a novel architecture combining a Bidirectional Transformer Encoder with Convolutional Neural Networks for the QPI task. We produce predictions from the proposed architecture using two different inference setups: Siamese and Matched-Aggregation. Experimental results demonstrate that our model achieves state-of-the-art performance on the Quora Question Pairs dataset. We show empirically that adding convolution layers to the model architecture improves the results in both inference setups. We also investigate the impact of partial and complete fine-tuning and analyze the resulting trade-off between computational cost and accuracy. Based on the obtained results, we conclude that the Matched-Aggregation setup consistently outperforms the Siamese setup. Our work provides insights into which architecture combinations and setups are likely to produce better results for the QPI task.
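To make the Siamese setup with convolution layers more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The abstract gives no layer sizes or scoring function, so the filter shapes, the ReLU plus max-over-time pooling, and the cosine-similarity scoring are all assumptions chosen for illustration. The hypothetical `toy_encode` helper returns fixed per-word vectors only so the example runs on its own; it stands in for the contextualized token embeddings a Bidirectional Transformer Encoder would produce. The Matched-Aggregation setup is not sketched because the abstract does not describe how it works.

```python
import math
import random


def toy_encode(question, dim=8):
    """Hypothetical stand-in for a Transformer encoder: one fixed vector per word.

    These vectors are NOT contextualized; they only give the convolution
    something to operate on.
    """
    vectors = []
    for word in question.lower().split():
        rng = random.Random(word)  # string seed, so each word maps to the same vector
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def make_filters(num_filters, width, dim, seed=0):
    """Random convolution filters (assumed shapes): num_filters x width x dim."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(width)]
        for _ in range(num_filters)
    ]


def conv1d_max_pool(tokens, filters):
    """Slide each filter over the token sequence, apply ReLU, max-pool over time."""
    pooled = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # ReLU output is never below zero
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            activation = sum(
                w * x
                for w_vec, x_vec in zip(filt, window)
                for w, x in zip(w_vec, x_vec)
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def siamese_score(tokens_a, tokens_b, filters):
    """Siamese setup: the SAME filters encode each question independently,
    and the two pooled vectors are then compared."""
    return cosine(conv1d_max_pool(tokens_a, filters),
                  conv1d_max_pool(tokens_b, filters))


if __name__ == "__main__":
    filters = make_filters(num_filters=4, width=2, dim=8)
    q1 = toy_encode("how do i learn python quickly")
    q2 = toy_encode("what is the fastest way to learn python")
    print(siamese_score(q1, q2, filters))
```

Because both questions pass through the same filters, the score is symmetric in its two inputs. A trained model would learn the filters and would typically put a classifier on top of the pooled features instead of using a raw similarity score.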

