CL · LG · Jul 1, 2019

Natural Language Understanding with the Quora Question Pairs Dataset

arXiv:1907.01041v1 · 105 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses duplicate question detection for Quora users, but it is incremental as it applies existing methods to a known dataset.

The paper tackled duplicate question detection on the Quora Question Pairs dataset, finding that a simple Continuous Bag of Words neural network model outperformed more complex recurrent and attention-based models, with error analysis revealing subjectivity in dataset labeling.

This paper explores the task of Natural Language Understanding (NLU) through duplicate question detection on the Quora dataset. We conducted extensive exploration of the dataset and evaluated various machine learning models, including linear and tree-based models. Our final finding was that a simple Continuous Bag of Words neural network model had the best performance, outperforming more complicated recurrent and attention-based models. We also conducted an error analysis and found some subjectivity in the labeling of the dataset.
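To make the "Continuous Bag of Words" idea concrete, the following is a minimal sketch of how a question pair might be encoded by averaging word vectors, which ignores word order. It is not the paper's implementation: the hash-based `toy_embedding` stands in for learned embeddings, the dimension `DIM` is arbitrary, and the `pair_features` combination (concatenation, absolute difference, element-wise product) is an assumed, commonly used pairing rather than something the abstract specifies. In a real model these features would feed a trained classifier.

```python
import hashlib
import math

DIM = 16  # illustrative embedding size (assumption, not from the paper)


def toy_embedding(token, dim=DIM):
    """Deterministic pseudo-embedding from a hash; a stand-in for learned vectors."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Map the first `dim` bytes of the digest to values in [-1, 1).
    return [(b - 128) / 128.0 for b in digest[:dim]]


def cbow_encode(question, dim=DIM):
    """Average the token vectors; word order is discarded, as in a bag of words."""
    tokens = question.lower().split()
    if not tokens:
        return [0.0] * dim
    vectors = [toy_embedding(t, dim) for t in tokens]
    return [sum(column) / len(tokens) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_features(q1, q2):
    """Hypothetical pair representation: [u, v, |u - v|, u * v]."""
    u, v = cbow_encode(q1), cbow_encode(q2)
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return u + v + diff + prod


if __name__ == "__main__":
    q1 = "how do i learn python"
    q2 = "python learn i do how"  # same words, different order
    print(cosine(cbow_encode(q1), cbow_encode(q2)))
    print(len(pair_features(q1, q2)))
```

Because the encoder averages token vectors, two questions containing the same words in a different order receive the same representation. That is the main simplification separating this kind of model from the recurrent and attention-based models mentioned in the abstract.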

