CL, LG · Jul 1, 2019

Natural Language Understanding with the Quora Question Pairs Dataset

Lakshay Sharma, Laura Graesser, Nikita Nangia, Utku Evci

arXiv:1907.01041v1

Originality: Synthesis-oriented

AI Analysis

This work addresses duplicate question detection on Quora, but its contribution is incremental: it applies existing methods to a known dataset.

The paper tackles duplicate question detection on the Quora Question Pairs dataset and finds that a simple Continuous Bag of Words (CBOW) neural network model outperforms more complex recurrent and attention-based models. Its error analysis also reveals subjectivity in how the dataset was labeled.

This paper explores the task of Natural Language Understanding (NLU) through duplicate question detection on the Quora dataset. We conducted an extensive exploration of the dataset and used various machine learning models, including linear and tree-based models. Our final finding was that a simple Continuous Bag of Words neural network model had the best performance, outperforming more complicated recurrent and attention-based models. We also conducted error analysis and found some subjectivity in the labeling of the dataset.
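The abstract does not describe the CBOW architecture in detail, so the following is only a minimal, hypothetical Python sketch of the general idea: each question is encoded as the average of its word vectors (word order is ignored), the two encodings are combined into symmetric pair features, and a logistic output layer scores the pair as duplicate or not. Everything here is an illustrative assumption rather than the authors' setup: the random embeddings, `EMBED_DIM`, the `|u - v|` and `u * v` features, the single-layer classifier trained by SGD with frozen embeddings, the crude tokenizer, and the made-up toy pairs. All function names are hypothetical.

```python
import math
import random

EMBED_DIM = 8  # hypothetical embedding size, not taken from the paper


def tokenize(text):
    """Lowercase, drop question marks, split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().replace("?", " ").split()


def build_embeddings(vocab, dim=EMBED_DIM, seed=0):
    """Random word vectors; a real model would learn these or load pretrained ones."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for word in vocab}


def cbow_encode(text, emb, dim=EMBED_DIM):
    """Continuous bag of words: average the vectors of known tokens, ignoring word order."""
    vecs = [emb[tok] for tok in tokenize(text) if tok in emb]
    if not vecs:
        return [0.0] * dim  # no known tokens: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def pair_features(u, v):
    """Symmetric pair features |u - v| and u * v, so question order does not matter."""
    return [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_duplicate(q1, q2, emb, weights, bias):
    """Probability that q1 and q2 are duplicates under a logistic output layer."""
    x = pair_features(cbow_encode(q1, emb), cbow_encode(q2, emb))
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(pairs, emb, epochs=200, lr=0.5):
    """Fit only the output layer with plain SGD on log loss; embeddings stay frozen."""
    weights, bias = [0.0] * (2 * EMBED_DIM), 0.0
    for _ in range(epochs):
        for q1, q2, label in pairs:
            x = pair_features(cbow_encode(q1, emb), cbow_encode(q2, emb))
            # Gradient of log loss w.r.t. the logit is (prediction - label).
            grad = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - label
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


if __name__ == "__main__":
    # Toy, made-up pairs (1 = duplicate, 0 = not); not drawn from the Quora dataset.
    toy_pairs = [
        ("How do I learn Python?", "What is the best way to learn Python?", 1),
        ("How do I learn Python?", "How do I cook rice?", 0),
        ("How do I cook rice?", "What is the best way to cook rice?", 1),
        ("What is the best way to learn Python?", "How do I cook rice?", 0),
    ]
    vocab = {tok for q1, q2, _ in toy_pairs for tok in tokenize(q1) + tokenize(q2)}
    emb = build_embeddings(sorted(vocab))  # sorted so the word-to-vector mapping is deterministic
    weights, bias = train(toy_pairs, emb)
    print(predict_duplicate("How can I learn Python?", "Best way to learn Python?", emb, weights, bias))
```

Averaging makes the encoder cheap and insensitive to word order, which is one plausible reason a CBOW model can compete with recurrent or attention-based encoders on short questions; the symmetric features mean that swapping the two questions never changes the prediction.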

