An Unsupervised Model with Attention Autoencoders for Question Retrieval
This work addresses the problem of reducing reliance on labeled data and manual feature engineering in question retrieval, offering an incremental improvement over supervised approaches.
The paper tackles question retrieval in community question answering by proposing an unsupervised framework, the reduced attentive matching network (RAMN), which integrates deep semantic representations, lexical mismatching, and initial search engine ranks. RAMN achieves performance comparable to state-of-the-art supervised methods on SemEval-2016 and outperforms the best system on SemEval-2017.
Question retrieval is a crucial subtask of community question answering (CQA). Previous research focuses on supervised models, which depend heavily on training data and manual feature engineering. In this paper, we propose a novel unsupervised framework, the reduced attentive matching network (RAMN), to compute semantic matching between two questions. RAMN integrates deep semantic representations, shallow lexical mismatching information, and the initial rank produced by an external search engine. We are the first to propose attention autoencoders for generating semantic representations of questions. In addition, we employ lexical mismatching, which is derived from the importance of each word in a question, to capture surface matching between two questions. We conduct experiments on the open CQA datasets of SemEval-2016 and SemEval-2017. The experimental results show that our unsupervised model achieves performance comparable to the state-of-the-art supervised methods in SemEval-2016 Task 3 and outperforms the best system in SemEval-2017 Task 3 by a wide margin.
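To make the three-signal idea concrete, the following is a minimal sketch of how a semantic similarity, a lexical mismatch penalty, and an initial search-engine rank could be folded into one score. It is not the RAMN formulation: the function names, the cosine similarity, the reciprocal-rank prior, the linear combination, and the weights are all assumptions chosen for illustration, and the question vectors are stand-ins for whatever an attention autoencoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_mismatch(query_tokens, candidate_tokens, importance):
    """Importance-weighted share of query words absent from the candidate.

    `importance` is a hypothetical word -> weight mapping (for example IDF);
    words not in the mapping default to weight 1.0.
    """
    total = sum(importance.get(w, 1.0) for w in query_tokens)
    if total == 0:
        return 0.0
    candidate_vocab = set(candidate_tokens)
    missing = sum(
        importance.get(w, 1.0) for w in query_tokens if w not in candidate_vocab
    )
    return missing / total


def combined_score(q_vec, c_vec, q_tokens, c_tokens, importance,
                   initial_rank, weights=(1.0, 1.0, 1.0)):
    """Assumed linear combination of the three signals (higher is better).

    `initial_rank` is the 1-based position returned by the external search
    engine; it is turned into a reciprocal-rank prior here as an assumption.
    """
    w_sem, w_lex, w_rank = weights
    semantic = cosine(q_vec, c_vec)
    mismatch = lexical_mismatch(q_tokens, c_tokens, importance)
    rank_prior = 1.0 / initial_rank
    return w_sem * semantic - w_lex * mismatch + w_rank * rank_prior
```

Candidates returned by the search engine could then be re-ranked by sorting on `combined_score`. Because nothing in this sketch is trained on labeled pairs, the weights would have to be fixed by hand or set heuristically, which is the kind of setting an unsupervised approach targets.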