CL AI Nov 15, 2015

Word Embedding based Correlation Model for Question/Answer Matching

Yikang Shen, Wenge Rong, Nan Jiang, Baolin Peng, Jie Tang, Zhang Xiong

arXiv:1511.04646v2 11.249 citations

Originality: Incremental advance

AI Analysis

This work addresses improving user experience in community Q&A services by enhancing matching accuracy, though it appears incremental as it builds on existing translation and embedding methods.

The paper tackles the lexical gap in question-answer matching by proposing a Word Embedding based Correlation (WEC) model, which integrates translation models and word embeddings to score co-occurrence probabilities and handle rare word pairs, showing promising results on Yahoo! Answers and Baidu Zhidao datasets.

With the development of community-based question answering (Q&A) services, large Q&A archives have accumulated and become an important information and knowledge resource on the web. Question and answer matching has received much attention for its ability to reuse the knowledge stored in these systems: it can enhance the user experience for recurrent questions. In this paper, we try to improve matching accuracy by overcoming the lexical gap between question and answer pairs. We propose a Word Embedding based Correlation (WEC) model that integrates the advantages of both the translation model and word embedding. Given a random pair of words, WEC can score their co-occurrence probability in Q&A pairs, and it can also leverage the continuity and smoothness of continuous-space word representations to deal with new pairs of words that are rare in the training parallel text. An experimental study on the Yahoo! Answers and Baidu Zhidao datasets shows this new method's promising potential.
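The idea of scoring a word pair through embeddings rather than through exact co-occurrence counts can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the scoring function, so the cosine-based `word_correlation`, the translation matrix `M`, the max-then-average aggregation in `pair_score`, and the toy vectors in `EMB` are all assumptions made for illustration.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def word_correlation(q_vec, a_vec, M):
    """Assumed word-level score: map the answer word into the question
    space with M, then compare it with the question word."""
    return cosine(q_vec, matvec(M, a_vec))

def pair_score(question, answer, emb, M):
    """Assumed sentence-level score: for each question word take its best
    matching answer word, then average. Unknown words are skipped."""
    best_scores = []
    for q in question:
        if q not in emb:
            continue
        best = max(
            (word_correlation(emb[q], emb[a], M) for a in answer if a in emb),
            default=0.0,
        )
        best_scores.append(best)
    return sum(best_scores) / len(best_scores) if best_scores else 0.0

# Hypothetical 2-dimensional embeddings; real ones would be learned.
EMB = {
    "laptop": [1.0, 0.0],
    "notebook": [0.9, 0.1],
    "banana": [0.0, 1.0],
}
# Identity stands in for a learned translation matrix.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

related = pair_score(["laptop"], ["notebook", "banana"], EMB, IDENTITY)
unrelated = pair_score(["laptop"], ["banana"], EMB, IDENTITY)
```

Because "notebook" lies close to "laptop" in the toy embedding space, `related` comes out higher than `unrelated` even though no co-occurrence count for that pair was ever supplied. This is the kind of smoothness the abstract refers to when it mentions handling rare word pairs.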
