IR · Dec 15, 2017

Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval

arXiv:1712.05574v1 · 4 citations
Originality: Highly original
AI Analysis

The paper addresses semantic similarity retrieval without training data, for applications like Quora and Stack Overflow, and proposes a novel method rather than an incremental improvement.

The paper tackles unsupervised semantic similarity-based retrieval by constructing a semantic flow graph for each query and applying soft seeding in graph-based semi-supervised learning. On the Stack Exchange QA dataset, it significantly outperforms state-of-the-art unsupervised models and produces results comparable to the best supervised models.

Semantic similarity-based retrieval is playing an increasingly important role in many IR systems, such as modern web search, question answering, and similar-document retrieval. Improvements in the retrieval of semantically similar content are very significant for applications like Quora, Stack Overflow, and Siri. We propose a novel unsupervised model for semantic similarity-based content retrieval, in which we construct semantic flow graphs for each query and introduce the concept of "soft seeding" in graph-based semi-supervised learning (SSL) to convert it into an unsupervised model. We demonstrate the effectiveness of our model on an equivalent question retrieval problem on the Stack Exchange QA dataset, where our unsupervised approach significantly outperforms the state-of-the-art unsupervised models and produces results comparable to the best supervised models. Our research provides a method to tackle semantic similarity-based retrieval without any training data, and allows seamless extension to QA communities in different domains, as well as to other semantic equivalence tasks.
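The abstract does not spell out how the graphs are built or how soft seeding is defined, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a standard label-propagation update over a small similarity graph, with real-valued seed scores in [0, 1] standing in for hard labels. The function name `propagate_soft_seeds`, the parameters `alpha` and `iterations`, and the toy weights and seed values are all hypothetical.

```python
def propagate_soft_seeds(weights, seeds, alpha=0.5, iterations=50):
    """Iterate f <- alpha * P f + (1 - alpha) * seeds.

    P is the row-normalised weight matrix. This is a generic
    label-propagation update (an assumption), not the paper's exact method.
    """
    n = len(seeds)
    # Row-normalise so each node takes a weighted average of its neighbours.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total if total else 0.0 for w in row])

    scores = list(seeds)
    for _ in range(iterations):
        scores = [
            alpha * sum(transition[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores


# Toy graph (made-up numbers): node 0 is the query, nodes 1-3 are candidates.
weights = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.8, 0.0],
    [0.1, 0.8, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
# Soft seeds: graded confidence values instead of hard 0/1 labels (assumed).
seeds = [1.0, 0.6, 0.1, 0.0]

scores = propagate_soft_seeds(weights, seeds)
# Rank candidates (excluding the query node) by propagated score.
ranking = sorted(range(1, len(scores)), key=lambda i: scores[i], reverse=True)
```

In this sketch the seed scores need no labelled training data, which is the sense in which a semi-supervised propagation scheme could be run in an unsupervised setting. How the paper actually derives its seeds and edge weights is not stated in the abstract.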

