Towards Semantic Query Segmentation
This paper addresses query segmentation for information retrieval systems, offering a fast and generalizable method that eliminates expensive hand-tuned features, though it is incremental in that it builds on existing embedding techniques.
The paper tackles query segmentation for search intent understanding by proposing a supervised approach that uses distributed query embeddings. It achieves accuracy comparable to state-of-the-art techniques on a 50,000-query web corpus and generalizes to a 50,000-query eCommerce corpus without fine-tuning.
Query segmentation is one of the critical components for understanding users' search intent in Information Retrieval tasks. It involves grouping the tokens of a search query into meaningful phrases, which helps downstream tasks such as search relevance and query understanding. In this paper, we propose a novel approach to segmenting user queries using distributed query embeddings. Our key contribution is a supervised approach to the segmentation task that uses low-dimensional feature vectors for queries, removing the need for traditional hand-tuned and heuristic NLP features, which are quite expensive. We benchmark on a corpus of 50,000 human-annotated web search engine queries, achieving accuracy comparable to state-of-the-art techniques. The advantage of our technique is that it is fast and does not use an external knowledge base such as Wikipedia for score boosting. This helps us generalize our approach to other domains like eCommerce without any fine-tuning. We demonstrate the effectiveness of this method on another corpus of 50,000 human-annotated eCommerce queries from eBay search logs. Our approach is easy to implement and generalizes well across different search domains, demonstrating the power of low-dimensional embeddings in the query segmentation task and opening up a new direction of research for this problem.
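
To make the task concrete, the following is a minimal sketch of segmentation cast as a supervised break/no-break decision at each boundary between adjacent tokens. It is not the authors' implementation: the abstract does not specify the classifier or the feature construction, so the perceptron, the hand-set 2-dimensional embeddings, the concatenation-plus-product features, and all names (`boundary_features`, `train_perceptron`, `segment`, `EMBEDDINGS`, `ANNOTATED`) are illustrative assumptions.

```python
# Toy sketch: query segmentation as per-boundary binary classification.
# ASSUMPTIONS (not from the paper): hand-set 2-d embeddings, a perceptron
# classifier, and concatenation + element-wise product features.

EMBEDDINGS = {  # hypothetical embeddings: axis 0 ~ "place", axis 1 ~ "travel product"
    "new": [1.0, 0.0],
    "york": [0.9, 0.1],
    "hotels": [0.0, 1.0],
    "cheap": [0.1, 0.9],
    "flights": [0.0, 1.0],
}

# Hypothetical annotations: one flag per token boundary, True = segment break.
ANNOTATED = [
    (["new", "york", "hotels"], [False, True]),
    (["cheap", "flights", "new", "york"], [False, True, False]),
]


def boundary_features(left_vec, right_vec):
    """Features for one boundary: both embeddings plus their element-wise product."""
    product = [a * b for a, b in zip(left_vec, right_vec)]
    return list(left_vec) + list(right_vec) + product


def build_examples(annotated, embeddings):
    """Turn annotated queries into (features, label) pairs; label +1 = break, -1 = no break."""
    examples = []
    for tokens, breaks in annotated:
        for left, right, is_break in zip(tokens, tokens[1:], breaks):
            x = boundary_features(embeddings[left], embeddings[right])
            examples.append((x, 1 if is_break else -1))
    return examples


def train_perceptron(examples, epochs=200):
    """Classic perceptron; stops early once an epoch makes no mistakes."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in examples:
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if label * score <= 0:  # misclassified (or on the decision boundary)
                weights = [w + label * xi for w, xi in zip(weights, x)]
                bias += label
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias


def segment(tokens, embeddings, weights, bias):
    """Start a new segment wherever the classifier predicts a break."""
    segments = [[tokens[0]]]
    for left, right in zip(tokens, tokens[1:]):
        x = boundary_features(embeddings[left], embeddings[right])
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        if score > 0:
            segments.append([right])
        else:
            segments[-1].append(right)
    return [" ".join(s) for s in segments]


WEIGHTS, BIAS = train_perceptron(build_examples(ANNOTATED, EMBEDDINGS))
print(segment(["new", "york", "hotels"], EMBEDDINGS, WEIGHTS, BIAS))
```

In this toy setup the only inputs to the classifier are the token embeddings, with no hand-tuned lexical features or external knowledge base, which mirrors the property the abstract emphasizes. A realistic system would learn the embeddings from query logs and would likely use a stronger classifier.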