Zhan Shen

3.0 | IR | Aug 10, 2020

Beyond Lexical: A Semantic Retrieval Framework for Textual Search Engine

Kuan Fang, Long Zhao, Zhan Shen et al.

Search engines have become a fundamental component of many web and mobile applications. Retrieving relevant documents from massive datasets is challenging for a search engine, especially when it is faced with verbose or tail queries. In this paper, we explore a vector space search framework for document retrieval. Specifically, we trained a deep semantic matching model, based on the BERT architecture, so that each query and document can be encoded as a low-dimensional embedding. We deployed a fast k-nearest-neighbor index service for online serving. Both offline and online metrics demonstrate that our method improved retrieval performance and search quality considerably, particularly for tail
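As a rough illustration of the vector space retrieval idea in this abstract, the sketch below ranks documents by cosine similarity between a query embedding and document embeddings. It is a minimal sketch under stated assumptions: the embeddings are hand-written toy vectors rather than outputs of a BERT-based encoder, exhaustive search stands in for the fast k-nearest-neighbor index service, and the names `normalize`, `knn_search`, and `DOC_VECS` are hypothetical.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def knn_search(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs with the highest cosine similarity.

    This is exhaustive search; a production system would use an
    approximate nearest-neighbor index instead (assumption).
    """
    q = normalize(query_vec)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in doc_vecs.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy, hand-written embeddings (hypothetical; a real system would
# obtain these from a trained query/document encoder).
DOC_VECS = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.1],
}

top = knn_search([1.0, 0.2, 0.0], DOC_VECS, k=2)
print([doc_id for doc_id, _ in top])
```

Because both sides are normalized, the score depends only on direction in the embedding space, which is what lets a query match documents that share meaning but few exact terms.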

3.0 | IR | Jun 7, 2020

SERank: Optimize Sequencewise Learning to Rank Using Squeeze-and-Excitation Network

RuiXing Wang, Kuan Fang, RiKang Zhou et al.

Learning-to-rank (LTR) is a set of supervised machine learning algorithms that aim to generate an optimal ranking order over a list of items. Many ranking models have been studied over the past decades, and most of them treat each query-document pair independently during training and inference. Recently, a few methods have been proposed that mine information across the list of ranking candidates for further improvements, such as learning a multivariate scoring function or learning contextual embeddings. However, these methods usually greatly increase the computational cost of online inference, especially with the large candidate sets found in real-world web search systems. Moreover, few studies focus on novel model structures for leveraging information across ranking candidates. In this work, we propose an effective and efficient method named SERank, a sequencewise ranking model that uses a Squeeze-and-Excitation network to take advantage of cross-document information. We evaluate the proposed method on several public benchmark datasets, as well as on click logs collected from Zhihu, a commercial question answering search engine. In addition, we conduct online A/B testing on the Zhihu search engine to further verify the proposed approach. Results on both the offline datasets and the online A/B tests demonstrate that our method yields a significant improvement.
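The cross-document idea in this abstract can be sketched as a generic Squeeze-and-Excitation gate applied over a candidate list: each feature is averaged across all candidates (squeeze), a small bottleneck network turns that summary into one gate per feature (excitation), and every candidate's features are rescaled by those shared gates. This is a minimal sketch, not the SERank architecture itself: the function `se_gate`, the fixed toy weights, and the feature values are all hypothetical, and the weights would be learned in practice.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def se_gate(doc_features, w_reduce, w_expand):
    """Re-weight each candidate's features using list-level context."""
    n_docs = len(doc_features)
    n_feats = len(doc_features[0])
    # Squeeze: average each feature over all candidates in the list.
    squeezed = [
        sum(doc[j] for doc in doc_features) / n_docs for j in range(n_feats)
    ]
    # Excitation: bottleneck MLP producing one gate in (0, 1) per feature.
    hidden = [relu(h) for h in matvec(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Scale: apply the shared gates to every candidate's feature vector.
    return [[x * g for x, g in zip(doc, gates)] for doc in doc_features]


# Hypothetical toy inputs: 3 candidates with 4 features each.
candidates = [
    [1.0, 0.5, 0.0, 2.0],
    [0.2, 0.1, 0.4, 1.0],
    [0.0, 0.9, 0.3, 0.5],
]
# Hypothetical fixed weights: 4 -> 2 bottleneck, then 2 -> 4.
W_REDUCE = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, 0.1]]
W_EXPAND = [[1.0, -1.0], [0.5, 0.5], [-0.3, 0.8], [0.2, 0.1]]

gated = se_gate(candidates, W_REDUCE, W_EXPAND)
print(gated[0])
```

The gates are computed once per list, so the extra cost grows linearly with the number of candidates, while each candidate's representation still depends on the other candidates in the list.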

2 Papers