Deploying Deep Ranking Models for Search Verticals
This work addresses the challenge of productionizing neural networks in real-world search systems, offering a modular solution for other systems to implement recent gains, though it is incremental as it focuses on deployment rather than novel model development.
The paper tackles deploying deep ranking models for search verticals by presenting an architecture that executes neural networks for semantic similarity in a production system serving over 500 million users, achieving competitive modeling capability without significant latency impact.
In this paper, we present an architecture executing a complex machine learning model such as a neural network capturing semantic similarity between a query and a document; and deploy to a real-world production system serving 500M+users. We present the challenges that arise in a real-world system and how we solve them. We demonstrate that our architecture provides competitive modeling capability without any significant performance impact to the system in terms of latency. Our modular solution and insights can be used by other real-world search systems to realize and productionize recent gains in neural networks.