IR, CL, LG | Nov 19, 2018

End-to-End Retrieval in Continuous Space

Daniel Gillick, Alessandro Presta, Gaurav Singh Tomar

arXiv:1811.08008v1 | 29.7 | 108 citations

Originality: Incremental advance

AI Analysis

This paper addresses retrieval efficiency and accuracy for text-based information systems, though the contribution appears incremental because it builds on existing embedding and ANN methods.

The paper tackles the problem of end-to-end continuous retrieval by replacing discrete inverted indexes with approximate nearest neighbor search on learned embeddings, improving MAP over a discrete baseline by 8% and 26% on two similar-question retrieval tasks.

Most text-based information retrieval (IR) systems index objects by words or phrases. These discrete systems have been augmented by models that use embeddings to measure similarity in continuous space. But continuous-space models are typically used just to re-rank the top candidates. We consider the problem of end-to-end continuous retrieval, where standard approximate nearest neighbor (ANN) search replaces the usual discrete inverted index, and rely entirely on distances between learned embeddings. By training simple models specifically for retrieval, with an appropriate model architecture, we improve on a discrete baseline by 8% and 26% (MAP) on two similar-question retrieval tasks. We also discuss the problem of evaluation for retrieval systems, and show how to modify existing pairwise similarity datasets for this purpose.
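
To make the setup concrete, here is a minimal sketch of retrieval by embedding distance and of the MAP metric the abstract reports. It is not the authors' implementation: the function names are hypothetical, exact brute-force cosine search stands in for a real ANN index, the toy vectors stand in for learned embeddings, and average precision is normalized by the number of relevant items, which is a common convention but may differ from the paper's evaluation protocol.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def retrieve(query, index, k):
    """Return the ids of the k indexed items most similar to the query.

    Assumption: exact brute-force search is used here for clarity; a real
    system would swap in an approximate nearest neighbor (ANN) index.
    """
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), item_id)
        for item_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:k]]


def average_precision(ranked, relevant):
    """Average of precision@rank over the ranks where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item_id in enumerate(ranked, start=1):
        if item_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy 2-d "embeddings" (hypothetical values, not learned by any model).
index = {"q1": [1.0, 0.0], "q2": [0.9, 0.1], "q3": [0.0, 1.0]}
top = retrieve([1.0, 0.0], index, k=3)
ap = average_precision(top, relevant={"q1", "q3"})
```

In this toy run the query ranks `q1` first and `q3` last, so the second relevant item is found only at rank 3 and the average precision falls below 1.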
