Relevance Proximity Graphs for Fast Relevance Retrieval
This work addresses the efficiency bottleneck in relevance retrieval for machine learning applications, though the contribution is incremental, as it extends an existing similarity graphs framework.
The paper tackles efficient retrieval of the most relevant items when relevance is scored by a computationally expensive nonlinear model such as a DNN or GBDT. It introduces Relevance Proximity Graphs (RPG), which achieve high retrieval accuracy with only a few model computations and outperform indirect models in experiments.
In many machine learning applications, the most relevant items for a particular query must be retrieved efficiently, while the relevance function is based on a highly nonlinear model, e.g., DNNs or GBDTs. Due to the high computational complexity of such models, exhaustive search is infeasible even for medium-scale problems. To address this issue, we introduce Relevance Proximity Graphs (RPG): an efficient non-exhaustive approach that provides a high-quality approximate solution for maximal relevance retrieval. Namely, we extend the recent similarity graphs framework to the setting where no similarity measure is defined on item pairs, which is a common practical use case. By design, our approach directly maximizes off-the-shelf relevance functions and does not require any auxiliary proxy models. Via extensive experiments, we show that the developed method provides excellent retrieval accuracy while requiring only a few model computations, outperforming indirect models. We open-source our implementation as well as two large-scale datasets to support further research on relevance retrieval.
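The idea of maximizing a black-box relevance function by walking a graph over items, rather than scoring every item, can be illustrated with a generic best-first search. The sketch below is not the authors' implementation. It assumes the item graph has already been built (the abstract does not say how), and the names `greedy_graph_search`, `graph`, `relevance`, `entry`, and `beam_width` are hypothetical. The `relevance` callable stands in for an expensive model such as a DNN or GBDT with the query already bound, and the function counts how many times it is called.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Set, Tuple


def greedy_graph_search(
    graph: Dict[int, Sequence[int]],
    relevance: Callable[[int], float],
    entry: int,
    top_k: int = 1,
    beam_width: int = 8,
) -> Tuple[List[int], int]:
    """Best-first walk over an item graph that maximizes a black-box score.

    Returns the top_k item ids found and the number of relevance evaluations.
    """
    visited: Set[int] = {entry}
    entry_score = relevance(entry)
    evaluations = 1

    # Max-heap (scores negated) of scored items whose neighbors are unexplored.
    candidates = [(-entry_score, entry)]
    # Min-heap holding the best beam_width items seen so far.
    best = [(entry_score, entry)]

    while candidates:
        neg_score, node = heapq.heappop(candidates)
        # Stop once the beam is full and the best remaining candidate
        # scores below the worst item kept in the beam.
        if len(best) >= beam_width and -neg_score < best[0][0]:
            break
        for neighbor in graph.get(node, ()):
            if neighbor in visited:
                continue
            visited.add(neighbor)
            score = relevance(neighbor)  # the expensive model call
            evaluations += 1
            if len(best) < beam_width or score > best[0][0]:
                heapq.heappush(candidates, (-score, neighbor))
                heapq.heappush(best, (score, neighbor))
                if len(best) > beam_width:
                    heapq.heappop(best)  # drop the current worst item

    top = sorted(best, reverse=True)[:top_k]
    return [item for _, item in top], evaluations
```

On a toy ring graph of 100 items whose score peaks at a single item, this search reaches the peak while scoring well under half of the items. Whether such savings carry over to a real relevance model depends on how the graph is constructed, which this sketch leaves open.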