Active Sampling of Pairs and Points for Large-scale Linear Bipartite Ranking
This work addresses the scalability problem in bipartite ranking for machine learning practitioners dealing with large datasets, offering a hybrid of point-wise and pair-wise methods that usually improves on both.
The paper tackles the efficiency-accuracy trade-off in large-scale bipartite ranking by introducing an active sampling scheme within the pair-wise approach and a Combined Ranking and Classification (CRC) framework. On 14 real-world datasets, the combination usually outperforms state-of-the-art point-wise and pair-wise methods in both accuracy and efficiency.
Bipartite ranking is a fundamental ranking problem that learns to order relevant instances ahead of irrelevant ones. The pair-wise approach for bipartite ranking constructs a quadratic number of pairs to solve the problem, which is infeasible for large-scale data sets. The point-wise approach, albeit more efficient, often results in inferior performance. That is, it is difficult to conduct bipartite ranking accurately and efficiently at the same time. In this paper, we develop a novel active sampling scheme within the pair-wise approach to conduct bipartite ranking efficiently. The scheme is inspired by active learning and can reach competitive ranking performance while focusing on only a small subset of the many pairs during training. Moreover, we propose a general Combined Ranking and Classification (CRC) framework to conduct bipartite ranking accurately. The framework unifies the point-wise and pair-wise approaches and is based simply on the idea of treating each instance point as a pseudo-pair. Experiments on 14 real-world large-scale data sets demonstrate that the proposed algorithm of Active Sampling within CRC, when coupled with a linear Support Vector Machine, usually outperforms state-of-the-art point-wise and pair-wise ranking approaches in terms of both accuracy and efficiency.
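To make the pseudo-pair idea concrete, the following is a minimal illustrative sketch, not the implementation evaluated in the paper. It assumes labels in {+1, -1}, represents a pair by the difference of its two feature vectors, and treats each point as a pseudo-pair against an implicit zero vector. Uniform random pair sampling stands in for the active sampling scheme, whose selection rule is not specified here, and plain subgradient descent on a hinge loss stands in for a linear SVM solver. All function names and parameters (`build_crc_examples`, `train_linear_hinge`, `auc`, `n_pairs`, `lam`) are hypothetical.

```python
import numpy as np

def build_crc_examples(X, y, n_pairs, rng):
    """Stack sampled pair differences with every point as a pseudo-pair."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == -1)
    # Assumption: uniform sampling of (positive, negative) pairs, used here
    # only as a placeholder for an active selection rule.
    i = rng.choice(pos, size=n_pairs)
    j = rng.choice(neg, size=n_pairs)
    pair_diffs = X[i] - X[j]            # a good ranker scores each of these > 0
    pair_labels = np.ones(n_pairs)
    # Assumption: a pseudo-pair is the point itself (paired with a zero
    # vector), carrying the point's own class label.
    Z = np.vstack([pair_diffs, X])
    t = np.concatenate([pair_labels, y.astype(float)])
    return Z, t

def train_linear_hinge(Z, t, lam=0.01, epochs=200, lr=0.1):
    """Subgradient descent on an L2-regularized hinge loss (no bias term)."""
    w = np.zeros(Z.shape[1])
    for _ in range(epochs):
        violated = t * (Z @ w) < 1      # examples inside the margin
        hinge_grad = (t[violated][:, None] * Z[violated]).sum(axis=0) / len(t)
        w -= lr * (lam * w - hinge_grad)
    return w

def auc(scores, y):
    """Fraction of (positive, negative) pairs ordered correctly; ties count half."""
    diff = scores[y == 1][:, None] - scores[y == -1][None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())
```

In this sketch the learner sees `n_pairs` sampled pairs plus one pseudo-pair per point, rather than the full quadratic set of positive-negative pairs, and the learned weight vector ranks instances by `X @ w`. The relative weighting of pairs and pseudo-pairs is left uniform here for simplicity.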