MLLGJun 1

Doing well with less! On Sampling Techniques for Empirical Pairwise Loss Estimation/Minimization

arXiv:2606.0234527.4
AI Analysis

For practitioners in similarity learning, ranking, and clustering, this provides a principled way to scale pairwise loss methods to large datasets without sacrificing accuracy.

This paper shows that using survey sampling techniques to retain only a fraction of pairs can achieve comparable performance to full pairwise evaluation in empirical pairwise loss minimization, with a key finding that sampling must target pairs directly. Experiments demonstrate that informative pair sampling yields near-full performance while reducing computational cost.

Many machine learning problems, including similarity learning, ranking, and clustering, rely on empirical pairwise loss functions whose quadratic computational cost quickly becomes prohibitive at scale. We demonstrate how a frugal approach that retains only a fraction of the available information on pairs can achieve estimation or optimization performance comparable to that obtained by using all pairs, by leveraging survey sampling techniques. A central finding, supported by both theory and experiments, is that such sampling plans must target pairs directly rather than individual observations. In particular, for pairwise losses between high-dimensional vectors such as embeddings in vision or graph learning, assigning higher inclusion probabilities to informative pairs using suitable auxiliary information yields performance close to full pairwise evaluation, providing a principled and theoretically grounded trade-off between accuracy and computational cost.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes