Improving Collaborative Metric Learning with Efficient Negative Sampling
This work addresses scalability issues in recommendation systems using metric learning, offering an incremental improvement to negative sampling for more efficient training.
The paper addresses a limitation of Collaborative Metric Learning (CML): its uniform negative sampling is inefficient, so the model needs large batches, which limits scalability in high-dimensional scenarios. The authors propose a 2-stage negative sampling strategy that lets CML remain effective in terms of accuracy and popularity bias with batch sizes an order of magnitude smaller, and they report consistent positive results across various datasets.
Distance metric learning based on triplet loss has been applied successfully in a wide range of applications, such as face recognition, image retrieval, speaker change detection, and, more recently, recommendation with the CML model. However, as we show in this article, CML requires large batches to work reasonably well because its uniform negative sampling strategy for selecting triplets is too simplistic. Because of memory limitations, this makes CML difficult to scale in high-dimensional scenarios. To alleviate this problem, we propose a 2-stage negative sampling strategy that finds triplets that are highly informative for learning. Our strategy allows CML to work effectively in terms of accuracy and popularity bias, even when the batch size is an order of magnitude smaller than what would be needed with the default uniform sampling. We demonstrate the suitability of the proposed strategy for recommendation and report consistent positive results across various datasets.
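To make the contrast between uniform sampling and informative triplet selection concrete, the following minimal sketch may help. It is not the procedure proposed in the paper, whose two stages are not specified in this abstract. It assumes a hinge triplet loss on squared Euclidean distances, and its `two_stage_negative` function is a hypothetical stand-in that draws a small uniform candidate pool and then keeps the candidate with the highest loss. All function names, the margin value, and the toy embeddings are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge(user, pos, neg, margin=1.0):
    """Hinge triplet loss: zero once the negative item is at least
    `margin` farther from the user than the positive item."""
    return max(0.0, margin + sq_dist(user, pos) - sq_dist(user, neg))


def uniform_negative(item_vecs, seen, rng):
    """Baseline: pick one unseen item id uniformly at random."""
    pool = [i for i in item_vecs if i not in seen]
    return rng.choice(pool)


def two_stage_negative(user, pos, item_vecs, seen, rng,
                       n_candidates=3, margin=1.0):
    """Hypothetical two-stage sampler (an assumption, not the paper's method).
    Stage 1: draw a small candidate pool uniformly from unseen items.
    Stage 2: keep the candidate that yields the largest triplet loss."""
    pool = [i for i in item_vecs if i not in seen]
    candidates = rng.sample(pool, min(n_candidates, len(pool)))
    return max(candidates,
               key=lambda i: triplet_hinge(user, pos, item_vecs[i], margin))


# Toy 2-D embeddings (illustrative values only).
user_vec = [0.0, 0.0]
item_vecs = {
    0: [0.1, 0.0],   # positive item, close to the user
    1: [5.0, 5.0],   # far away: zero loss, uninformative
    2: [0.5, 0.5],   # close to the user: violates the margin
    3: [4.0, 0.0],   # far away: zero loss, uninformative
}
seen = {0}
rng = random.Random(0)

uniform_pick = uniform_negative(item_vecs, seen, rng)
informative_pick = two_stage_negative(user_vec, item_vecs[0], item_vecs,
                                      seen, rng, n_candidates=3)
losses = {i: triplet_hinge(user_vec, item_vecs[0], item_vecs[i])
          for i in item_vecs if i not in seen}
```

In this toy setting only one of the three unseen items produces a non-zero loss, so a uniformly drawn negative usually contributes no gradient, which is the intuition behind needing large batches under uniform sampling. Selecting among candidates by loss finds the informative triplet directly, at the cost of extra distance computations per sample.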