LG · Jun 4, 2025

Selective Matching Losses -- Not All Scores Are Created Equal

arXiv:2506.04446v2 · h-index: 13
Originality: Incremental advance
AI Analysis

The paper addresses the need for more precise control over prediction accuracy in critical score regions, for applications such as retrieval and ranking. The advance is incremental because it builds on existing loss function frameworks.

The paper tackles learning problems in which accurate predictions matter most in specific subsets of the domain. It constructs selective matching loss functions that emphasize high-importance regions, and the authors report substantial advantages over traditional losses in applications such as dwell-time prediction and LLM fine-tuning.

Learning systems match predicted scores to observations over some domain. Often, it is critical to produce accurate predictions in some subset (or region) of the domain, yet less important to predict accurately in other regions. We construct selective matching loss functions by designing increasing link functions over score domains. A matching loss is an integral over the link. A link defines loss sensitivity as a function of the score, emphasizing high-slope, high-sensitivity regions over flat ones. Loss asymmetry drives a model, and resolves its underspecification, to predict better in high-sensitivity regions where accuracy matters more, and to distinguish between high- and low-importance regions. A large variety of selective scalar losses can be designed with scaled and shifted Sigmoid and hyperbolic sine links. Their properties, however, do not extend to the multi-class setting. Applying them per dimension lacks ranking sensitivity, which assigns importance according to class score ranking. Utilizing composite Softmax functions, we develop a framework for multidimensional selective losses. We overcome limitations of the standard Softmax function, which is good for classification but not for distinguishing between adjacent scores. Selective losses have a substantial advantage over traditional losses in applications with more important score regions, including dwell-time prediction, retrieval, ranking with pointwise, contrastive pairwise, or listwise losses, distillation problems, and fine-tuning alignment of Large Language Models (LLMs).
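
To make the scalar construction concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the classical matching-loss form, in which the loss is the integral from the target to the prediction of link(s) - link(target), so the loss is a Bregman divergence of the link's antiderivative. The function names and the `scale` and `shift` parameters are hypothetical, chosen here only to illustrate scaled and shifted Sigmoid and hyperbolic sine links.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Scaled and shifted Sigmoid link and its antiderivative.
# `scale` and `shift` are illustrative names, not taken from the paper.
def sigmoid_link(s, scale=1.0, shift=0.0):
    return sigmoid(scale * (s - shift))


def sigmoid_link_integral(s, scale=1.0, shift=0.0):
    return softplus(scale * (s - shift)) / scale


# Scaled and shifted hyperbolic sine link and its antiderivative.
def sinh_link(s, scale=1.0, shift=0.0):
    return math.sinh(scale * (s - shift))


def sinh_link_integral(s, scale=1.0, shift=0.0):
    return math.cosh(scale * (s - shift)) / scale


def matching_loss(pred, target, link, link_integral, **params):
    # Assumed form: integral from target to pred of (link(s) - link(target)) ds,
    # evaluated in closed form through the link's antiderivative.
    return (
        link_integral(pred, **params)
        - link_integral(target, **params)
        - link(target, **params) * (pred - target)
    )


# The same absolute error of 1.0, placed in a steep region of the Sigmoid link
# (around the shift) and in a flat region (far above the shift).
near = matching_loss(0.5, -0.5, sigmoid_link, sigmoid_link_integral, scale=4.0)
far = matching_loss(6.5, 5.5, sigmoid_link, sigmoid_link_integral, scale=4.0)

# The sinh link is flat around its shift and steep far from it.
sinh_center = matching_loss(0.5, -0.5, sinh_link, sinh_link_integral)
sinh_tail = matching_loss(3.5, 2.5, sinh_link, sinh_link_integral)
```

Under these assumptions, `near` exceeds `far` because the Sigmoid link is steep around its shift and nearly flat far from it, so an identical error is penalized more in the steep region. The sinh link behaves the opposite way: `sinh_tail` exceeds `sinh_center`, which places the emphasis on scores far from the shift.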

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
