LG AI | Feb 6

DSL: Understanding and Improving Softmax Recommender Systems with Competition-Aware Scaling

Bucher Sahyouni, Matthew Vowels, Liqun Chen, Simon Hadfield

arXiv:2602.07206v1 | 1.4 | h-index: 1

Originality: Highly original

AI Analysis

This work addresses performance and robustness issues in recommender systems that affect both users and platforms. The method builds incrementally on Softmax Loss yet delivers strong gains.

The paper tackles the brittleness of Softmax Loss (SL) in recommender systems, which arises from uniform negative sampling and a single global temperature. It introduces Dual-scale Softmax Loss (DSL), which adapts loss sharpness to the sampled competition. DSL improves on SL by 6.22% on average across datasets, metrics, and backbones, and by 9.31% on average under out-of-distribution popularity shift.

Softmax Loss (SL) is being increasingly adopted for recommender systems (RS) as it has demonstrated better performance, robustness and fairness. Yet in implicit-feedback settings, a single global temperature and equal treatment of uniformly sampled negatives can lead to brittle training, because sampled sets may contain varying degrees of relevant or informative competitors. The loss sharpness that is optimal for a user-item pair with one particular set of negatives can be suboptimal or destabilising for another pair with different negatives. We introduce Dual-scale Softmax Loss (DSL), which infers effective sharpness from the sampled competition itself. DSL adds two complementary branches to the log-sum-exp backbone. First, it reweights negatives within each training instance using hardness and item--item similarity. Second, it adapts a per-example temperature from the competition intensity over a constructed competitor slate. Together, these components preserve the geometry of SL while reshaping the competition distribution across negatives and across examples. Over several representative benchmarks and backbones, DSL yields substantial gains over strong baselines, with improvements over SL exceeding $10\%$ in several settings and averaging $6.22\%$ across datasets, metrics, and backbones. Under out-of-distribution (OOD) popularity shift, the gains are larger, with an average improvement of $9.31\%$ over SL. We further provide a theoretical distributionally robust optimisation (DRO) analysis, which demonstrates how DSL reshapes the robust payoff and the KL deviation for ambiguous instances. This helps explain the empirically observed improvements in accuracy and robustness.
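To make the two scales described in the abstract more concrete, the following is a minimal NumPy sketch, not the paper's implementation. `softmax_loss` is one common form of sampled Softmax Loss with a single global temperature. `dual_scale_sketch` is a hypothetical illustration of the two branches: its weighting rule (a softmax over `alpha * score + beta * similarity`), its temperature rule (scaling by the fraction of negatives that outscore the positive), and the names `sim`, `alpha`, `beta`, and `base_tau` are all assumptions chosen for illustration. The abstract does not specify the actual formulas or the construction of the competitor slate.

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def softmax_loss(pos, neg, tau=0.1):
    # pos: (B,) scores of positive items; neg: (B, N) scores of sampled negatives.
    # One global temperature tau; every negative is treated equally.
    return logsumexp((neg - pos[:, None]) / tau, axis=1).mean()

def dual_scale_sketch(pos, neg, sim, base_tau=0.1, alpha=1.0, beta=1.0):
    # sim: (B, N) similarity between each negative and the positive item.
    # ASSUMPTION: both rules below are illustrative placeholders,
    # not the formulas used in the paper.

    # Branch 1: per-negative weights from hardness (score) and similarity,
    # normalised so that the weights average to 1 within each instance.
    w_logits = alpha * neg + beta * sim
    w = np.exp(w_logits - logsumexp(w_logits, axis=1)[:, None]) * neg.shape[1]

    # Branch 2: per-example temperature from a crude competition intensity,
    # here the fraction of negatives scoring above the positive.
    intensity = (neg > pos[:, None]).mean(axis=1)
    tau = base_tau * (1.0 + intensity)  # shape (B,)

    # Weighted log-sum-exp: the weights enter as additive log-terms.
    z = (neg - pos[:, None]) / tau[:, None] + np.log(w)
    return logsumexp(z, axis=1).mean()
```

With `alpha = beta = 0` the weights are all 1, and when no negative outscores the positive the temperature equals `base_tau`, so the sketch reduces to plain `softmax_loss`. This mirrors the abstract's claim that the log-sum-exp backbone of SL is preserved while the competition is reshaped across negatives and across examples.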

