IRAIAug 12, 2025

SPARC: Soft Probabilistic Adaptive multi-interest Retrieval Model via Codebooks for recommender system

arXiv:2508.09090v2Has Code
Originality Highly original
AI Analysis

This addresses the challenge of adapting to real-time user preferences and exploring novel interests in large-scale industrial recommender systems, representing a significant but incremental advance over existing methods.

The paper tackled the problem of static and over-exploited multi-interest modeling in recommender systems by proposing SPARC, a framework that dynamically evolves user interests and enables proactive exploration, resulting in online gains including a +0.9% increase in user view duration and a +22.7% improvement in new content reach.

Modeling multi-interests has arisen as a core problem in real-world RS. Current multi-interest retrieval methods pose three major challenges: 1) Interests, typically extracted from predefined external knowledge, are invariant. Failed to dynamically evolve with users' real-time consumption preferences. 2) Online inference typically employs an over-exploited strategy, mainly matching users' existing interests, lacking proactive exploration and discovery of novel and long-tail interests. To address these challenges, we propose a novel retrieval framework named SPARC(Soft Probabilistic Adaptive Retrieval Model via Codebooks). Our contribution is two folds. First, the framework utilizes Residual Quantized Variational Autoencoder (RQ-VAE) to construct a discretized interest space. It achieves joint training of the RQ-VAE with the industrial large scale recommendation model, mining behavior-aware interests that can perceive user feedback and evolve dynamically. Secondly, a probabilistic interest module that predicts the probability distribution over the entire dynamic and discrete interest space. This facilitates an efficient "soft-search" strategy during online inference, revolutionizing the retrieval paradigm from "passive matching" to "proactive exploration" and thereby effectively promoting interest discovery. Online A/B tests on an industrial platform with tens of millions daily active users, have achieved substantial gains in business metrics: +0.9% increase in user view duration, +0.4% increase in user page views (PV), and a +22.7% improvement in PV500(new content reaching 500 PVs in 24 hours). Offline evaluations are conducted on open-source Amazon Product datasets. Metrics, such as Recall@K and Normalized Discounted Cumulative Gain@K(NDCG@K), also showed consistent improvement. Both online and offline experiments validate the efficacy and practical value of the proposed method.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes