IR · LG · Apr 9

Efficient Dataset Selection for Continual Adaptation of Generative Recommenders

Cathy Jiao, Juan Elenter, Praveen Ravichandran, Bernd Huber, Joseph Cauteruccio, Todd Wasson, Timothy Heath, Chenyan Xiong, Mounia Lalmas, Paul Bennett

arXiv:2604.07739 · 82.6 · h-index: 53

Predicted impact: top 15% in IR (last 90 days) · Originality: Incremental advance

AI Analysis

This work addresses scalability issues in production-scale recommendation systems, though it appears incremental as it builds on existing data selection and representation methods.

The paper tackles continuous adaptation of recommendation systems to evolving user behavior. It uses targeted data selection to mitigate the performance degradation caused by temporal drift, achieving training efficiency gains while preserving robustness.

Recommendation systems must continuously adapt to evolving user behavior, yet the volume of data generated in large-scale streaming environments makes frequent full retraining impractical. This work investigates how targeted data selection can mitigate performance degradation caused by temporal distributional drift while maintaining scalability. We evaluate a range of representation choices and sampling strategies for curating small but informative subsets of user interaction data. Our results demonstrate that gradient-based representations, coupled with distribution-matching, improve downstream model performance, achieving training efficiency gains while preserving robustness to drift. These findings highlight data curation as a practical mechanism for scalable monitoring and adaptive model updates in production-scale recommendation systems.
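The abstract names two ingredients, gradient-based representations and distribution matching, but does not give the exact procedure. The following is a minimal sketch under stated assumptions and is not the authors' method. It assumes a hypothetical next-item model whose output layer `W` maps a hidden state to item logits. Each interaction is represented by the per-example cross-entropy gradient with respect to `W`, compressed by a random projection (a common way to make per-example gradients tractable). "Distribution matching" is stood in for by greedy mean matching (MMD with a linear kernel): the procedure picks a small subset of a large candidate pool whose mean gradient feature approaches that of a reference sample, for example the most recent window. All function and variable names are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gradient_features(h: np.ndarray, y: np.ndarray, W: np.ndarray,
                      proj: np.ndarray) -> np.ndarray:
    """Per-example cross-entropy gradient w.r.t. a hypothetical output layer.

    h: (n, d) hidden states, y: (n,) target item ids,
    W: (d, V) output weights, proj: (d * V, k) random projection.
    Returns (n, k) projected gradient features.
    """
    p = softmax(h @ W)                       # (n, V) predicted item probabilities
    p[np.arange(len(y)), y] -= 1.0           # dL/dlogits = p - one_hot(y)
    # dL/dW for example i is outer(h_i, p_i - e_{y_i}), flattened to d * V.
    g = np.einsum("nd,nv->ndv", h, p).reshape(len(y), -1)
    return g @ proj


def greedy_mean_matching(cand: np.ndarray, target: np.ndarray,
                         budget: int) -> list[int]:
    """Greedily pick `budget` rows of `cand` whose mean approaches target's mean.

    This minimizes squared MMD under a linear kernel, one step at a time.
    """
    if budget > len(cand):
        raise ValueError("budget exceeds candidate pool size")
    mu = target.mean(axis=0)
    running_sum = np.zeros_like(mu)
    available = np.ones(len(cand), dtype=bool)
    chosen: list[int] = []
    for t in range(budget):
        # Subset mean that would result from adding each candidate next.
        new_means = (running_sum + cand) / (t + 1)
        dist = ((new_means - mu) ** 2).sum(axis=1)
        dist[~available] = np.inf            # never pick the same row twice
        i = int(np.argmin(dist))
        chosen.append(i)
        available[i] = False
        running_sum += cand[i]
    return chosen


# Usage on synthetic data: a large candidate pool and a drifted reference sample.
rng = np.random.default_rng(0)
d, V, k = 8, 20, 16
W = rng.normal(scale=0.1, size=(d, V))
proj = rng.normal(size=(d * V, k)) / np.sqrt(k)
h_old, y_old = rng.normal(size=(500, d)), rng.integers(0, V, 500)
h_new, y_new = rng.normal(loc=0.5, size=(100, d)), rng.integers(0, V, 100)
cand = gradient_features(h_old, y_old, W, proj)
target = gradient_features(h_new, y_new, W, proj)
subset = greedy_mean_matching(cand, target, budget=50)
```

Greedy mean matching is used here only because it is short and transparent. Kernel-based MMD, optimal-transport matching, or other sampling strategies would fit the same interface: they would replace `greedy_mean_matching` and leave the gradient representation unchanged.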

