LG · AI · IR · SY · ML · Feb 7, 2023

Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective

arXiv:2302.03561v3 · 10 citations · h-index: 16
AI Analysis

This paper addresses the challenge of long-term personalization for hundreds of millions of listeners in industrial-scale recommender systems. It represents an incremental but practical advance over existing methods.

The paper tackles the problem of optimizing podcast recommendations for long-term user engagement rather than short-term metrics, achieving substantial improvements in A/B tests and reducing data requirements by up to a factor of 120,000 in offline experiments.

We present a novel podcast recommender system deployed at industrial scale. This system successfully optimizes personal listening journeys that unfold over months for hundreds of millions of listeners. In deviating from the pervasive industry practice of optimizing machine learning algorithms for short-term proxy metrics, the system substantially improves long-term performance in A/B tests. The paper offers insights into how our methods cope with attribution, coordination, and measurement challenges that usually hinder such long-term optimization. To contextualize these practical insights within a broader academic framework, we turn to reinforcement learning (RL). Using the language of RL, we formulate a comprehensive model of users' recurring relationships with a recommender system. Then, within this model, we identify our approach as a policy improvement update to a component of the existing recommender system, enhanced by tailored modeling of value functions and user-state representations. Illustrative offline experiments suggest this specialized modeling reduces data requirements by as much as a factor of 120,000 compared to black-box approaches.
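The abstract frames the approach as a policy improvement update guided by a value function. The sketch below illustrates that general idea on a toy tabular problem; it is not the paper's system. The two user states, two recommendation actions, reward matrix `R`, transition tensor `P`, and discount `GAMMA` are all made-up assumptions chosen so that a myopic baseline and a long-term-aware policy disagree.

```python
import numpy as np

# Hypothetical toy model (all numbers are illustrative assumptions).
# States:  0 = casual listener, 1 = habitual listener.
# Actions: 0 = item with high immediate engagement,
#          1 = item more likely to build a listening habit.
GAMMA = 0.9

# R[s, a]: expected immediate engagement.
R = np.array([[1.0, 0.6],
              [2.0, 1.8]])

# P[s, a, s']: probability of moving from state s to s' after action a.
P = np.array([
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.4, 0.6], [0.1, 0.9]],
])


def evaluate_policy(policy):
    """Solve V = R_pi + GAMMA * P_pi @ V exactly for a deterministic policy."""
    n = len(policy)
    idx = np.arange(n)
    r_pi = R[idx, policy]   # reward under the policy, shape (n,)
    p_pi = P[idx, policy]   # transitions under the policy, shape (n, n)
    return np.linalg.solve(np.eye(n) - GAMMA * p_pi, r_pi)


def improve_policy(policy):
    """One policy improvement step: act greedily on Q of the current policy."""
    v = evaluate_policy(policy)
    q = R + GAMMA * (P @ v)  # Q[s, a] = R[s, a] + GAMMA * sum_s' P[s, a, s'] V[s']
    return q.argmax(axis=1), q


# Myopic baseline: pick whatever maximizes immediate engagement.
baseline = R.argmax(axis=1)
improved, q_values = improve_policy(baseline)

print("baseline policy:", baseline)
print("improved policy:", improved)
print("baseline value: ", evaluate_policy(baseline))
print("improved value: ", evaluate_policy(improved))
```

With these assumed numbers, the baseline picks action 0 in both states, while the improved policy picks action 1 in both, accepting lower immediate engagement for a higher long-term value. The policy improvement theorem guarantees the improved policy is at least as good as the baseline in every state.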

