A Model-based Multi-Agent Personalized Short-Video Recommender System
This work addresses the challenge of personalized session-based recommendation on industrial short-video platforms, though it appears incremental, as it builds on existing RL approaches.
The paper tackles the problem of maximizing user watch-time in short-video recommendation sessions by formulating a session as a Markov decision process and solving it with a reinforcement learning framework. The method has been deployed on a large-scale platform serving hundreds of millions of users.
A recommender selects and presents the top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it with a reinforcement learning (RL) framework has attracted increasing attention from both the academic and industrial communities. In this paper, we propose an RL-based industrial short-video recommender ranking framework, which models and maximizes user watch-time in an environment of multi-aspect user preferences through a collaborative multi-agent formulation. Moreover, our proposed framework adopts a model-based learning approach to alleviate sample selection bias, which is a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our proposed approach has been deployed on our real large-scale short-video sharing platform, successfully serving hundreds of millions of users.
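The session formulation described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Request` structure, the `select_top_k` and `session_return` helpers, the discount factor `gamma`, and the assumption that per-item watch-times are known in advance are all hypothetical, introduced only to show how a session of sequential top-K requests yields a cumulative watch-time objective. User-state transitions, the multi-agent decomposition, and the learned environment model are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Request:
    """One online request within a session (hypothetical structure)."""
    candidate_scores: List[float]  # ranking score assigned to each candidate item
    watch_times: List[float]       # assumed watch-time (seconds) if the item is shown


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring candidates."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def session_return(requests: Sequence[Request], k: int, gamma: float = 1.0) -> float:
    """Discounted sum of watch-time over the sequential requests of one session."""
    total = 0.0
    for t, req in enumerate(requests):
        shown = select_top_k(req.candidate_scores, k)
        # Per-request reward: total watch-time of the items actually presented.
        reward = sum(req.watch_times[i] for i in shown)
        total += (gamma ** t) * reward
    return total


if __name__ == "__main__":
    session = [
        Request(candidate_scores=[0.9, 0.1, 0.5], watch_times=[10.0, 3.0, 4.0]),
        Request(candidate_scores=[0.2, 0.8], watch_times=[5.0, 7.0]),
    ]
    print(session_return(session, k=2, gamma=0.5))
```

In this toy setting the objective an RL ranking policy would maximize is the value returned by `session_return`; in practice watch-time is only observed for the items that were actually shown, which is the source of the sample selection bias mentioned in the abstract.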