LG, IR · Dec 28, 2024

Generative Regression Based Watch Time Prediction for Short-Video Recommendation

Hongxu Ma, Kai Tian, Tao Zhang, Xuefeng Zhang, Han Zhou, Chunjie Chen, Han Li, Jihong Guan, Shuigeng Zhou

arXiv:2412.20211v3 · 12.57 citations · h-index: 51

Originality: Incremental advance

AI Analysis

This work improves short-video recommendation systems by enhancing user engagement prediction, though it is incremental as it builds on existing ordinal regression methods.

The paper tackles watch time prediction in short-video recommendation, where wide value ranges and imbalanced data make direct regression unreliable. It proposes a Generative Regression (GR) framework that reformulates the problem as a sequence generation task. The results show that GR significantly outperforms state-of-the-art methods on public and industrial datasets, with online A/B testing on the Kuaishou App confirming its real-world effectiveness.

Watch time prediction (WTP) has emerged as a pivotal task in short-video recommendation systems, designed to quantify user engagement through continuous interaction modeling. Predicting users' watch times on videos often encounters fundamental challenges, including wide value ranges and imbalanced data distributions, which can lead to significant estimation bias when directly applying regression techniques. Recent studies have attempted to address these issues by converting the continuous watch time estimation into an ordinal regression task. While these methods demonstrate partial effectiveness, they exhibit notable limitations: (1) the discretization process frequently relies on bucket partitioning, inherently reducing prediction flexibility and accuracy, and (2) the interdependencies among different partition intervals remain underutilized, missing opportunities for effective error correction. Inspired by language modeling paradigms, we propose a novel Generative Regression (GR) framework that reformulates WTP as a sequence generation task. Our approach employs structural discretization to enable nearly lossless value reconstruction while maintaining prediction fidelity. Through carefully designed vocabulary construction and label encoding schemes, each watch time is bijectively mapped to a token sequence. To mitigate the training-inference discrepancy caused by teacher forcing, we introduce a curriculum learning with embedding mixup strategy that gradually transitions from guided to free-generation modes. We evaluate our method against state-of-the-art approaches on two public datasets and one industrial dataset. We also perform online A/B testing on the Kuaishou App to confirm its real-world effectiveness. The results conclusively show that GR outperforms existing techniques significantly.
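To make the idea of mapping a watch time to a token sequence concrete, here is a minimal sketch. It does not reproduce the paper's vocabulary construction or label encoding, which the abstract does not specify. It substitutes a plain positional digit encoding, and the names RESOLUTION, BASE, SEQ_LEN, encode, and decode are all hypothetical. Under these assumptions, every quantized watch time in range corresponds to exactly one token sequence, and decoding recovers the value up to the quantization step.

```python
from typing import List

# All constants below are assumptions for illustration, not values from the paper.
RESOLUTION = 0.1                 # seconds per quantization step
BASE = 10                        # vocabulary size: one token per digit
SEQ_LEN = 4                      # fixed sequence length (covers 0.0 s to 999.9 s)
MAX_STEPS = BASE ** SEQ_LEN - 1  # largest representable number of steps


def encode(watch_time: float) -> List[int]:
    """Map a watch time in seconds to SEQ_LEN digit tokens, most significant first."""
    # Quantize to the grid and clamp into the representable range.
    steps = min(max(round(watch_time / RESOLUTION), 0), MAX_STEPS)
    tokens = []
    for _ in range(SEQ_LEN):
        steps, digit = divmod(steps, BASE)
        tokens.append(digit)
    return tokens[::-1]


def decode(tokens: List[int]) -> float:
    """Invert encode(): rebuild the quantized watch time from its digit tokens."""
    steps = 0
    for token in tokens:
        steps = steps * BASE + token
    return steps * RESOLUTION


tokens = encode(37.4)  # [0, 3, 7, 4]
restored = decode(tokens)
```

A sequence model would then predict these tokens one at a time, so an early coarse token can be refined by later ones. That is the kind of dependency between intervals that the abstract says bucket-based ordinal regression leaves unused.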
