IR | AI | Oct 27, 2025

Think before Recommendation: Autonomous Reasoning-enhanced Recommender

Xiaoyu Kong, Junguang Jiang, Bin Liu, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng, Jiancan Wu, Xiang Wang

arXiv:2510.23077v1 | 3 citations | h-index: 24

Originality: Incremental advance

AI Analysis

This paper addresses the problem of adding reasoning capabilities to recommender systems, which matters to both users and developers. The advance is incremental because it builds on existing LLM and RL methods.

The paper tackles limitations of using large language models for recommendation by proposing two methods: RecZero, a reinforcement learning-based paradigm that trains a single model to autonomously develop reasoning for rating prediction, and RecOne, a hybrid approach that combines supervised fine-tuning with RL. Experimental results show that both significantly outperform existing baselines on multiple datasets.

The core task of recommender systems is to learn user preferences from historical user-item interactions. With the rapid development of large language models (LLMs), recent research has explored leveraging the reasoning capabilities of LLMs to enhance rating prediction tasks. However, existing distillation-based methods suffer from limitations such as the teacher model's insufficient recommendation capability, costly and static supervision, and superficial transfer of reasoning ability. To address these issues, this paper proposes RecZero, a reinforcement learning (RL)-based recommendation paradigm that abandons the traditional multi-model and multi-stage distillation approach. Instead, RecZero trains a single LLM through pure RL to autonomously develop reasoning capabilities for rating prediction. RecZero consists of two key components: (1) "Think-before-Recommendation" prompt construction, which employs a structured reasoning template to guide the model in step-wise analysis of user interests, item features, and user-item compatibility; and (2) rule-based reward modeling, which adopts group relative policy optimization (GRPO) to compute rewards for reasoning trajectories and optimize the LLM. Additionally, the paper explores a hybrid paradigm, RecOne, which combines supervised fine-tuning with RL, initializing the model with cold-start reasoning samples and further optimizing it with RL. Experimental results demonstrate that RecZero and RecOne significantly outperform existing baseline methods on multiple benchmark datasets, validating the superiority of the RL paradigm in achieving autonomous reasoning-enhanced recommender systems.
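The two components described above can be illustrated with a minimal sketch. This summary does not give the paper's actual prompt template or reward function, so everything here is an assumption made for illustration: the tag names (`<analyze_user>`, `<analyze_item>`, `<match>`, `<rate>`), the 1 to 5 rating scale, the format penalty of -1.0, and the function names `rule_based_reward` and `group_relative_advantages`. The one part taken from GRPO as generally described is the group-relative step: rewards for a group of sampled reasoning trajectories for the same prompt are standardized by the group's mean and standard deviation.

```python
import re
from statistics import mean, pstdev

# Hypothetical "Think-before-Recommendation" template. The tag names are
# assumptions; the source only says the template guides step-wise analysis of
# user interests, item features, and user-item compatibility.
PROMPT_TEMPLATE = (
    "User history:\n{history}\n\n"
    "Candidate item:\n{item}\n\n"
    "Reason step by step, then give a rating from 1 to 5.\n"
    "<analyze_user>...</analyze_user>\n"
    "<analyze_item>...</analyze_item>\n"
    "<match>...</match>\n"
    "<rate>...</rate>"
)

RATE_PATTERN = re.compile(r"<rate>\s*([0-9]+(?:\.[0-9]+)?)\s*</rate>")


def rule_based_reward(completion, true_rating, max_error=4.0):
    """Score one sampled completion with simple rules (assumed, not the paper's).

    Returns -1.0 when no parseable in-range rating is found, otherwise a value
    in [0, 1] that shrinks linearly with the absolute rating error.
    """
    match = RATE_PATTERN.search(completion)
    if match is None:
        return -1.0  # assumed format penalty
    predicted = float(match.group(1))
    if not 1.0 <= predicted <= 5.0:
        return -1.0  # rating outside the assumed 1-5 scale
    return 1.0 - abs(predicted - true_rating) / max_error


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of samples for the same prompt.

    Uses the population standard deviation; implementations differ on this
    detail, so treat it as an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: three sampled completions for one user-item pair whose true rating is 4.
prompt = PROMPT_TEMPLATE.format(history="Liked: sci-fi novels", item="A space opera")
samples = ["<rate>4</rate>", "<rate>2</rate>", "no rating given"]
rewards = [rule_based_reward(s, true_rating=4.0) for s in samples]
advantages = group_relative_advantages(rewards)
```

In this sketch the exact prediction receives the largest advantage and the unparseable completion the smallest, which is the signal a policy-gradient update would then use to favor reasoning trajectories that end in accurate, well-formed ratings.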
