IR, CL | Mar 31, 2025

Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning

arXiv:2503.24289v3 | 29.332 citations | h-index: 3 | Has Code | Trans. Mach. Learn. Res.

Originality: Highly original

AI Analysis

This work addresses the challenge of adapting LLMs for user-centric recommendation tasks without impairing their general capabilities, offering a more efficient alternative to data-intensive methods.

The authors integrate large language models (LLMs) with recommendation systems through Rec-R1, a reinforcement learning framework that optimizes LLM generation using feedback from a fixed black-box recommendation model, which avoids the cost of distilling synthetic supervised fine-tuning data from proprietary models. Experiments on product search and sequential recommendation show that Rec-R1 consistently outperforms prompting- and supervised fine-tuning-based methods and achieves significant gains over strong discriminative baselines, even with simple retrievers such as BM25.

We propose Rec-R1, a general reinforcement learning framework that bridges large language models (LLMs) with recommendation systems through closed-loop optimization. Unlike prompting and supervised fine-tuning (SFT), Rec-R1 directly optimizes LLM generation using feedback from a fixed black-box recommendation model, without relying on synthetic SFT data from proprietary models such as GPT-4o. This avoids the substantial cost and effort required for data distillation. To verify the effectiveness of Rec-R1, we evaluate it on two representative tasks: product search and sequential recommendation. Experimental results demonstrate that Rec-R1 not only consistently outperforms prompting- and SFT-based methods, but also achieves significant gains over strong discriminative baselines, even when used with simple retrievers such as BM25. Moreover, Rec-R1 preserves the general-purpose capabilities of the LLM, unlike SFT, which often impairs instruction-following and reasoning. These findings suggest Rec-R1 as a promising foundation for continual task-specific adaptation without catastrophic forgetting.
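The closed-loop idea in the abstract can be illustrated with a small sketch: the LLM generates text (here, a rewritten search query), a fixed retriever ranks items for it, and a ranking metric on that output serves as the reward signal. This is only an illustration under stated assumptions, not the authors' implementation. The term-overlap retriever stands in for a real retriever such as BM25, NDCG@k is an assumed choice of reward, and all function names, the toy corpus, and the queries are hypothetical. The reinforcement learning update that would consume the reward is not shown.

```python
import math


def toy_retrieve(query, corpus, k=3):
    """Hypothetical stand-in for the fixed black-box recommender.

    Ranks documents by the number of query terms they share
    (a simple term-overlap score, NOT BM25).
    """
    q_terms = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(q_terms & set(text.lower().split()))
        scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by doc id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def ndcg_at_k(ranked_ids, relevant_ids, k=3):
    """Binary-relevance NDCG@k for a ranked list of document ids."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def reward(generated_query, corpus, relevant_ids, k=3):
    """Reward for one LLM generation: retrieval quality under the fixed retriever."""
    ranked = toy_retrieve(generated_query, corpus, k)
    return ndcg_at_k(ranked, relevant_ids, k)


# Toy data (assumed for illustration only).
corpus = {
    "d1": "wired earbuds with microphone",
    "d2": "bluetooth speaker waterproof",
    "d3": "wireless noise cancelling headphones",
    "d4": "over ear headphones wireless bluetooth",
}
relevant = {"d3", "d4"}

# Two candidate generations for the same user need.
original_reward = reward("something to listen to music quietly", corpus, relevant)
rewritten_reward = reward("wireless noise cancelling headphones bluetooth", corpus, relevant)
print(original_reward, rewritten_reward)
```

In a training loop, rewards like these would be computed for sampled generations and passed to a policy-optimization step, so the retriever itself never needs to be differentiable or modified.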

