IR | AI | Mar 23

AgenticRec: End-to-End Tool-Integrated Policy Optimization for Ranking-Oriented Recommender Agents

Tianyi Li, Zixuan Wang, Guidong Lei, Xiaodong Li, Hui Li

arXiv:2603.21613 | 75.5 | h-index: 1

AI Analysis

This work addresses a specific bottleneck in ranking-oriented recommender agents, offering incremental advances in tool integration and policy optimization.

The paper addresses two problems in recommender agents based on Large Language Models: a disconnect between intermediate reasoning and final ranking feedback, and an inability to capture fine-grained preferences. It proposes AgenticRec, which integrates recommendation-specific tools, optimizes the full decision-making trajectory, and refines preferences. The authors report significant performance improvements over baselines in experiments.

Recommender agents built on Large Language Models offer a promising paradigm for recommendation. However, existing recommender agents typically suffer from a disconnect between intermediate reasoning and final ranking feedback, and are unable to capture fine-grained preferences. To address this, we present AgenticRec, a ranking-oriented agentic recommendation framework that optimizes the entire decision-making trajectory (including intermediate reasoning, tool invocation, and final ranking list generation) under sparse implicit feedback. Our approach makes three key contributions. First, we design a suite of recommendation-specific tools integrated into a ReAct loop to support evidence-grounded reasoning. Second, we propose theoretically unbiased List-Wise Group Relative Policy Optimization (list-wise GRPO) to maximize ranking utility, ensuring accurate credit assignment for complex tool-use trajectories. Third, we introduce Progressive Preference Refinement (PPR) to resolve fine-grained preference ambiguities. By mining hard negatives from ranking violations and applying bidirectional preference alignment, PPR minimizes the convex upper bound of pairwise ranking errors. Experiments on benchmarks confirm that AgenticRec significantly outperforms baselines, validating the necessity of unifying reasoning, tool use, and ranking optimization.
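To make the two optimization ideas in the abstract more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It assumes NDCG@k with binary relevance as the ranking utility, the standard GRPO group normalization (reward minus group mean, divided by group standard deviation) rather than the paper's list-wise variant, and a base-2 logistic loss as one possible convex upper bound on the pairwise 0-1 ranking error. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def ndcg_at_k(ranked_items, relevant, k=10):
    """NDCG@k with binary relevance (assumed reward for one ranked list)."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantages: normalize rewards within one group
    of rollouts sampled for the same user/context (assumption: the paper's
    list-wise variant may normalize differently)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pairwise_logistic_loss(score_pos, score_neg):
    """Base-2 logistic loss on the score margin. It is convex and is >= 1
    whenever score_pos <= score_neg, so it upper-bounds the 0-1 pairwise
    ranking error (one possible surrogate, assumed here)."""
    margin = score_pos - score_neg
    # Numerically stable softplus(-margin).
    softplus = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return softplus / math.log(2)


# Three hypothetical rollouts (final ranking lists) for the same user,
# whose only held-out positive item is "a".
rollouts = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
relevant = {"a"}
rewards = [ndcg_at_k(r, relevant, k=3) for r in rollouts]
advantages = group_relative_advantages(rewards)
```

In this sketch each whole trajectory (reasoning, tool calls, and final list) would receive the single advantage computed from its final ranking, which is how a sparse list-level reward can be propagated to intermediate steps. For the pairwise term, a hard negative would be an item ranked above the positive, i.e. a pair with a non-positive margin.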
