IR GT LG | Oct 5, 2025

RLRF: Competitive Search Agent Design via Reinforcement Learning from Ranker Feedback

Tommy Mordo, Sagie Dekel, Omer Madmon, Moshe Tennenholtz, Oren Kurland

arXiv:2510.04096v1 | 3.6 | h-index: 28

Originality: Highly original

AI Analysis

The paper addresses a challenge facing publishers and content creators in competitive search environments, and it presents a novel method rather than an incremental improvement.

The paper tackles the problem of optimizing document content for improved search ranking in competitive settings, where publishers use LLMs to modify their documents. It introduces Reinforcement Learning from Ranker Feedback (RLRF) to train LLM-based agents. According to the authors, the resulting agents consistently and substantially outperform previous approaches, remain effective with out-of-distribution ranking functions, and adapt to strategic opponents.

Competitive search is a setting where document publishers modify their documents to improve their ranking in response to a query. Recently, publishers have increasingly leveraged LLMs to generate and modify competitive content. We introduce Reinforcement Learning from Ranker Feedback (RLRF), a framework that trains LLMs using preference datasets derived from ranking competitions. The goal of a publisher (LLM-based) agent is to optimize content for improved ranking while accounting for the strategies of competing agents. We generate the datasets using approaches that do not rely on human-authored data. We show that our proposed agents consistently and substantially outperform previously suggested approaches for LLM-based competitive document modification. We further show that our agents are effective with ranking functions they were not trained for (i.e., out of distribution) and that they adapt to strategic opponents. These findings support the significant potential of using reinforcement learning in competitive search.
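
As a rough illustration of what "preference datasets derived from ranking competitions" could look like, the sketch below turns ranker feedback into (chosen, rejected) pairs. Everything in it is an assumption made for illustration: the term-overlap ranker `toy_ranker_score`, the pairing rule, and the `prompt`/`chosen`/`rejected` record fields are hypothetical and are not taken from the paper, whose abstract does not specify the dataset construction or the training objective.

```python
from itertools import combinations


def toy_ranker_score(query: str, doc: str) -> float:
    """Hypothetical ranker: fraction of query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rank_of(query: str, candidate: str, competitors: list[str]) -> int:
    """1-based rank of `candidate` against the competitors' documents (1 = top)."""
    cand_score = toy_ranker_score(query, candidate)
    return 1 + sum(toy_ranker_score(query, c) > cand_score for c in competitors)


def build_preference_pairs(
    query: str, candidates: list[str], competitors: list[str]
) -> list[dict]:
    """Pair up candidate modifications; the one the ranker places higher is 'chosen'."""
    ranked = [(rank_of(query, c, competitors), c) for c in candidates]
    pairs = []
    for (rank_a, doc_a), (rank_b, doc_b) in combinations(ranked, 2):
        if rank_a == rank_b:
            continue  # the ranker expresses no preference between these two
        chosen, rejected = (doc_a, doc_b) if rank_a < rank_b else (doc_b, doc_a)
        pairs.append({"prompt": query, "chosen": chosen, "rejected": rejected})
    return pairs


if __name__ == "__main__":
    # Illustrative data only; not from the paper.
    demo = build_preference_pairs(
        query="best hiking boots",
        candidates=["our best hiking boots guide", "boots for sale", "best hiking tips"],
        competitors=["hiking boots review", "cheap sneakers online"],
    )
    for pair in demo:
        print(pair)
```

In this sketch, candidates that reach the same rank produce no pair, so only modifications the ranker can actually tell apart contribute preference signal. Records of this shape could then be passed to a preference-based fine-tuning method; which method the authors use is not stated in the abstract.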
