CL · Oct 8, 2025

Customer-R1: Personalized Simulation of Human Behaviors via RL-based LLM Agent in Online Shopping

arXiv:2510.07230v2 · 13 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the need for personalized behavior simulation in online shopping, offering an incremental improvement over existing population-level methods.

The paper tackles personalized simulation of human behavior in online shopping. It introduces Customer-R1, an RL-based LLM agent conditioned on user personas, which significantly outperforms prompting and SFT-based baselines in next-action prediction and better matches users' action distributions.

Simulating step-wise human behavior with Large Language Models (LLMs) has become an emerging research direction, enabling applications in various practical domains. While prior methods, including prompting, supervised fine-tuning (SFT), and reinforcement learning (RL), have shown promise in modeling step-wise behavior, they primarily learn a population-level policy without conditioning on a user's persona, yielding generic rather than personalized simulations. In this work, we pose a critical question: how can LLM agents better simulate personalized user behavior? We introduce Customer-R1, an RL-based method for personalized, step-wise user behavior simulation in online shopping environments. Our policy is conditioned on an explicit persona, and we optimize next-step rationale and action generation via action correctness reward signals. Experiments on the OPeRA dataset demonstrate that Customer-R1 not only significantly outperforms prompting and SFT-based baselines in next-action prediction tasks, but also better matches users' action distribution, indicating higher fidelity in personalized behavior simulation.
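To make the abstract's two ingredients concrete, here is a minimal Python sketch of a persona-conditioned prompt and an action correctness reward. It is an illustration under stated assumptions, not the paper's implementation: the `Action` fields, the `build_prompt` layout, the action labels, and the 0.5 partial-credit value are all hypothetical, since the abstract does not specify the action schema or the exact reward.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Action:
    """Hypothetical action schema; the paper's actual format may differ."""
    type: str         # illustrative labels such as "click", "input", "terminate"
    target: str = ""  # hypothetical identifier of the page element acted on
    text: str = ""    # text typed by the user, if any


def build_prompt(persona: str, history: Sequence[str], page: str) -> str:
    """Assemble a policy input that is explicitly conditioned on a persona.

    The section names and their ordering are assumptions for illustration.
    """
    past = "\n".join(f"- {step}" for step in history) or "- (none)"
    return (
        f"Persona:\n{persona}\n\n"
        f"Previous actions:\n{past}\n\n"
        f"Current page:\n{page}\n\n"
        "Give a short rationale, then the next action."
    )


def action_correctness_reward(predicted: Action, observed: Action) -> float:
    """Score a predicted next action against the logged user action.

    Assumed scheme: 1.0 for an exact match, 0.5 when only the action type
    matches, 0.0 otherwise. The 0.5 value is an arbitrary placeholder.
    """
    if predicted == observed:
        return 1.0
    if predicted.type == observed.type:
        return 0.5
    return 0.0


if __name__ == "__main__":
    logged = Action(type="click", target="add_to_cart")
    guess = Action(type="click", target="buy_now")
    print(build_prompt("Budget-conscious shopper", ["search: running shoes"], "<product page>"))
    print(action_correctness_reward(guess, logged))
```

In an RL loop, a scalar like this would be computed for each sampled rationale-and-action pair and passed to the policy optimizer; the choice of optimizer is outside the scope of this sketch.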
