GTAISY Nov 9, 2025

LLM-Guided Reinforcement Learning with Representative Agents for Traffic Modeling

arXiv:2511.06260v1 · 1 citation · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses scalability and interpretability problems that researchers and practitioners face in LLM-based traffic modeling. It is an incremental advance: it builds on existing LLM-based methods and adds a grouping and update mechanism.

The paper tackles the scalability and instability issues of using large language models (LLMs) as behavioral proxies in traffic modeling by proposing a representative-agent approach that groups homogeneous travelers and uses a single LLM per group, combined with an interpretable update rule. The result is rapid convergence to user equilibrium in classic settings and stable, interpretable dynamics in richer scenarios, reproducing behavioral patterns like the decoy effect and income-based willingness-to-pay.

Large language models (LLMs) are increasingly used as behavioral proxies for self-interested travelers in agent-based traffic models. Although these approaches are more flexible and generalizable than conventional models, their practical use remains limited by the cost of calling one LLM for every traveler. Moreover, LLM agents have been found to make opaque choices and produce unstable day-to-day dynamics. To address these challenges, we propose to model each homogeneous traveler group facing the same decision context with a single representative LLM agent that behaves like the population's average, maintaining and updating a mixed strategy over routes that coincides with the group's aggregate flow proportions. Each day, the LLM reviews the travel experience and flags, as positive reinforcement, the routes it would like to use more often. An interpretable update rule then converts this judgment into strategy adjustments using a tunable (progressively decaying) step size. The representative-agent design improves scalability, and the separation of reasoning from updating clarifies the decision logic and stabilizes learning. In classic traffic assignment settings, we find that the proposed approach converges rapidly to the user equilibrium. In richer settings with income heterogeneity, multi-criteria costs, and multi-modal choices, the generated dynamics remain stable and interpretable, reproducing plausible behavioral patterns that are well documented in psychology and economics. Examples include the decoy effect in toll versus non-toll road selection, and a higher willingness-to-pay for convenience among higher-income travelers when choosing between driving, transit, and park-and-ride options.
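The day-to-day loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact update rule, so everything here is an assumption made for illustration: the rule that shifts probability mass toward the flagged routes, the step schedule `step0 / day`, the hypothetical two-route network with linear travel times, and `flag_routes_stub`, a simple heuristic standing in for the LLM's judgment.

```python
def update_strategy(strategy, flagged, step):
    """Assumed update rule (not taken from the paper): move the mixed
    strategy a fraction `step` toward a uniform split over flagged routes."""
    if not flagged:
        return list(strategy)
    target = [1.0 / len(flagged) if i in flagged else 0.0
              for i in range(len(strategy))]
    return [(1.0 - step) * p + step * q for p, q in zip(strategy, target)]


def flag_routes_stub(times, strategy):
    """Hypothetical stand-in for the LLM: flag every route that was faster
    than the group's average experienced travel time."""
    average = sum(p * t for p, t in zip(strategy, times))
    return {i for i, t in enumerate(times) if t < average - 1e-12}


def travel_times(strategy, demand=10.0):
    """Hypothetical two-route network with linear congestion costs."""
    flow_a, flow_b = demand * strategy[0], demand * strategy[1]
    return [10.0 + 1.0 * flow_a, 15.0 + 0.5 * flow_b]


def simulate(days=200, step0=0.5):
    """Run the day-to-day loop with a decaying step size step0 / day."""
    strategy = [0.5, 0.5]  # one representative agent, two routes
    for day in range(1, days + 1):
        times = travel_times(strategy)
        flagged = flag_routes_stub(times, strategy)
        strategy = update_strategy(strategy, flagged, step0 / day)
    return strategy, travel_times(strategy)
```

In this toy network the user equilibrium equalizes the two travel times, which happens when 2/3 of the group uses the first route. Calling `simulate()` should drift toward that split, because the decaying step damps the oscillation around it. Replacing `flag_routes_stub` with an actual LLM call would leave `update_strategy` unchanged, which is the separation of reasoning from updating that the abstract describes.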
