Graph-Attentive MAPPO for Dynamic Retail Pricing
This work addresses scalable and stable price optimization for multi-product retail decision-making. Its contribution is incremental: it builds on existing MARL methods and augments them with graph attention.
The study addresses dynamic pricing in retail by comparing a multi-agent reinforcement learning baseline (MAPPO) with a graph-attention-augmented variant (MAPPO+GAT) in a simulated environment derived from real transaction data. It finds that MAPPO+GAT improves performance by sharing information across products without inducing excessive price volatility.
Dynamic pricing in retail requires policies that adapt to shifting demand while coordinating decisions across related products. We present a systematic empirical study of multi-agent reinforcement learning for retail price optimization, comparing a strong MAPPO baseline with a graph-attention-augmented variant (MAPPO+GAT) that leverages learned interactions among products. Using a simulated pricing environment derived from real transaction data, we evaluate profit, stability across random seeds, fairness across products, and training efficiency under a standardized evaluation protocol. The results indicate that MAPPO provides a robust and reproducible foundation for portfolio-level price control, and that MAPPO+GAT further improves performance by sharing information over the product graph without inducing excessive price volatility. Taken together, these findings suggest that graph-integrated MARL is a more scalable and stable approach than independent learners for dynamic retail pricing, with practical advantages for multi-product decision-making.
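To make "sharing information over the product graph" concrete, the following is a minimal sketch of a single-head graph-attention layer applied to per-product embeddings. It illustrates the general technique only and is not the architecture used in this study: the function name `gat_layer`, the array shapes, the LeakyReLU slope, and the toy product graph are all assumptions introduced here for illustration.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention over product embeddings (illustrative only).

    h   : (N, F) per-product features, e.g. each pricing agent's local observation
    adj : (N, N) 0/1 adjacency of the product graph, self-loops included
    W   : (F, D) shared linear projection (hypothetical parameter)
    a   : (2*D,) attention vector (hypothetical parameter)
    """
    z = h @ W                                  # project features: (N, D)
    d = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into source and target terms
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj > 0, e, -np.inf)          # attend only to graph neighbours
    e = e - e.max(axis=1, keepdims=True)       # stabilise the softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)   # each row sums to 1
    return alpha @ z, alpha                    # neighbour-mixed embeddings, weights

# Toy example: 4 products, where products 0-1 and 2-3 are assumed to be related.
rng = np.random.default_rng(0)
adj = np.eye(4)
adj[0, 1] = adj[1, 0] = 1
adj[2, 3] = adj[3, 2] = 1
h = rng.normal(size=(4, 3))
out, alpha = gat_layer(h, adj, rng.normal(size=(3, 2)), rng.normal(size=4))
```

In a MAPPO-style setup, an output such as `out[i]` would typically be combined with product i's own observation before it reaches the actor or critic, so each pricing agent conditions on related products. Whether this study applies attention in the actor, the critic, or both is not stated in the abstract.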