Multi-Agent Reinforcement Learning for Dynamic Pricing: Balancing Profitability,Stability and Fairness
This work addresses dynamic pricing for retail markets, offering incremental improvements in stability and fairness over existing methods.
The paper tackled dynamic pricing in competitive retail markets by evaluating multi-agent reinforcement learning (MARL) methods like MAPPO and MADDPG against a baseline, finding that MAPPO achieved the highest average returns with low variance, while MADDPG provided the fairest profit distribution.
Dynamic pricing in competitive retail markets requires strategies that adapt to fluctuating demand and competitor behavior. In this work, we present a systematic empirical evaluation of multi-agent reinforcement learning (MARL) approaches-specifically MAPPO and MADDPG-for dynamic price optimization under competition. Using a simulated marketplace environment derived from real-world retail data, we benchmark these algorithms against an Independent DDPG (IDDPG) baseline, a widely used independent learner in MARL literature. We evaluate profit performance, stability across random seeds, fairness, and training efficiency. Our results show that MAPPO consistently achieves the highest average returns with low variance, offering a stable and reproducible approach for competitive price optimization, while MADDPG achieves slightly lower profit but the fairest profit distribution among agents. These findings demonstrate that MARL methods-particularly MAPPO-provide a scalable and stable alternative to independent learning approaches for dynamic retail pricing.