Achieving Stable Training of Reinforcement Learning Agents in Bimodal Environments through Batch Learning
This work addresses a problem common in real-world applications such as pricing and could enable more practical industrial deployment of reinforcement learning, although it appears incremental, since it modifies an existing algorithm to address a specific bottleneck.
The paper tackles the challenge of training reinforcement learning agents in bimodal, stochastic environments, such as pricing problems, by introducing batch updates to tabular Q-learning. The batch learning agents are shown to be more effective and more resilient than conventionally trained agents, with improvements in both stability and performance.
Bimodal, stochastic environments pose a challenge for typical Reinforcement Learning methods. This problem is surprisingly common in real-world applications and is particularly relevant to pricing. In this paper we present a novel learning approach for the tabular Q-learning algorithm, tailored to these specific challenges through the use of batch updates. A simulation of a pricing problem is used as a testbed to compare a conventionally updated agent with a batch learning agent. The batch learning agents are shown to be both more effective than the conventionally trained agents and more resilient to fluctuations in a large stochastic environment. This work has significant potential to enable practical, industrial deployment of Reinforcement Learning in pricing and other domains.
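The abstract does not specify the exact batch-update rule, so the following is only a minimal sketch under stated assumptions, contrasting conventional per-transition tabular Q-learning with one possible batch scheme. The pricing environment (`PRICES`, `BUY_PROB`, `HORIZON`, `START`), the rule of freezing Q while a batch of episodes is collected and then averaging the targets per (state, action) pair, and all function names are hypothetical illustrations, not the authors' method. The per-step reward is bimodal by construction: it is either 0 (no sale) or the full price (sale).

```python
import random
from collections import defaultdict

# --- Hypothetical bimodal pricing environment (illustrative assumption) ---
PRICES = [5.0, 8.0, 12.0]                    # candidate prices (action space)
BUY_PROB = {5.0: 0.9, 8.0: 0.6, 12.0: 0.25}  # assumed purchase probability per price
HORIZON = 10                                 # selling periods per episode
START = (3, 0)                               # (inventory, period)


def step(state, action, rng):
    """One selling period: the reward is either 0 (no sale) or the price (sale)."""
    inventory, t = state
    price = PRICES[action]
    sold = rng.random() < BUY_PROB[price]
    reward = price if sold else 0.0
    next_state = (inventory - int(sold), t + 1)
    done = next_state[0] == 0 or next_state[1] >= HORIZON
    return next_state, reward, done


def choose(q, state, eps, rng):
    """Epsilon-greedy action selection over the tabular Q-function."""
    if rng.random() < eps:
        return rng.randrange(len(PRICES))
    return max(range(len(PRICES)), key=lambda a: q[(state, a)])


def target(q, reward, next_state, done, gamma):
    """Standard one-step Q-learning target."""
    if done:
        return reward
    return reward + gamma * max(q[(next_state, a)] for a in range(len(PRICES)))


def rollout(q, eps, rng):
    """Play one episode with Q held fixed and return its transitions."""
    state, done, transitions = START, False, []
    while not done:
        action = choose(q, state, eps, rng)
        next_state, reward, done = step(state, action, rng)
        transitions.append((state, action, reward, next_state, done))
        state = next_state
    return transitions


def train_online(episodes, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Conventional Q-learning: update Q after every single transition."""
    rng, q = random.Random(seed), defaultdict(float)
    for _ in range(episodes):
        state, done = START, False
        while not done:
            action = choose(q, state, eps, rng)
            next_state, reward, done = step(state, action, rng)
            td = target(q, reward, next_state, done, gamma) - q[(state, action)]
            q[(state, action)] += alpha * td
            state = next_state
    return q


def train_batch(episodes, batch_size=20, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Assumed batch variant: freeze Q, collect a batch of episodes,
    average the targets per (state, action), then apply one update each."""
    rng, q = random.Random(seed), defaultdict(float)
    for _ in range(episodes // batch_size):
        sums, counts = defaultdict(float), defaultdict(int)
        for _ in range(batch_size):
            for s, a, r, s2, done in rollout(q, eps, rng):
                sums[(s, a)] += target(q, r, s2, done, gamma)
                counts[(s, a)] += 1
        # Averaging many sampled outcomes smooths the 0-or-price rewards
        for key, n in counts.items():
            q[key] += alpha * (sums[key] / n - q[key])
    return q


if __name__ == "__main__":
    for name, q in [("online", train_online(2000)), ("batch", train_batch(2000))]:
        best = max(range(len(PRICES)), key=lambda a: q[(START, a)])
        print(name, "greedy starting price:", PRICES[best])
```

The design choice illustrated here is that each batch update moves Q toward the mean of many sampled targets rather than toward a single 0-or-price outcome, which is one plausible way batching could damp the fluctuations caused by a bimodal reward.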