LG · Nov 30, 2020

Deep Controlled Learning for Inventory Control

Tarkan Temizöz, Christina Imdahl, Remco Dijkman, Douniel Lamghari-Idrissi, Willem van Jaarsveld

arXiv:2011.15122v7 · 27 citations

AI Analysis

This work addresses the challenge of applying DRL to highly stochastic inventory management problems, providing a robust and generalizable solution for businesses managing inventory.

This paper introduces Deep Controlled Learning (DCL), a new Deep Reinforcement Learning (DRL) algorithm tailored for highly stochastic inventory management problems. DCL consistently outperforms state-of-the-art heuristics and existing DRL algorithms across various inventory settings, achieving lower average costs in all test cases with an optimality gap of no more than 0.2%.

The application of Deep Reinforcement Learning (DRL) to inventory management is an emerging field. However, traditional DRL algorithms, originally developed for diverse domains such as game-playing and robotics, may not be well-suited for the specific challenges posed by inventory management. Consequently, these algorithms often fail to outperform established heuristics; for instance, no existing DRL approach consistently surpasses the capped base-stock policy in lost sales inventory control. This highlights a critical gap in the practical application of DRL to inventory management: the highly stochastic nature of inventory problems requires tailored solutions. In response, we propose Deep Controlled Learning (DCL), a new DRL algorithm designed for highly stochastic problems. DCL is based on approximate policy iteration and incorporates an efficient simulation mechanism, combining Sequential Halving with Common Random Numbers. Our numerical studies demonstrate that DCL consistently outperforms state-of-the-art heuristics and DRL algorithms across various inventory settings, including lost sales, perishable inventory systems, and inventory systems with random lead times. DCL achieves lower average costs in all test cases while maintaining an optimality gap of no more than 0.2%. Remarkably, this performance is achieved using the same hyperparameter set across all experiments, underscoring the robustness and generalizability of our approach. These findings contribute to the ongoing exploration of tailored DRL algorithms for inventory management, providing a foundation for further research and practical application in this area.
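To make the simulation mechanism named in the abstract more concrete, here is a minimal sketch of Sequential Halving combined with Common Random Numbers. It is not the authors' implementation. The single-period cost model, the function names (`simulate_cost`, `sequential_halving_crn`), the cost parameters, and the uniform demand distribution are all assumptions chosen for illustration. The idea shown is that each round draws one shared set of demand scenarios, evaluates every surviving candidate action on those same scenarios, and then discards the worse half.

```python
import math
import random


def simulate_cost(order_qty, demand, holding=1.0, penalty=9.0):
    """Toy one-period cost (assumed model): holding cost on leftovers, penalty on lost sales."""
    leftover = max(order_qty - demand, 0)
    lost = max(demand - order_qty, 0)
    return holding * leftover + penalty * lost


def sequential_halving_crn(actions, budget, sample_demand, rng):
    """Pick the action with the lowest estimated mean cost.

    actions: non-empty iterable of candidate order quantities (hypothetical action set).
    budget: approximate total number of simulated cost evaluations.
    sample_demand: callable taking an RNG and returning one demand draw.
    """
    survivors = list(actions)
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    totals = {a: 0.0 for a in survivors}
    samples_seen = 0
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # Split the budget evenly over rounds and surviving actions.
        n = max(1, budget // (len(survivors) * rounds))
        # Common Random Numbers: draw the scenarios once and reuse them
        # for every surviving action, so differences between actions are
        # not blurred by independent sampling noise.
        demands = [sample_demand(rng) for _ in range(n)]
        for a in survivors:
            totals[a] += sum(simulate_cost(a, d) for d in demands)
        samples_seen += n
        # Every survivor has seen the same scenarios, so averages are comparable.
        survivors.sort(key=lambda a: totals[a] / samples_seen)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


if __name__ == "__main__":
    rng = random.Random(0)
    best = sequential_halving_crn(range(16), 4000, lambda r: r.randint(0, 10), rng)
    print("selected order quantity:", best)
```

Because each round spends its share of the budget on fewer candidates, the most promising actions receive the most samples. In this toy setting the cost curve is flat near the optimum, so the selected quantity can vary by one unit depending on the seed.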
