SY, LG, ML | Sep 24, 2024

Neural Coordination and Capacity Control for Inventory Management

Carson Eisenach, Udaya Ghai, Dhruv Madeka, Kari Torkkola, Dean Foster, Sham Kakade

arXiv:2410.02817v1 | 5.94 citations | h-index: 96

Originality: Incremental advance

AI Analysis

This work addresses inventory management challenges for retailers with capacity constraints, representing an incremental improvement by extending existing formulations and methods.

This paper tackles the capacitated periodic review inventory control problem for a retailer managing multiple products with limited shared resources. It proposes a neural coordinator to guide capacity adherence and shows that deep reinforcement learning policies using this coordinator outperform classic baselines in both cumulative discounted reward and capacity adherence, with improvements of up to 50% in some cases.

This paper addresses the capacitated periodic review inventory control problem, focusing on a retailer managing multiple products with limited shared resources, such as storage or inbound labor at a facility. Specifically, this paper is motivated by two questions: (1) what does it mean to backtest a capacity control mechanism, and (2) can we devise and backtest a capacity control mechanism that is compatible with recent advances in deep reinforcement learning for inventory management? First, because we only have a single historic sample path of Amazon's capacity limits, we propose a method that samples from a distribution of possible constraint paths covering a space of real-world scenarios. This novel approach allows for more robust and realistic testing of inventory management strategies. Second, we extend the exo-IDP (Exogenous Decision Process) formulation of Madeka et al. 2022 to capacitated periodic review inventory control problems and show that certain capacitated control problems are no harder than supervised learning. Third, we introduce a "neural coordinator" that produces forecasts of capacity prices to guide the system toward target constraints, in place of a traditional model predictive controller. Finally, we apply a modified DirectBackprop algorithm for learning a deep RL buying policy and for training the neural coordinator. Our methodology is evaluated through large-scale backtests, demonstrating that RL buying policies with a neural coordinator outperform classic baselines in terms of both cumulative discounted reward and capacity adherence (we see improvements of up to 50% in some cases).
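The core coordination idea, where a coordinator publishes a capacity price and each product's buying decision trades reward against that price, can be illustrated with a classic dual-price toy. This is a minimal sketch under stated assumptions, not the paper's neural coordinator or its DirectBackprop training. It assumes a single period, a hypothetical concave quadratic reward per product, and one unit of capacity per unit ordered. In place of a learned price forecast it uses a swapped-in classical technique: bisection on the price until total orders fit the capacity. `Product`, `best_response`, `total_usage`, and `clearing_price` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical toy reward: margin * q - 0.5 * curvature * q**2 for order quantity q.
    margin: float     # marginal reward of the first unit ordered
    curvature: float  # > 0; how quickly the marginal reward diminishes


def best_response(p: Product, price: float) -> float:
    """Order quantity maximizing margin*q - 0.5*curvature*q**2 - price*q over q >= 0."""
    return max(0.0, (p.margin - price) / p.curvature)


def total_usage(products: list[Product], price: float) -> float:
    """Capacity consumed when every product best-responds to the same price."""
    return sum(best_response(p, price) for p in products)


def clearing_price(products: list[Product], capacity: float,
                   tol: float = 1e-9, max_iter: int = 200) -> float:
    """Smallest price >= 0 whose best responses fit within a capacity >= 0 (bisection)."""
    if total_usage(products, 0.0) <= capacity:
        return 0.0  # the constraint is slack, so capacity is free
    # At a price equal to the largest margin every order is zero, so `hi` is always feasible.
    lo, hi = 0.0, max(p.margin for p in products)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total_usage(products, mid) > capacity:
            lo = mid  # still over capacity: the price must rise
        else:
            hi = mid  # feasible: try a lower price
        if hi - lo < tol:
            break
    return hi


if __name__ == "__main__":
    items = [Product(margin=10.0, curvature=1.0), Product(margin=6.0, curvature=0.5)]
    price = clearing_price(items, capacity=8.0)
    print(f"price={price:.4f}")  # prints "price=4.6667"
    print([round(best_response(p, price), 2) for p in items])  # prints [5.33, 2.67]
```

Raising the price shrinks every product's order, so total usage is non-increasing in the price and bisection is valid. In the example the unconstrained orders total 22 units, and a price of 14/3 brings them down to exactly 8. A learned coordinator would instead forecast such prices across time periods and constraint paths, which this toy does not attempt.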

