LG Feb 6, 2025

Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach

Dimitris Bertsimas, Cheol Woo Kim, José Niño-Mora

arXiv:2502.03725v14.1 h-index: 23

Originality: Incremental advance

AI Analysis

The approach offers a scalable solution for dynamic optimization problems such as machine maintenance and epidemic control, though the advance is incremental because it builds on existing fluid bandit frameworks.

The paper tackles the optimal control of fluid restless multi-armed bandits by proposing a machine learning algorithm that learns state feedback policies, achieving a speed-up of up to 26 million times over a direct numerical algorithm.

We propose a machine learning approach to the optimal control of fluid restless multi-armed bandits (FRMABs) with state equations that are either affine or quadratic in the state variables. By deriving fundamental properties of FRMAB problems, we design an efficient machine learning based algorithm. Using this algorithm, we solve multiple instances with varying initial states to generate a comprehensive training set. We then learn a state feedback policy using Optimal Classification Trees with hyperplane splits (OCT-H). We test our approach on machine maintenance, epidemic control and fisheries control problems. Our method yields high-quality state feedback policies and achieves a speed-up of up to 26 million times compared to a direct numerical algorithm for fluid problems.
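The pipeline described in the abstract (solve many instances from different initial states, collect the results as a training set, then fit a state feedback policy) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: `solve_instance` is a hypothetical stand-in for the per-instance solver with a made-up decision rule, the two-arm state space is a toy, and a single perceptron-trained hyperplane replaces OCT-H, which builds a full tree of such hyperplane splits. It is not the authors' algorithm.

```python
import random


def solve_instance(x):
    """Hypothetical stand-in for the expensive per-instance solver.

    Given an initial state x = (x1, x2), return which of two arms to
    activate (0 or 1). The linear rule is invented for this sketch.
    """
    return 1 if 2.0 * x[0] - 1.0 * x[1] - 0.3 > 0 else 0


def build_training_set(n, rng):
    """Solve n instances with random initial states in [0, 1]^2."""
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        data.append((x, solve_instance(x)))
    return data


def fit_hyperplane_split(data, epochs=1000, lr=0.1):
    """Fit one hyperplane w.x + b = 0 with the perceptron rule.

    A single split is only the building block of a tree with
    hyperplane splits; it is used here as a simplified surrogate.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            if pred != y:
                sign = 1.0 if y == 1 else -1.0
                w[0] += lr * sign * x[0]
                w[1] += lr * sign * x[1]
                b += lr * sign
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return w, b


def policy(x, w, b):
    """Learned state feedback policy: map a state directly to an action."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


if __name__ == "__main__":
    rng = random.Random(0)
    train = build_training_set(300, rng)
    w, b = fit_hyperplane_split(train)
    print("action at state (0.9, 0.1):", policy((0.9, 0.1), w, b))
```

Once fitted, the policy evaluates a state with a few multiplications instead of re-solving an optimization problem. This is the general reason a learned feedback policy can be much faster at decision time than a direct numerical method, although the toy above says nothing about the size of the speed-up reported in the paper.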

