LG OC PR ML Nov 27, 2025

Representative Action Selection for Large Action Space: From Bandits to MDPs

arXiv:2511.22104v14.1

Originality: Incremental advance

AI Analysis

This work addresses a fundamental challenge in large-scale combinatorial decision-making under uncertainty, with applications in domains such as inventory management and recommendation systems. It is an incremental advance, however, because it extends the authors' prior meta-bandit results to MDPs rather than introducing a new algorithm.

The paper tackles the problem of selecting a small, representative action subset from a large action space shared across a family of reinforcement learning environments, such as those arising in inventory management and recommendation systems, so that learning is efficient without evaluating every action. It extends the authors' prior meta-bandit results to Markov Decision Processes, proving that their existing algorithm achieves performance comparable to using the full action space under a relaxed, non-centered sub-Gaussian process model.

We study the problem of selecting a small, representative action subset from an extremely large action space shared across a family of reinforcement learning (RL) environments -- a fundamental challenge in applications like inventory management and recommendation systems, where direct learning over the entire space is intractable. Our goal is to identify a fixed subset of actions that, for every environment in the family, contains a near-optimal action, thereby enabling efficient learning without exhaustively evaluating all actions. This work extends our prior results for meta-bandits to the more general setting of Markov Decision Processes (MDPs). We prove that our existing algorithm achieves performance comparable to using the full action space. This theoretical guarantee is established under a relaxed, non-centered sub-Gaussian process model, which accommodates greater environmental heterogeneity. Consequently, our approach provides a computationally and sample-efficient solution for large-scale combinatorial decision-making under uncertainty.
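The objective stated in the abstract (a fixed subset that contains a near-optimal action for every environment in the family) can be made concrete with a small sketch. The code below is an illustration only, restricted to the one-step bandit case rather than MDPs. The toy environment family, the function names (`subset_gap`, `optima_of_sampled_envs`, `make_family`), and the subset-selection heuristic are all assumptions introduced here and should not be read as the paper's algorithm or model.

```python
import random


def subset_gap(reward_tables, subset):
    """Return (mean, worst-case) loss from restricting every environment to `subset`.

    reward_tables[e][a] is the mean reward of action a in environment e.
    The gap for one environment is: best reward overall - best reward in subset.
    """
    gaps = []
    for rewards in reward_tables:
        best_overall = max(rewards)
        best_in_subset = max(rewards[a] for a in subset)
        gaps.append(best_overall - best_in_subset)
    return sum(gaps) / len(gaps), max(gaps)


def optima_of_sampled_envs(reward_tables, num_samples, rng):
    """Hypothetical heuristic: keep the optimal action of a few sampled environments."""
    sampled = rng.sample(reward_tables, num_samples)
    return sorted({max(range(len(r)), key=r.__getitem__) for r in sampled})


def make_family(num_envs, num_actions, rng):
    """Toy family (assumption): a shared non-zero mean per action plus
    environment-specific Gaussian noise, so rewards are not centered at zero."""
    base = [rng.gauss(0.0, 1.0) for _ in range(num_actions)]
    return [[b + rng.gauss(0.0, 0.3) for b in base] for _ in range(num_envs)]


rng = random.Random(0)
family = make_family(num_envs=200, num_actions=500, rng=rng)

# A fixed subset of at most 20 actions, shared by all 200 environments.
subset = optima_of_sampled_envs(family, num_samples=20, rng=rng)
mean_gap, worst_gap = subset_gap(family, subset)

# Sanity check: the full action space has zero gap by construction.
full_mean_gap, full_worst_gap = subset_gap(family, range(500))

print(f"subset size: {len(subset)}")
print(f"mean gap: {mean_gap:.3f}, worst-case gap: {worst_gap:.3f}")
```

A small `mean_gap` relative to the spread of rewards would indicate that the subset is representative for this toy family; the worst-case gap corresponds more closely to the "for every environment" requirement in the abstract.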
