LGJun 3, 2024

Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

arXiv:2406.01386v39 citations
Originality Incremental advance
AI Analysis

This work addresses a foundational problem in machine learning by connecting episodic RL and CMAB literature, potentially enabling more interactions between these directions, though it appears incremental as it builds upon existing CMAB works.

The paper tackles the problem of combinatorial multi-armed bandits with multivariant and probabilistically triggering arms, introducing a novel framework (CMAB-MT) that enhances modeling power and leverages statistical properties for improved results, achieving matching or improved regret bounds in applications like episodic reinforcement learning.

We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT), where the outcome of each arm is a $d$-dimensional multivariant random variable and the feedback follows a general arm triggering process. Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables. For CMAB-MT, we propose a general 1-norm multivariant and triggering probability-modulated smoothness condition, and an optimistic CUCB-MT algorithm built upon this condition. Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution, all of which meet the above smoothness condition and achieve matching or improved regret bounds compared to existing works. Through our new framework, we build the first connection between the episodic RL and CMAB literature, by offering a new angle to solve the episodic RL through the lens of CMAB, which may encourage more interactions between these two important directions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes