MLLGSep 30, 2015

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

arXiv:1509.09011v123 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of sequential learning with limited feedback in stochastic environments, providing optimal algorithms for researchers and practitioners in reinforcement learning and decision-making, though it is incremental as it builds on existing bandit algorithms.

The paper tackles the problem of partial monitoring with finite actions and stochastic outcomes by deriving a logarithmic distribution-dependent regret lower bound and proposing PM-DMED, an algorithm that minimizes this regret and outperforms state-of-the-art methods in experiments, with a modified version achieving an asymptotically optimal regret upper bound matching the lower bound.

Partial monitoring is a general model for sequential learning with limited feedback formalized as a game between two players. In this game, the learner chooses an action and at the same time the opponent chooses an outcome, then the learner suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge). Then, we derive an asymptotically optimal regret upper bound of PM-DMED-Hinge that matches the lower bound.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes