OC LG Jun 29, 2021

Limited depth bandit-based strategy for Monte Carlo planning in continuous action spaces

arXiv:2106.15594v1, 1 citation
Originality: Incremental advance
AI Analysis

This work addresses optimal control problems for continuous action spaces, presenting an incremental improvement in efficiency over existing methods.

The paper tackles optimal control in continuous action spaces by proposing LD-HOO, a limited-depth variant of hierarchical optimistic optimization (HOO). LD-HOO achieves the same asymptotic cumulative regret as HOO while being faster and more memory efficient. The authors then build a Monte Carlo tree search algorithm on LD-HOO and apply it to several optimal control problems.

This paper addresses the problem of optimal control using search trees. We start by considering multi-armed bandit problems with continuous action spaces and propose LD-HOO, a limited depth variant of the hierarchical optimistic optimization (HOO) algorithm. We provide a regret analysis for LD-HOO and show that, asymptotically, our algorithm exhibits the same cumulative regret as the original HOO while being faster and more memory efficient. We then propose a Monte Carlo tree search algorithm based on LD-HOO for optimal control problems and illustrate the resulting approach's application in several optimal control problems.
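To make the idea of a limited-depth HOO concrete, here is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes a one-dimensional action space [0, 1], binary splitting of cells, the usual HOO optimistic bound (empirical mean plus a confidence term plus a cell-size term `nu1 * rho**depth`), and a hard cap `max_depth` beyond which leaves are sampled but never expanded. The names `Node`, `b_value`, `ld_hoo`, `tree_depth`, and all parameter defaults are hypothetical and chosen for illustration only.

```python
import math
import random


class Node:
    """One cell [lo, hi) of the action space at a given tree depth."""

    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0        # times an action from this cell was played
        self.mean = 0.0       # running mean of the rewards seen in this cell
        self.children = None  # stays None until the node is expanded


def b_value(node, n, nu1, rho):
    """Optimistic bound: min of the node's own bound and its best child's."""
    if node.count == 0:
        return math.inf  # unplayed cells are maximally optimistic
    u = (node.mean
         + math.sqrt(2.0 * math.log(n) / node.count)
         + nu1 * rho ** node.depth)
    if node.children is None:
        return u
    return min(u, max(b_value(c, n, nu1, rho) for c in node.children))


def ld_hoo(reward, rounds, max_depth, nu1=1.0, rho=0.5, rng=None):
    """Run the sketch for `rounds` plays and return the root of the tree."""
    rng = rng or random.Random(0)
    root = Node(0.0, 1.0, 0)
    for n in range(1, rounds + 1):
        # Descend along the child with the largest optimistic bound.
        path, node = [root], root
        while node.children is not None:
            node = max(node.children, key=lambda c: b_value(c, n, nu1, rho))
            path.append(node)
        # Play a random action inside the selected cell.
        r = reward(rng.uniform(node.lo, node.hi))
        # Update statistics on every node along the path.
        for visited in path:
            visited.count += 1
            visited.mean += (r - visited.mean) / visited.count
        # Assumed depth limit: expand only while below max_depth.
        if node.depth < max_depth:
            mid = 0.5 * (node.lo + node.hi)
            node.children = [Node(node.lo, mid, node.depth + 1),
                             Node(mid, node.hi, node.depth + 1)]
    return root


def tree_depth(node):
    """Depth of the deepest node below (and including) `node`."""
    if node.children is None:
        return node.depth
    return max(tree_depth(c) for c in node.children)
```

With the cap in place the tree holds at most `2**(max_depth + 1) - 1` nodes regardless of the number of rounds, which is the intuition behind the memory and speed savings the abstract mentions. For example, `ld_hoo(lambda x: 1.0 - abs(x - 0.7), rounds=300, max_depth=4)` never grows past depth 4, whereas an uncapped variant would keep adding one node per round.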

