SY LG · Jun 21, 2019

Revised Progressive-Hedging-Algorithm Based Two-layer Solution Scheme for Bayesian Reinforcement Learning

arXiv:1906.09035v11.22 citations

Originality: Incremental advance

AI Analysis

The paper addresses a fundamental problem in reinforcement learning: control under both system noise and parameter uncertainty. The contribution appears incremental, however, because it builds on existing decomposition techniques.

The paper tackles the challenge of Bayesian reinforcement learning under non-episodic conditions by proposing a two-layer solution scheme that directly approximates the optimal policy, demonstrating it on a linear-quadratic-Gaussian problem with unknown gain.

Stochastic control under both inherent random system noise and incomplete knowledge of the system parameters is a core topic in reinforcement learning (RL), especially in non-episodic settings, where online learning is much more demanding. Bayesian RL has recently addressed this challenge with approximation techniques for finding suboptimal policies. While existing approaches mainly approximate the value function or rely on Thompson sampling, we propose a novel two-layer solution scheme that approximates the optimal policy directly for a type of Bayesian RL problem. The scheme combines time-decomposition-based dynamic programming (DP) at the lower layer with a scenario-decomposition-based revised progressive hedging algorithm (PHA) at the upper layer. The key feature of our approach is that it separates reducible system uncertainty from irreducible uncertainty at two different layers, thus decomposing and conquering. We demonstrate the framework on the linear-quadratic-Gaussian problem with unknown gain, which, although seemingly simple, has been a notoriously difficult subject in dual control for more than half a century.
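To make the scenario-decomposition idea concrete, the following is a minimal sketch of the standard (not the paper's revised) progressive hedging iteration on a toy problem. Everything specific here is an assumption chosen for illustration: a single-stage scalar cost `(x0 + b*u)**2 + r*u**2`, an unknown gain `b` drawn from a finite set of scenarios, and the hypothetical names `progressive_hedging` and `closed_form`. The lower-layer dynamic programming step of the two-layer scheme is not reproduced.

```python
def progressive_hedging(gains, probs, x0=1.0, r=0.1, rho=1.0, iters=200):
    """Standard PHA for min_u E_b[(x0 + b*u)^2 + r*u^2] with one shared control u.

    gains: candidate values of the unknown gain b (one per scenario)
    probs: prior probability of each scenario (must sum to 1)
    """
    n = len(gains)
    w = [0.0] * n   # multipliers pricing the nonanticipativity constraint
    u_bar = 0.0     # implementable control, identical across scenarios
    for _ in range(iters):
        # Scenario step: each augmented subproblem
        #   (x0 + b*u)^2 + r*u^2 + w*u + (rho/2)*(u - u_bar)^2
        # is a scalar quadratic, so its minimizer is available in closed form.
        u = [
            (rho * u_bar - w[s] - 2.0 * gains[s] * x0)
            / (2.0 * gains[s] ** 2 + 2.0 * r + rho)
            for s in range(n)
        ]
        # Aggregation step: probability-weighted average of scenario solutions.
        u_bar = sum(p * us for p, us in zip(probs, u))
        # Multiplier step: penalize each scenario's deviation from the average.
        w = [w[s] + rho * (u[s] - u_bar) for s in range(n)]
    return u_bar


def closed_form(gains, probs, x0=1.0, r=0.1):
    """Direct minimizer of the expected cost, used only as a check."""
    mean_b = sum(p * b for p, b in zip(probs, gains))
    mean_b2 = sum(p * b * b for p, b in zip(probs, gains))
    return -mean_b * x0 / (mean_b2 + r)


if __name__ == "__main__":
    gains = [0.5, 1.0, 1.5]
    probs = [0.25, 0.5, 0.25]
    print(progressive_hedging(gains, probs), closed_form(gains, probs))
```

On this convex toy problem the averaged control should agree with the direct minimizer of the expected cost. The point of the decomposition is that each scenario subproblem treats the gain as known, which is what allows a per-scenario solver (dynamic programming, in the paper's lower layer) to be plugged in.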
