LG AI · Nov 21, 2017

Posterior Sampling for Large Scale Reinforcement Learning

Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, Nikos Vlassis

arXiv:1711.07979v3 · 10.625 citations

Originality: Incremental advance

AI Analysis

This work addresses the scalability of reinforcement learning in applications such as sequential recommendations. The contribution is incremental: it builds on existing PSRL methods and modifies the episode switching schedule.

The authors address the challenge of scaling posterior sampling for reinforcement learning (PSRL) by proposing DS-PSRL, a non-episodic algorithm with a deterministic switching schedule. The algorithm is efficient in time, sample, and space complexity, and it is compared with state-of-the-art PSRL algorithms on standard discrete and continuous problems.

We propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parametrization for a large class of problems in sequential recommendations.
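
To make the idea of a deterministic, model-independent switching schedule concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a doubling schedule (each episode twice as long as the previous one), which the abstract does not specify, and it uses a Bernoulli bandit as a single-state stand-in for an MDP. The names `doubling_switch_times` and `run_ds_psrl_bandit` are hypothetical.

```python
import random


def doubling_switch_times(horizon):
    """Yield the 0-indexed time steps at which a new model is sampled.

    Assumption (not stated in the abstract): episode j lasts 2**j steps,
    so the switch times depend only on the clock, never on the data.
    """
    t, length = 0, 1
    while t < horizon:
        yield t
        t += length
        length *= 2


def run_ds_psrl_bandit(true_means, horizon, seed=0):
    """Posterior sampling with a deterministic schedule on a toy bandit.

    The bandit is a single-state stand-in for an MDP: "solving the sampled
    model" reduces to picking the arm with the largest sampled mean.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta(1, 1) prior on each arm's success probability.
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    switches = set(doubling_switch_times(horizon))
    arm, total_reward = 0, 0
    pulls = [0] * n_arms

    for t in range(horizon):
        if t in switches:
            # Sample a model from the current posterior and act greedily
            # with respect to it until the next scheduled switch.
            sampled = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
            arm = max(range(n_arms), key=sampled.__getitem__)
        reward = 1 if rng.random() < true_means[arm] else 0
        # The posterior is updated at every step; only the policy is held fixed.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
        pulls[arm] += 1
    return total_reward, pulls
```

Calling `list(doubling_switch_times(20))` gives the switch times `[0, 1, 3, 7, 15]`, so the number of posterior samples and planning calls grows only logarithmically with the horizon. That logarithmic growth is the kind of saving a fixed schedule can offer under the doubling assumption.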

