LG, AI, Nov 21, 2017

Posterior Sampling for Large Scale Reinforcement Learning

arXiv:1711.07979v3, 25 citations
Originality: Incremental advance
AI Analysis

This work addresses the scalability of reinforcement learning in applications such as sequential recommendations. It is an incremental advance: it builds on existing PSRL methods and changes only the episode switching schedule.

The authors address the challenge of scaling posterior sampling for reinforcement learning (PSRL) by proposing DS-PSRL, a non-episodic algorithm with a deterministic switching schedule. They describe the algorithm as efficient in time, sample, and space complexity, and they compare it with state-of-the-art PSRL algorithms on standard discrete and continuous problems.

We propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parametrization for a large class of problems in sequential recommendations.
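To make the idea of a deterministic, model-independent switching schedule concrete, here is a minimal sketch rather than the authors' implementation. It rests on several assumptions that the text above does not state: the schedule resamples at time steps 1, 2, 4, 8, and so on (doubling intervals); the environment is a small tabular MDP with known rewards and a Dirichlet posterior over transitions; and planning uses discounted value iteration as a stand-in for whatever planner the paper uses. All function names (`switch_times`, `sample_dirichlet`, `greedy_policy`, `ds_psrl_sketch`) are hypothetical.

```python
import random


def switch_times(horizon):
    """Assumed schedule: resample the model at t = 1, 2, 4, 8, ... <= horizon.

    The times depend only on t, never on the observed data or the model.
    """
    times, t = [], 1
    while t <= horizon:
        times.append(t)
        t *= 2
    return times


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def greedy_policy(P, R, gamma=0.95, iters=200):
    """Discounted value iteration (an assumed stand-in planner)."""
    n_s, n_a = len(P), len(P[0])

    def q(s, a, V):
        return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

    V = [0.0] * n_s
    for _ in range(iters):
        V = [max(q(s, a, V) for a in range(n_a)) for s in range(n_s)]
    return [max(range(n_a), key=lambda a: q(s, a, V)) for s in range(n_s)]


def ds_psrl_sketch(true_P, R, horizon, seed=0):
    """Run posterior sampling with the fixed schedule on one long trajectory.

    Returns (total reward, number of posterior samples drawn).
    """
    rng = random.Random(seed)
    n_s, n_a = len(true_P), len(true_P[0])
    # Dirichlet(1, ..., 1) prior over next-state distributions (an assumption).
    counts = [[[1.0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    switches = set(switch_times(horizon))
    state, policy, total, n_samples = 0, None, 0.0, 0
    for t in range(1, horizon + 1):
        if t in switches:
            # Sample a model from the posterior and act greedily on it
            # until the next scheduled switch.
            sampled_P = [[sample_dirichlet(counts[s][a], rng)
                          for a in range(n_a)] for s in range(n_s)]
            policy = greedy_policy(sampled_P, R)
            n_samples += 1
        action = policy[state]
        next_state = rng.choices(range(n_s), weights=true_P[state][action])[0]
        total += R[state][action]
        counts[state][action][next_state] += 1.0  # posterior update
        state = next_state
    return total, n_samples


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP (made-up numbers for illustration only).
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.8, 0.2], [0.1, 0.9]]]
    R = [[0.0, 0.1], [0.5, 1.0]]
    print(ds_psrl_sketch(P, R, horizon=100))
```

Under this assumed doubling schedule, the number of posterior samples (and planning calls) grows only logarithmically with the horizon, which is one way a fixed schedule can keep the computation cheap.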

