LG · ML · Sep 23, 2019

PAC Reinforcement Learning without Real-World Feedback

arXiv:1909.10449v3 · 11 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reducing real-world sample complexity for reinforcement learning agents when real-world feedback is unavailable. It is an incremental advance because it builds on existing ROMDP theory.

The paper tackles reinforcement learning in Sim-to-Real settings without real-world feedback. It formulates a theoretical framework based on rich observation Markov decision processes (ROMDPs) and establishes real-world sample complexity guarantees that are smaller than the best currently known guarantees for learning a ROMDP directly with feedback.

This work studies reinforcement learning in the Sim-to-Real setting, in which an agent is first trained on a number of simulators before being deployed in the real world, with the aim of decreasing the real-world sample complexity requirement. Using a dynamic model known as a rich observation Markov decision process (ROMDP), we formulate a theoretical framework for Sim-to-Real in the situation where feedback in the real world is not available. We establish real-world sample complexity guarantees that are smaller than what is currently known for directly (i.e., without access to simulators) learning a ROMDP with feedback.
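The abstract relies on the rich observation Markov decision process (ROMDP) without defining it. As the model is commonly described in the literature, a ROMDP has a small set of hidden latent states, and each latent state emits observations from its own disjoint set. Observations can therefore be numerous and high-dimensional while still determining the latent state. The following is a hypothetical toy illustration of that generative structure only, not the paper's framework or algorithm. The class name `ROMDP`, the block-indexed integer observation ids, the `decode` helper, and the fixed-horizon `rollout` are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ROMDP:
    """Toy rich-observation MDP (hypothetical illustration, not the paper's model).

    Latent states are 0..num_states-1. Each latent state s owns a disjoint
    block of observation ids, so an observation determines its latent state,
    but the agent only ever sees observations, never s directly.
    """
    num_states: int
    num_actions: int
    obs_per_state: int
    # transition[s][a] is a probability vector over next latent states
    transition: list
    # reward[s][a] is the reward for taking action a in latent state s
    reward: list
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def emit(self, s: int) -> int:
        """Sample an observation uniformly from latent state s's own block."""
        return s * self.obs_per_state + self.rng.randrange(self.obs_per_state)

    def decode(self, obs: int) -> int:
        """Recover the latent state; possible only because the blocks are disjoint."""
        return obs // self.obs_per_state

    def step(self, s: int, a: int):
        """Sample the next latent state and return (next_state, obs, reward)."""
        probs = self.transition[s][a]
        s_next = self.rng.choices(range(self.num_states), weights=probs)[0]
        return s_next, self.emit(s_next), self.reward[s][a]


def rollout(env: ROMDP, policy, horizon: int, s0: int = 0) -> float:
    """Run one fixed-horizon episode; the policy receives observations only."""
    s, obs, total = s0, env.emit(s0), 0.0
    for _ in range(horizon):
        a = policy(obs)
        s, obs, r = env.step(s, a)
        total += r
    return total


# Two latent states, 1000 observations each; action a deterministically moves to state a.
env = ROMDP(
    num_states=2, num_actions=2, obs_per_state=1000,
    transition=[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    reward=[[0.0, 0.0], [1.0, 1.0]],
)
# Always taking action 1: reward 0 on the first step, then 1 on each of the next 4.
print(rollout(env, lambda obs: 1, horizon=5))  # → 4.0
```

In this toy the `decode` mapping is handed to us. In a genuine ROMDP the learner is not given it and must infer from experience which observations belong to the same latent state.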

