LG · AI · ML · Jun 28, 2022

Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse

arXiv:2206.13714v3 · 5 citations · h-index: 57 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses data efficiency and performance guarantees for real-world control applications. It appears incremental because it builds on existing on-policy and sample-reuse methods.

The paper tackles the trade-off between practical performance guarantees and data efficiency in deep reinforcement learning for control. It develops Generalized Policy Improvement algorithms, which combine on-policy guarantees with sample reuse, and demonstrates their benefits through extensive experiments on simulated control tasks.

We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control. Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse, addressing a trade-off between two important deployment requirements for real-world control: (i) practical performance guarantees and (ii) data efficiency. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a broad range of simulated control tasks.
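To make the idea of combining on-policy-style updates with sample reuse more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a PPO-style clipped surrogate in which samples collected under several recent policies are mixed with per-sample weights, and the clipping interval is centered on the current policy's probability ratio rather than on 1. The function name `reuse_clipped_surrogate`, its arguments, and the default `eps=0.2` are all hypothetical choices made for illustration; this page does not specify the objective the paper uses.

```python
import numpy as np


def reuse_clipped_surrogate(ratio_new, ratio_cur, advantages, weights, eps=0.2):
    """Hypothetical clipped surrogate over samples from several recent policies.

    ratio_new  : pi_candidate(a|s) / pi_behavior(a|s) for each sample
    ratio_cur  : pi_current(a|s)   / pi_behavior(a|s) for each sample
    advantages : advantage estimate for each sample
    weights    : non-negative mixture weight for each sample (assumed given)
    eps        : half-width of the clipping interval (assumed value)
    """
    ratio_new = np.asarray(ratio_new, dtype=float)
    ratio_cur = np.asarray(ratio_cur, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Clip around the current policy's ratio instead of around 1, so that
    # samples from older behavior policies are not penalized for being old.
    clipped = np.clip(ratio_new, ratio_cur - eps, ratio_cur + eps)

    # Pessimistic (minimum) objective, as in standard clipped surrogates.
    per_sample = np.minimum(ratio_new * advantages, clipped * advantages)

    # Weighted average across all reused samples.
    return float(np.sum(weights * per_sample) / np.sum(weights))


if __name__ == "__main__":
    value = reuse_clipped_surrogate(
        ratio_new=[1.3, 0.9],
        ratio_cur=[1.0, 1.0],
        advantages=[1.0, -0.5],
        weights=[0.6, 0.4],
    )
    print(value)
```

When every sample comes from the current policy, `ratio_cur` is 1 everywhere and the sketch reduces to the familiar single-batch clipped surrogate.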

Code Implementations: 2 repos
