LG · Oct 10, 2021

A Closer Look at Advantage-Filtered Behavioral Cloning in High-Noise Datasets

arXiv:2110.04698v2 · 5 citations
AI Analysis

This paper addresses the challenge of learning from high-noise datasets in offline reinforcement learning. The contribution is incremental: it builds on existing methods.

The paper tackles the problem of scaling offline reinforcement learning to vast datasets dominated by sub-optimal noise. It finds that a modified prioritized experience sampling method enables agents to learn state-of-the-art policies even when expert actions are outnumbered nearly 65:1.

Recent Offline Reinforcement Learning methods have succeeded in learning high-performance policies from fixed datasets of experience. A particularly effective approach learns to first identify and then mimic optimal decision-making strategies. Our work evaluates this method's ability to scale to vast datasets consisting almost entirely of sub-optimal noise. A thorough investigation on a custom benchmark helps identify several key challenges involved in learning from high-noise datasets. We re-purpose prioritized experience sampling to locate expert-level demonstrations among millions of low-performance samples. This modification enables offline agents to learn state-of-the-art policies in benchmark tasks using datasets where expert actions are outnumbered nearly 65:1.
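The abstract does not spell out the mechanics, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a critic already supplies Q-value and state-value estimates, uses a binary filter (keep a sample only if its estimated advantage is positive), and sets sampling priorities proportional to the clipped advantage. The function names, the `eps` floor, and the synthetic 65:1 dataset are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_advantages(q_values, state_values):
    """Advantage estimate A(s, a) = Q(s, a) - V(s) for each dataset sample."""
    q = np.asarray(q_values, dtype=np.float64)
    v = np.asarray(state_values, dtype=np.float64)
    return q - v


def binary_filter(advantages):
    """Behavioral-cloning loss weight: 1 if the action looks better than average, else 0."""
    return (advantages > 0).astype(np.float64)


def sampling_probabilities(advantages, eps=1e-3):
    """Sampling distribution that favors positive-advantage samples.

    eps (an assumed hyperparameter) keeps every sample reachable so that
    errors in the advantage estimates can still be corrected later.
    """
    priorities = np.maximum(advantages, 0.0) + eps
    return priorities / priorities.sum()


def expert_fraction(rng, probs, is_expert, batch_size):
    """Draw one batch under `probs` and report the share of expert samples in it."""
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    return is_expert[idx].mean()


# Synthetic dataset (assumption): 100 expert samples buried in 6500 noisy ones.
rng = np.random.default_rng(0)
n_expert, n_noise = 100, 6500
is_expert = np.concatenate([np.ones(n_expert, dtype=bool), np.zeros(n_noise, dtype=bool)])
advantages = np.where(is_expert, 1.0, -1.0)  # idealized, perfectly accurate critic

uniform = np.full(len(advantages), 1.0 / len(advantages))
prioritized = sampling_probabilities(advantages)

print("expert share, uniform sampling:    ", expert_fraction(rng, uniform, is_expert, 2000))
print("expert share, prioritized sampling:", expert_fraction(rng, prioritized, is_expert, 2000))
```

With uniform sampling, most batches contain almost nothing that passes the filter, so few samples contribute to the cloning loss. Prioritizing by estimated advantage concentrates each batch on the samples the filter would keep. The sketch assumes an accurate critic; in practice the advantage estimates are learned and noisy, which is where the challenges the abstract mentions are likely to arise.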

Code Implementations (3 repos)
