RO | LG | Jan 17, 2023

Sim-Anchored Learning for On-the-Fly Adaptation

arXiv:2301.06987v3 | h-index: 24 | Has Code
Originality: Highly original
AI Analysis

This paper addresses catastrophic forgetting in underrepresented scenarios during sim-to-real adaptation for robotic platforms, offering a way to maintain designer intent during live adaptation.

The paper tackles fine-tuning simulation-trained RL agents with real-world data, which often degrades crucial behaviors because the real data is limited or skewed. It proposes a multi-objective optimization approach that uses anchor critics to preserve prioritized behaviors. The reported result is robust adaptation, with power consumption reduced by up to 50% in a racing-quadrotor scenario without loss of control.

Fine-tuning simulation-trained RL agents with real-world data often degrades crucial behaviors due to limited or skewed data distributions. We argue that designer priorities exist not just in reward functions, but also in simulation design choices like task selection and state initialization. When adapting to real-world data, agents can experience catastrophic forgetting in important but underrepresented scenarios. We propose framing live-adaptation as a multi-objective optimization problem, where policy objectives must be satisfied both in simulation and reality. Our approach leverages critics from simulation as "anchors for design intent" (anchor critics). By jointly optimizing policies against both anchor critics and critics trained on real-world experience, our method enables adaptation while preserving prioritized behaviors from simulation. Evaluations demonstrate robust behavior retention in sim-to-sim benchmarks and a sim-to-real scenario with a racing quadrotor, allowing for power consumption reductions of up to 50% without control loss. We also contribute SwaNNFlight, an open-source firmware for enabling live adaptation on similar robotic platforms.
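The joint optimization described in the abstract can be illustrated with a toy sketch. The abstract does not say how the two objectives are combined, so the weighted sum below is only an assumed scalarization, not the paper's method. The critics, the one-parameter linear policy, and every name (`anchor_critic`, `real_critic`, `adapt`, `anchor_weight`) are hypothetical stand-ins chosen for illustration.

```python
from typing import Sequence


def anchor_critic(state: float, action: float) -> float:
    """Hypothetical frozen critic from simulation: prefers action == state."""
    return -(action - 1.0 * state) ** 2


def real_critic(state: float, action: float) -> float:
    """Hypothetical critic fit on real data: prefers action == 0.5 * state."""
    return -(action - 0.5 * state) ** 2


def policy(theta: float, state: float) -> float:
    """Toy one-parameter linear policy."""
    return theta * state


def joint_objective(theta: float, sim_states: Sequence[float],
                    real_states: Sequence[float], anchor_weight: float) -> float:
    """Weighted sum of the two critics' mean values (an assumed scalarization)."""
    sim_term = sum(anchor_critic(s, policy(theta, s)) for s in sim_states) / len(sim_states)
    real_term = sum(real_critic(s, policy(theta, s)) for s in real_states) / len(real_states)
    return anchor_weight * sim_term + (1.0 - anchor_weight) * real_term


def adapt(theta: float, sim_states: Sequence[float], real_states: Sequence[float],
          anchor_weight: float, lr: float = 0.05, steps: int = 2000,
          eps: float = 1e-5) -> float:
    """Gradient ascent on the joint objective using central finite differences."""
    for _ in range(steps):
        up = joint_objective(theta + eps, sim_states, real_states, anchor_weight)
        down = joint_objective(theta - eps, sim_states, real_states, anchor_weight)
        theta += lr * (up - down) / (2.0 * eps)
    return theta


# Simulation covers a wide state range; real data covers only a narrow slice.
SIM_STATES = [-2.0, -1.0, 0.0, 1.0, 2.0]
REAL_STATES = [0.5, 1.0, 1.5]

if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(w, round(adapt(1.0, SIM_STATES, REAL_STATES, w), 4))
```

In this toy setup, `anchor_weight=0.0` corresponds to fine-tuning on the real critic alone, so the policy drifts fully to the real-data optimum. A nonzero weight keeps it pulled toward the behavior the simulation critic encodes, including states the real data never visits.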

Code Implementations: 1 repo