LG · AI · Mar 8

ProgAgent: A Continual RL Agent with Progress-Aware Rewards

arXiv:2603.07784v1
Predicted impact: top 37% in LG (last 90 days) · Originality: highly original
AI Analysis

This work proposes a new approach to continual reinforcement learning for lifelong robot learning that targets two obstacles: catastrophic forgetting and the high cost of reward specification.

ProgAgent is a continual reinforcement learning agent that addresses catastrophic forgetting and the high cost of reward specification by learning dense, shaped rewards from unlabeled expert videos. It substantially reduces forgetting, speeds up learning, and outperforms key baselines in visual reward learning and continual learning, even surpassing an idealized perfect-memory agent.
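The core idea, turning unlabeled expert video into a dense reward, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' method. Progress labels are taken to be the normalized frame index of an expert video. `toy_progress` is a hypothetical distance-based stand-in for the learned perceptual model over initial, current, and goal observations. The estimate is then used as the potential Φ in the standard potential-based shaping form r' = r + γΦ(s') − Φ(s). All function names are illustrative.

```python
# Hypothetical sketch: progress labels from one expert video, a toy progress
# estimator, and potential-based reward shaping that uses progress as Phi.
from typing import List, Sequence


def progress_targets(num_frames: int) -> List[float]:
    """Normalized time index t / (T - 1) for each frame of an expert video.

    Assumption: progress rises linearly from 0 (initial frame) to 1 (goal frame).
    """
    if num_frames < 2:
        raise ValueError("need at least an initial and a goal frame")
    return [t / (num_frames - 1) for t in range(num_frames)]


def toy_progress(initial: Sequence[float], current: Sequence[float],
                 goal: Sequence[float]) -> float:
    """Hypothetical stand-in for the learned perceptual progress model.

    Scores how far `current` has moved from `initial` toward `goal` using
    Euclidean distances between feature vectors, clipped to [0, 1].
    """
    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    total = dist(initial, goal)
    if total == 0.0:
        return 1.0  # initial state already matches the goal
    return min(1.0, max(0.0, 1.0 - dist(current, goal) / total))


def shaped_rewards(env_rewards: Sequence[float], potentials: Sequence[float],
                   gamma: float = 0.99) -> List[float]:
    """Potential-based shaping: r'_t = r_t + gamma * Phi(s_{t+1}) - Phi(s_t).

    `potentials` holds Phi for every visited state, so it has one more entry
    than `env_rewards`.
    """
    if len(potentials) != len(env_rewards) + 1:
        raise ValueError("need len(potentials) == len(env_rewards) + 1")
    return [r + gamma * potentials[t + 1] - potentials[t]
            for t, r in enumerate(env_rewards)]


if __name__ == "__main__":
    init, goal = [0.0, 0.0], [1.0, 0.0]
    states = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
    phi = [toy_progress(init, s, goal) for s in states]  # [0.0, 0.5, 1.0]
    print(shaped_rewards([0.0, 0.0], phi, gamma=1.0))    # [0.5, 0.5]
```

With γ = 1 the shaping terms telescope to Φ(s_T) − Φ(s_0). A trajectory's total bonus therefore depends only on the progress it made, which is the usual argument for why potential-based shaping leaves the optimal policy unchanged. In a learned version, a model would replace `toy_progress` and be trained to regress targets like `progress_targets` on expert frames. The adversarial push-back refinement described in the abstract is not modeled here.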

We present ProgAgent, a continual reinforcement learning (CRL) agent that unifies progress-aware reward learning with a high-throughput, JAX-native system architecture. Lifelong robotic learning grapples with catastrophic forgetting and the high cost of reward specification. ProgAgent tackles both by deriving dense, shaped rewards from unlabeled expert videos through a perceptual model that estimates task progress from initial, current, and goal observations. We interpret this model theoretically as a learned state-potential function that provides robust guidance aligned with expert behavior. To maintain stability during online exploration, where novel, out-of-distribution states arise, we add an adversarial push-back refinement that regularizes the reward model, curbing overconfident predictions on non-expert trajectories and countering distribution shift. Because this reward mechanism is embedded in a JIT-compiled loop, ProgAgent supports massively parallel rollouts and fully differentiable updates, which makes a richer unified objective practical: it combines PPO with coreset replay and synaptic intelligence for a better stability-plasticity balance. Evaluations on the ContinualBench and Meta-World benchmarks highlight ProgAgent's advantages: it markedly reduces forgetting, speeds up learning, and outperforms key baselines in visual reward learning (e.g., Rank2Reward, TCN) and continual learning (e.g., Coreset, SI), surpassing even an idealized perfect-memory agent. Real-robot trials further validate its ability to acquire complex manipulation skills from noisy, few-shot human demonstrations.
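The unified objective combines PPO with coreset replay and synaptic intelligence. The sketch below shows one way these pieces might fit together. It is written in plain Python rather than JAX so that it stays dependency-free, and it is not the paper's implementation. The following choices are assumptions:

- The PPO term is the standard clipped surrogate.
- Synaptic intelligence uses the usual path-integral importance Ω_k with the penalty Σ_k Ω_k(θ_k − θ̃_k)², where θ̃ are the parameters at the end of the previous task.
- The coreset is a hypothetical fixed-size per-task buffer filled by reservoir sampling, and its replay loss is passed in as a scalar.
- The coefficients `replay_coef` and `si_coef` are illustrative, not values from the paper.

```python
# Hypothetical sketch of a combined continual-RL objective:
# PPO clipped surrogate + coreset replay term + synaptic intelligence (SI) penalty.
# Parameters are plain lists of floats so the example has no dependencies.
import math
import random
from typing import Dict, List, Sequence


def ppo_clip_loss(logp_new: Sequence[float], logp_old: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negative clipped PPO surrogate averaged over samples (minimize this)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return -total / len(advantages)


class SynapticIntelligence:
    """Per-parameter importance in the style of synaptic intelligence."""

    def __init__(self, params: Sequence[float], xi: float = 1e-3):
        self.xi = xi                           # damping term in the denominator
        self.anchor = list(params)             # theta at the end of the previous task
        self.omega = [0.0] * len(params)       # path integral for the current task
        self.importance = [0.0] * len(params)  # Omega, summed over finished tasks

    def accumulate(self, grads: Sequence[float], deltas: Sequence[float]) -> None:
        """Add -g_k * delta_theta_k for one optimizer step."""
        for k, (g, d) in enumerate(zip(grads, deltas)):
            self.omega[k] -= g * d

    def end_task(self, params: Sequence[float]) -> None:
        """Fold this task's path integral into Omega and move the anchor."""
        for k, p in enumerate(params):
            drift = p - self.anchor[k]
            self.importance[k] += self.omega[k] / (drift * drift + self.xi)
        self.omega = [0.0] * len(params)
        self.anchor = list(params)

    def penalty(self, params: Sequence[float]) -> float:
        """Quadratic penalty sum_k Omega_k * (theta_k - anchor_k)^2."""
        return sum(w * (p - a) ** 2
                   for w, p, a in zip(self.importance, params, self.anchor))


class Coreset:
    """Hypothetical fixed-size per-task store filled by reservoir sampling."""

    def __init__(self, per_task: int, seed: int = 0):
        self.per_task = per_task
        self.rng = random.Random(seed)
        self.buffers: Dict[int, List[object]] = {}
        self.seen: Dict[int, int] = {}

    def add(self, task_id: int, sample: object) -> None:
        buf = self.buffers.setdefault(task_id, [])
        n = self.seen.get(task_id, 0) + 1
        self.seen[task_id] = n
        if len(buf) < self.per_task:
            buf.append(sample)
        else:
            j = self.rng.randrange(n)  # keep each sample with prob per_task / n
            if j < self.per_task:
                buf[j] = sample

    def replay_batch(self) -> List[object]:
        """All stored samples across tasks, to be mixed into updates."""
        return [s for buf in self.buffers.values() for s in buf]


def combined_loss(ppo: float, replay: float, si_penalty: float,
                  replay_coef: float = 1.0, si_coef: float = 0.1) -> float:
    """Illustrative weighting; the paper's exact terms and coefficients may differ."""
    return ppo + replay_coef * replay + si_coef * si_penalty


if __name__ == "__main__":
    si = SynapticIntelligence([0.0])
    si.accumulate(grads=[-1.0], deltas=[0.5])  # loss fell as theta rose
    si.end_task([0.5])
    print(si.penalty([0.5]), si.penalty([1.5]) > 0.0)  # 0.0 True
```

The SI penalty is zero at the anchor and grows for parameters that both moved during an earlier task and reduced its loss, which is how it trades plasticity for stability. In a JAX version these loops would become array operations under `jax.jit`, with each optimizer step's gradient and parameter update feeding `accumulate`.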
