AIAug 7, 2024

PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning

arXiv:2408.04054v28 citationsh-index: 30
AI Analysis

This work addresses the problem of sample inefficiency and distribution shift in robotic reinforcement learning, offering an incremental improvement by integrating existing techniques.

The paper tackles the challenge of applying reinforcement learning to real-world robotic tasks by introducing PLANRL, a framework that combines motion planning and imitation learning to improve exploration and generalization, resulting in 10-15% higher training success rates in simulations and 30-40% higher success rates in real-world tasks compared to baselines.

Reinforcement Learning (RL) has shown remarkable progress in simulation environments, yet its application to real-world robotic tasks remains limited due to challenges in exploration and generalization. To address these issues, we introduce PLANRL, a framework that chooses when the robot should use classical motion planning and when it should learn a policy. To further improve the efficiency in exploration, we use imitation data to bootstrap the exploration. PLANRL dynamically switches between two modes of operation: reaching a waypoint using classical techniques when away from the objects and reinforcement learning for fine-grained manipulation control when about to interact with objects. PLANRL architecture is composed of ModeNet for mode classification, NavNet for waypoint prediction, and InteractNet for precise manipulation. By combining the strengths of RL and Imitation Learning (IL), PLANRL improves sample efficiency and mitigates distribution shift, ensuring robust task execution. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods. In simulations, PLANRL surpasses baseline methods by 10-15\% in training success rates at 30k samples and by 30-40\% during evaluation phases. In real-world scenarios, it demonstrates a 30-40\% higher success rate on simpler tasks compared to baselines and uniquely succeeds in complex, two-stage manipulation tasks. Datasets and supplementary materials can be found on our {https://raaslab.org/projects/NAVINACT/}.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes