RO, AI, LG
Nov 28, 2023

Safe Reinforcement Learning in a Simulated Robotic Arm

arXiv:2312.09468v2, 1 citation, h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work is incremental, extending safe RL methods to a specific robotic arm simulation for tasks like human-robot interaction.

The paper applies safe reinforcement learning to a robotic arm by creating a customized environment, based on the Panda arm, in which Safety Gym algorithms can be tested. It finds that a constrained version of PPO learns equally good policies with better safety compliance, at the cost of longer training times.

Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies. In many environments and tasks, safety is of critical importance. The widespread use of simulators offers a number of advantages, including safe exploration, which becomes indispensable when RL systems need to be trained directly in the physical environment (e.g. in human-robot interaction). The popular Safety Gym library offers three mobile agent types that can learn goal-directed tasks while considering various safety constraints. In this paper, we extend the applicability of safe RL algorithms by creating a customized environment with the Panda robotic arm where Safety Gym algorithms can be tested. We performed pilot experiments with the popular PPO algorithm, comparing the baseline with the constrained version, and show that the constrained version is able to learn an equally good policy while better complying with safety constraints and, as expected, taking longer to train.
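
To make the baseline-versus-constrained comparison concrete, here is a minimal sketch of how a constrained PPO variant can differ from plain PPO. It assumes a Lagrangian-style penalty, which the abstract does not confirm is the exact method used. The function names, learning rate, cost limit, and example costs are all hypothetical and chosen only for illustration.

```python
# Sketch of a Lagrangian penalty on top of PPO (an assumption, not the
# paper's confirmed method). All names and numbers here are hypothetical.

def update_lagrange_multiplier(lam, mean_episode_cost, cost_limit, lr=0.05):
    """Dual ascent step: raise the penalty when the measured cost exceeds
    the limit, lower it otherwise, and never let it go negative."""
    return max(0.0, lam + lr * (mean_episode_cost - cost_limit))


def penalized_advantage(reward_adv, cost_adv, lam):
    """Combine reward and cost advantages into one policy training signal.

    With lam == 0 this reduces to the plain (baseline) PPO advantage.
    Dividing by (1 + lam) is an optional rescaling that keeps the signal
    magnitude comparable as the penalty grows.
    """
    return (reward_adv - lam * cost_adv) / (1.0 + lam)


# Toy run: the agent's average episode cost drops over four updates.
lam = 0.0
for mean_cost in [40.0, 35.0, 30.0, 20.0]:
    lam = update_lagrange_multiplier(lam, mean_cost, cost_limit=25.0)
```

In this sketch the multiplier grows while the policy violates the cost limit and shrinks once it complies. That is one plausible reason a constrained agent needs more training: early updates trade reward for safety until the penalty settles.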
