RO · AI · SY · Sep 16, 2024

Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning

Daniel Flögel, Marcos Gómez Villafañe, Joshua Ransiek, Sören Hohmann

arXiv:2409.10655v3 · 5.73 citations · h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses safe navigation for autonomous robots in pedestrian-rich environments and represents an incremental advance through its novel integration of uncertainty estimation.

The paper tackles uncertainty estimation in deep reinforcement learning for social robot navigation. It integrates aleatoric, epistemic, and predictive uncertainty into Proximal Policy Optimization (PPO) by means of Observation-Dependent Variance (ODV) and dropout, which improves training performance and reduces collisions in perturbed environments.
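The difference between a state-independent standard deviation (a common PPO default) and an observation-dependent variance (ODV) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' network: the two-layer architecture, the `tanh` activation, the log-std clipping range, the dropout rate, and the class name `GaussianPolicy` are all hypothetical, and no training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)


class GaussianPolicy:
    """Hypothetical 2-layer Gaussian policy head (illustration only).

    odv=False: log-std is one learnable vector shared by all observations.
    odv=True:  log-std is predicted from the observation (ODV), so the policy
               can report a different spread for different situations.
    """

    def __init__(self, obs_dim, act_dim, hidden=64, odv=True, p_drop=0.1):
        self.odv = odv
        self.p_drop = p_drop
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), (obs_dim, hidden))
        self.W_mu = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, act_dim))
        if odv:
            # extra output head: log-std as a function of the observation
            self.W_logstd = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, act_dim))
        else:
            # state-independent parameter, identical for every observation
            self.log_std = np.zeros(act_dim)

    def forward(self, obs, dropout=True):
        h = np.tanh(obs @ self.W1)
        if dropout and self.p_drop > 0:
            # inverted dropout: zero units with prob p_drop, rescale the rest
            mask = rng.random(h.shape) >= self.p_drop
            h = h * mask / (1.0 - self.p_drop)
        mu = h @ self.W_mu
        if self.odv:
            log_std = np.clip(h @ self.W_logstd, -5.0, 2.0)
        else:
            log_std = np.broadcast_to(self.log_std, mu.shape)
        return mu, np.exp(log_std)

    def sample(self, obs, dropout=True):
        """Draw one action from N(mu, std^2) for each observation."""
        mu, std = self.forward(obs, dropout)
        return mu + std * rng.standard_normal(mu.shape)


# Usage: compare both variants on the same batch of 3 observations
policy_odv = GaussianPolicy(obs_dim=8, act_dim=2, odv=True)
policy_fixed = GaussianPolicy(obs_dim=8, act_dim=2, odv=False)
obs = rng.normal(size=(3, 8))
mu_odv, std_odv = policy_odv.forward(obs, dropout=False)
mu_fix, std_fix = policy_fixed.forward(obs, dropout=False)
```

With `odv=True` each observation receives its own standard deviation, which is what allows the policy to signal aleatoric uncertainty per situation; with `odv=False` every row of `std_fix` is the same. Keeping dropout active at inference (`dropout=True`) is what makes repeated forward passes differ, which MC-dropout exploits.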

Autonomous mobile robots are increasingly used in pedestrian-rich environments where safe navigation and appropriate human interaction are crucial. While Deep Reinforcement Learning (DRL) enables socially integrated robot behavior, it remains challenging in novel or perturbed scenarios to indicate when and why the policy is uncertain. Unknown uncertainty in decision-making can lead to collisions or human discomfort and is one reason why safe and risk-aware navigation is still an open problem. This work introduces a novel approach that integrates aleatoric, epistemic, and predictive uncertainty estimation into a DRL navigation framework to obtain uncertainty estimates of the policy distribution. To this end, we incorporate Observation-Dependent Variance (ODV) and dropout into the Proximal Policy Optimization (PPO) algorithm. For different types of perturbations, we compare the ability of deep ensembles and Monte-Carlo dropout (MC-dropout) to estimate the uncertainties of the policy. In uncertain decision-making situations, we propose switching the robot's social behavior to conservative collision avoidance. The results show improved training performance with ODV and dropout in PPO and reveal that the training scenario affects generalization. In addition, MC-dropout is more sensitive to perturbations and better correlates the uncertainty type with the perturbation. With this safe action selection, the robot can navigate perturbed environments with fewer collisions.
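One common way to turn several stochastic forward passes into the three uncertainty types is the law of total variance over the resulting Gaussian mixture: the mean of the predicted variances gives the aleatoric part, the variance of the predicted means gives the epistemic part, and their sum gives the predictive uncertainty. The sketch below assumes this decomposition and a simple threshold rule for the conservative fallback; the function names, the threshold value, and the `"social"`/`"conservative"` mode labels are hypothetical, and the paper's exact uncertainty measures may differ. Each input row can come either from one MC-dropout mask or from one deep-ensemble member.

```python
import numpy as np


def decompose_uncertainty(mus, stds):
    """Split policy uncertainty from T stochastic forward passes.

    mus, stds: shape (T, act_dim), the Gaussian mean and std of each pass
    (one MC-dropout mask or one ensemble member per row).
    """
    mus = np.asarray(mus, dtype=float)
    stds = np.asarray(stds, dtype=float)
    aleatoric = np.mean(stds ** 2, axis=0)  # expected variance
    epistemic = np.var(mus, axis=0)         # variance of the means
    predictive = aleatoric + epistemic      # total variance of the mixture
    return aleatoric, epistemic, predictive


def select_action(mus, stds, threshold):
    """Hypothetical safe action selection: fall back to a conservative
    behavior when the summed predictive variance exceeds `threshold`."""
    _, _, predictive = decompose_uncertainty(mus, stds)
    if predictive.sum() > threshold:
        return "conservative", None  # hand over to collision avoidance
    return "social", np.mean(np.asarray(mus, dtype=float), axis=0)


# Passes that agree: low epistemic uncertainty
mus_certain = [[0.50, 0.10], [0.52, 0.09], [0.49, 0.11]]
# Passes that disagree, e.g. under a perturbed observation
mus_uncertain = [[0.50, 0.10], [-0.40, 0.60], [0.90, -0.30]]
stds = [[0.1, 0.1]] * 3

mode_a, action_a = select_action(mus_certain, stds, threshold=0.1)
mode_b, action_b = select_action(mus_uncertain, stds, threshold=0.1)
```

Here the agreeing passes keep the summed predictive variance near 0.02, so the mean action of the social policy is used, while the disagreeing passes raise it to roughly 0.45, above the 0.1 threshold, and trigger the fallback. Separating the terms also shows *why* the policy is uncertain: the aleatoric part is identical in both cases, and only the epistemic part grows.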

