LG · RO · Jun 13, 2023

Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement Learning for Obstacle Avoidance

arXiv:2306.08008v12.03 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses a domain-specific problem in robotics and pathfinding: obstacle avoidance in reinforcement learning. It appears incremental because it builds on existing methods such as ConstraintNet.

The paper addresses dynamic interval restrictions on action spaces in deep reinforcement learning for obstacle avoidance. It proposes two approaches that extend parameterized RL and ConstraintNet to handle an arbitrary number of intervals. It finds that discrete masking of action-values is the only effective method when constraints did not emerge during training; when restrictions are learned, the choice between projection, masking, and the ConstraintNet modification depends on the task.

Deep reinforcement learning algorithms typically act on the same set of actions. However, this is not sufficient for a wide range of real-world applications where different subsets are available at each step. In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles. When actions that lead to collisions are avoided, the continuous action space is split into variable parts. Recent research learns with strong assumptions on the number of intervals, is limited to convex subsets, and the available actions are learned from the observations. Therefore, we propose two approaches that are independent of the state of the environment by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals. We demonstrate their performance in an obstacle avoidance task and compare the methods to penalties, projection, replacement, as well as discrete and continuous masking from the literature. The results suggest that discrete masking of action-values is the only effective method when constraints did not emerge during training. When restrictions are learned, the decision between projection, masking, and our ConstraintNet modification seems to depend on the task at hand. We compare the results with varying complexity and give directions for future work.
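Two of the baseline ideas named in the abstract, projection and discrete masking of action-values, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the representation of allowed actions as closed `(low, high)` intervals, and the use of steering angles in degrees are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # closed interval [low, high]; an assumed representation


def project_onto_intervals(action: float, allowed: Sequence[Interval]) -> float:
    """Return the point in the union of `allowed` that is closest to `action`."""
    if not allowed:
        raise ValueError("no allowed interval: every action is restricted")
    # Clamping to [low, high] yields the nearest point inside that one interval.
    candidates = [min(max(action, low), high) for low, high in allowed]
    # The projection onto the union is the nearest of the per-interval candidates.
    return min(candidates, key=lambda c: abs(c - action))


def masked_greedy_action(
    q_values: Sequence[float],
    bin_centers: Sequence[float],
    allowed: Sequence[Interval],
) -> int:
    """Greedy index over discretized actions, ignoring bins outside `allowed`."""
    masked = [
        q if any(low <= c <= high for low, high in allowed) else -math.inf
        for q, c in zip(q_values, bin_centers)
    ]
    best = max(range(len(masked)), key=masked.__getitem__)
    if masked[best] == -math.inf:
        raise ValueError("all discrete actions are masked")
    return best


# Hypothetical example: an obstacle blocks headings between -30 and 10 degrees.
allowed = [(-90.0, -30.0), (10.0, 45.0)]
projected = project_onto_intervals(0.0, allowed)
chosen = masked_greedy_action([0.1, 0.9, 0.5, 0.3], [-60.0, 0.0, 20.0, 80.0], allowed)
```

In this sketch, projection alters the action after the policy has chosen it, whereas masking removes restricted actions before the greedy choice is made. Both take the allowed intervals as an input rather than inferring them from observations, which matches the setting the abstract describes.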
