LG, AI | Oct 30, 2025

Reinforcement Learning for Pollution Detection in a Randomized, Sparse and Nonstationary Environment with an Autonomous Underwater Vehicle

Sebastian Zieglmeier, Niklas Erdmann, Narada D. Warakagoda

arXiv:2510.26347v1 | h-index: 8

Originality: Incremental advance

AI Analysis

This work adapts RL to complex, reward-sparse environments in underwater robotics. It is an incremental advance because it modifies existing methods rather than introducing a new one.

The paper tackles the challenge of applying reinforcement learning in random, nonstationary, and reward-sparse environments, such as pollution detection with autonomous underwater vehicles, by modifying classical RL approaches. It finds that a modified Monte Carlo-based method significantly outperforms traditional Q-learning and two exhaustive search patterns.

Reinforcement learning (RL) algorithms are designed to optimize problem-solving by learning actions that maximize rewards, a task that becomes particularly challenging in random and nonstationary environments. Even advanced RL algorithms are often limited in their ability to solve problems in these conditions. In applications such as searching for underwater pollution clouds with autonomous underwater vehicles (AUVs), RL algorithms must navigate reward-sparse environments, where actions frequently result in a zero reward. This paper aims to address these challenges by revisiting and modifying classical RL approaches to efficiently operate in sparse, randomized, and nonstationary environments. We systematically study a large number of modifications, including hierarchical algorithm changes, multigoal learning, and the integration of a location memory as an external output filter to prevent state revisits. Our results demonstrate that a modified Monte Carlo-based approach significantly outperforms traditional Q-learning and two exhaustive search patterns, illustrating its potential in adapting RL to complex environments. These findings suggest that reinforcement learning approaches can be effectively adapted for use in random, nonstationary, and reward-sparse environments.
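The abstract describes a location memory used as an external output filter to prevent state revisits, but gives no implementation details. The following is a minimal sketch of one way such a filter could work, not the authors' method. It assumes a 2D grid with four moves and a tabular value dictionary; the names `ACTIONS`, `next_cell`, `filter_actions`, and `select_action` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical 4-connected moves on a grid (an assumption, not from the paper).
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def next_cell(cell, action):
    """Return the cell reached from `cell` by `action` (no boundary check)."""
    dx, dy = ACTIONS[action]
    return (cell[0] + dx, cell[1] + dy)


def filter_actions(cell, visited, width, height):
    """Drop actions that leave the grid or re-enter an already visited cell.

    Falls back to all in-bounds actions when every neighbour was visited,
    so the agent is never left without a move.
    """
    in_bounds = []
    for action in ACTIONS:
        x, y = next_cell(cell, action)
        if 0 <= x < width and 0 <= y < height:
            in_bounds.append(action)
    fresh = [a for a in in_bounds if next_cell(cell, a) not in visited]
    return fresh or in_bounds


def select_action(q_values, cell, visited, width, height, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to the filtered action set.

    The filter sits outside the learner: the value estimates in `q_values`
    are untouched, only the set of actions offered to the policy shrinks.
    """
    allowed = filter_actions(cell, visited, width, height)
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: q_values.get((cell, a), 0.0))
```

In this sketch the memory is a plain set of visited cells that the caller updates after each step. Because the filter acts only on the output of the policy, it could be placed in front of either a Monte Carlo or a Q-learning agent without changing the learning rule.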
