RO LG SY Feb 25, 2025

Toward 6-DOF Autonomous Underwater Vehicle Energy-Aware Position Control based on Deep Reinforcement Learning: Preliminary Results

Gustavo Boré, Vicente Sufán, Sebastián Rodríguez-Martínez, Giancarlo Troni

arXiv:2502.17742v1, 2 citations, h-index: 22

Originality: Incremental advance

AI Analysis

This work addresses power efficiency and maneuverability challenges for AUVs in surveying and inspection tasks, though it is incremental in that it extends existing DRL methods to more degrees of freedom.

The paper tackles the control of 6-degree-of-freedom autonomous underwater vehicles with a deep reinforcement learning approach based on the Truncated Quantile Critics algorithm. In simulation, the approach outperforms a fine-tuned PID controller, and an energy-aware variant consumes 30% less power on average at slightly lower performance.

The use of autonomous underwater vehicles (AUVs) for surveying, mapping, and inspecting unexplored underwater areas plays a crucial role, where maneuverability and power efficiency are key factors for extending the use of these platforms, making six degrees of freedom (6-DOF) holonomic platforms essential tools. Although Proportional-Integral-Derivative (PID) and Model Predictive Control controllers are widely used in these applications, they often require accurate system knowledge, struggle with repeatability when facing payload or configuration changes, and can be time-consuming to fine-tune. While more advanced methods based on Deep Reinforcement Learning (DRL) have been proposed, they are typically limited to operating in fewer degrees of freedom. This paper proposes a novel DRL-based approach for controlling holonomic 6-DOF AUVs using the Truncated Quantile Critics (TQC) algorithm, which does not require manual tuning and directly feeds commands to the thrusters without prior knowledge of their configuration. Furthermore, it incorporates power consumption directly into the reward function. Simulation results show that the TQC High-Performance method achieves better performance than a fine-tuned PID controller when reaching a goal point, while the TQC Energy-Aware method demonstrates slightly lower performance but consumes 30% less power on average.
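
To illustrate what incorporating power consumption into a reward function can look like, here is a minimal sketch. The abstract does not give the reward formula, so everything below is an assumption made for illustration: the function name `energy_aware_reward`, the weights `w_dist` and `w_power`, and the mean-squared-command power proxy are hypothetical and are not the authors' design.

```python
import math


def energy_aware_reward(position, goal, thruster_cmds, w_dist=1.0, w_power=0.1):
    """Hypothetical reward: penalize distance to the goal and a power proxy.

    position, goal: (x, y, z) tuples in metres.
    thruster_cmds: normalized thruster commands, each in [-1, 1].
    The power proxy (mean squared command) is an assumption,
    not the power model used in the paper.
    """
    # Euclidean distance between the vehicle and the goal point.
    dist = math.dist(position, goal)
    # Crude stand-in for electrical power: mean of squared commands.
    power_proxy = sum(u * u for u in thruster_cmds) / len(thruster_cmds)
    # Both terms are costs, so the reward is their negative weighted sum.
    return -w_dist * dist - w_power * power_proxy
```

In this sketch, setting `w_power=0` leaves a reward that depends only on distance to the goal, while a larger `w_power` trades goal-reaching performance for lower thruster effort. That trade-off is loosely analogous to the High-Performance and Energy-Aware variants described above, though the actual reward terms may differ.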
