Goal-Conditioned Reinforcement Learning for Data-Driven Maritime Navigation
This work addresses maritime navigation challenges relevant to shipping and logistics, but the contribution is incremental: it builds on existing reinforcement learning methods with domain-specific adaptations.
The paper tackles the problem of routing vessels through narrow and dynamic waterways. It proposes a reinforcement learning solution that learns routes across multiple origin-destination pairs while balancing fuel efficiency, travel time, wind resistance, and route diversity. Experiments show that action masking and positive shaping rewards improve policy performance.
Routing vessels through narrow and dynamic waterways is challenging due to changing environmental conditions and operational constraints. Existing vessel-routing studies typically fail to generalize across multiple origin-destination pairs and do not exploit large-scale, data-driven traffic graphs. In this paper, we propose a reinforcement learning solution for big maritime data that can learn to find a route across multiple origin-destination pairs while adapting to different hexagonal grid resolutions. Agents learn to select direction and speed under continuous observations in a multi-discrete action space. A reward function balances fuel efficiency, travel time, wind resistance, and route diversity, using an Automatic Identification System (AIS)-derived traffic graph with ERA5 wind fields. The approach is demonstrated in the Gulf of St. Lawrence, one of the largest estuaries in the world. We evaluate configurations that combine Proximal Policy Optimization with recurrent networks, invalid-action masking, and exploration strategies. Our experiments demonstrate that action masking yields a clear improvement in policy performance and that supplementing penalty-only feedback with positive shaping rewards produces additional gains.
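To make the invalid-action masking idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a multi-discrete action made of one direction (six hexagonal neighbours) and one speed level, and it assumes a direction is "invalid" when the neighbouring cell is absent from the traffic graph. The abstract does not state how masks are derived, so that rule, the number of speed levels, and the names `masked_softmax` and `sample_action` are hypothetical.

```python
import numpy as np

N_DIRECTIONS = 6  # a hexagonal cell has six neighbours
N_SPEEDS = 3      # assumed number of discrete speed levels (not from the paper)


def masked_softmax(logits, mask):
    """Softmax restricted to valid actions; invalid actions get probability 0."""
    logits = np.asarray(logits, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be valid")
    # Push invalid logits to -inf so that exp() maps them to exactly zero.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest valid logit for numerical stability.
    masked = masked - masked[mask].max()
    exp = np.exp(masked)
    return exp / exp.sum()


def sample_action(dir_logits, speed_logits, dir_mask, rng):
    """Sample a (direction, speed) pair; only the direction head is masked."""
    dir_probs = masked_softmax(dir_logits, dir_mask)
    # Assumption: every speed level is always allowed.
    speed_probs = masked_softmax(speed_logits, np.ones(N_SPEEDS, dtype=bool))
    direction = int(rng.choice(N_DIRECTIONS, p=dir_probs))
    speed = int(rng.choice(N_SPEEDS, p=speed_probs))
    return direction, speed, dir_probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical mask: only neighbours 0, 2 and 5 exist in the traffic graph.
    dir_mask = np.array([True, False, True, False, False, True])
    direction, speed, probs = sample_action(
        np.zeros(N_DIRECTIONS), np.zeros(N_SPEEDS), dir_mask, rng
    )
    print(direction, speed, probs)
```

Because masked directions receive zero probability, the policy never spends samples on moves that leave the graph. This is one plausible reason masking would help more than relying on penalties alone, although the abstract reports only the empirical improvement.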