RO | AI | Oct 16, 2025

SUM-AgriVLN: Spatial Understanding Memory for Agricultural Vision-and-Language Navigation

arXiv:2510.14357v1 | 1 citation | h-index: 2 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses navigation inefficiencies for agricultural robots by enabling memory-based learning from past episodes, though the advance is incremental because it builds on the existing AgriVLN method.

The paper tackles the problem of agricultural robots navigating based on natural language instructions by proposing a Spatial Understanding Memory module that leverages past experiences for spatial context, improving Success Rate from 0.47 to 0.54 on the A2A benchmark.

Agricultural robots are emerging as powerful assistants across a wide range of agricultural tasks; nevertheless, they still rely heavily on manual operation or fixed rail systems for movement. The AgriVLN method and the A2A benchmark pioneered the extension of Vision-and-Language Navigation (VLN) to the agricultural domain, enabling robots to navigate to target positions by following natural language instructions. In practical agricultural scenarios, navigation instructions often recur, yet AgriVLN treats each instruction as an independent episode, overlooking the potential of past experiences to provide spatial context for subsequent ones. To bridge this gap, we propose Spatial Understanding Memory for Agricultural Vision-and-Language Navigation (SUM-AgriVLN), in which the SUM module performs spatial understanding and saves spatial memory through 3D reconstruction and representation. When evaluated on the A2A benchmark, SUM-AgriVLN improves Success Rate from 0.47 to 0.54 at the cost of a slight increase in Navigation Error from 2.91 m to 2.93 m, demonstrating state-of-the-art performance in the agricultural domain. Code: https://github.com/AlexTraveling/SUM-AgriVLN.
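
The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in VLN work: Success Rate is the fraction of episodes that stop within a distance threshold of the target, and Navigation Error is the mean final distance to the target. The `Episode` class, the function names, and the 2.0 m threshold are hypothetical and are not taken from the paper or the A2A benchmark, whose exact definitions may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    # Distance in metres between where the robot stopped and the target.
    final_distance_m: float


def success_rate(episodes: Sequence[Episode], threshold_m: float = 2.0) -> float:
    """Fraction of episodes ending within threshold_m of the target.

    threshold_m is an assumed value for illustration only.
    """
    if not episodes:
        raise ValueError("at least one episode is required")
    successes = sum(1 for e in episodes if e.final_distance_m <= threshold_m)
    return successes / len(episodes)


def navigation_error(episodes: Sequence[Episode]) -> float:
    """Mean final distance to the target, in metres."""
    if not episodes:
        raise ValueError("at least one episode is required")
    return sum(e.final_distance_m for e in episodes) / len(episodes)


if __name__ == "__main__":
    # Made-up distances, purely to exercise the functions.
    runs = [Episode(0.5), Episode(1.5), Episode(3.0), Episode(4.0)]
    print(f"SR = {success_rate(runs):.2f}, NE = {navigation_error(runs):.2f} m")
```

Under these definitions the two metrics can move in opposite directions, as in the reported result: a few more episodes finishing inside the threshold raises Success Rate, while slightly longer misses on the failed episodes can nudge the mean Navigation Error up.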

