CVAIMar 15

AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models

arXiv:2603.1434266.1h-index: 21
AI Analysis

This addresses robust spatial understanding for agricultural planning across varying scales, though it is domain-specific and incremental in method.

The paper tackles the problem of terrestrial-centric bias in multimodal large language models for agriculture, which causes scale confusion and logic drift, by introducing AgroNVILA with a Perception-Reasoning Decoupling architecture, achieving a 15.18% improvement in multi-altitude agricultural reasoning.

Agricultural multimodal reasoning requires robust spatial understanding across varying scales, from ground-level close-ups to top-down UAV and satellite imagery. Existing Multi-modal Large Language Models (MLLMs) suffer from a significant "terrestrial-centric" bias, causing scale confusion and logic drift during complex agricultural planning. To address this, we introduce the first large-scale AgroOmni (288K), a multi-view training corpus designed to capture diverse spatial topologies and scales in modern precision agriculture. Built on this dataset, we propose AgroNVILA, an MLLM that utilizes a novel Perception-Reasoning Decoupling (PRD) architecture. On the perception side, we incorporate a View-Conditioned Meta-Net (VCMN), which injects macroscopic spatial context into visual tokens, resolving scale ambiguities with minimal computational overhead. On the reasoning side, Agriculture-aware Relative Policy Optimization (ARPO) leverages reinforcement learning to align the model's decision-making with expert agricultural logic, preventing statistical shortcuts. Extensive experiments demonstrate that AgroNVILA outperforms state-of-the-art MLLMs, achieving significant improvements (+15.18%) in multi-altitude agricultural reasoning, reflecting its robust capability for holistic agricultural spatial planning.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes