Retrieval-Augmented Multi-scale Framework for County-Level Crop Yield Prediction Across Large Regions
This research provides policymakers and resource allocators with more reliable crop yield predictions by improving generalization across diverse spatial regions and time periods, offering an incremental improvement over existing data-driven approaches.
This paper introduces a new framework for county-level crop yield prediction that addresses the challenges of capturing both short-term and long-term temporal patterns and accommodating spatial data variability across large geographic regions and long time periods. The proposed method consistently outperforms various baselines on real-world corn yield data across 630 US counties.
Crop yield prediction is essential for developing management strategies, informing insurance assessments, and ensuring long-term food security. Although existing data-driven approaches have shown promise in this domain, their performance often degrades when applied across large geographic regions and long time periods. This limitation arises from two key challenges: (1) difficulty in jointly capturing short-term and long-term temporal patterns, and (2) inability to effectively accommodate spatial data variability in agricultural systems. Ignoring these issues often leads to unreliable predictions for specific regions or years, which ultimately affects policy decisions and resource allocation. In this paper, we propose a new predictive framework to address these challenges. First, we introduce a new backbone model architecture that captures both short-term, daily-scale crop growth dynamics and long-term dependencies across years. To further improve generalization across diverse spatial regions, we augment this model with a retrieval-based adaptation strategy. Because yields vary substantially from year to year, we design a novel retrieval-and-refinement pipeline that adjusts retrieved samples by removing cross-year bias not explained by the input features. Our experiments on real-world county-level corn yield data from 630 US counties demonstrate that our method consistently outperforms several types of baselines. The results also confirm that retrieval-based augmentation improves model robustness under spatial heterogeneity.
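The abstract does not specify how retrieval is performed or how cross-year bias is estimated, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes (1) the cross-year bias for a year is the mean residual of a linear regression of yield on the input features, i.e. the part of that year's yield level the features do not explain, and (2) retrieval is plain Euclidean k-nearest-neighbor search in feature space. The names `estimate_year_bias` and `retrieve_and_refine` are hypothetical and do not describe the paper's actual method.

```python
import numpy as np


def estimate_year_bias(X, y, years):
    """Hypothetical helper: per-year mean residual of a linear fit y ~ X.

    Assumption: the mean residual within a year approximates the cross-year
    bias, i.e. the part of that year's yield level not explained by X.
    """
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return {int(yr): float(residuals[years == yr].mean()) for yr in np.unique(years)}


def retrieve_and_refine(x_query, X, y, years, k=3):
    """Hypothetical helper: retrieve k nearest samples, then remove year bias.

    Assumption: similarity is Euclidean distance in the raw feature space.
    Returns the indices of the retrieved samples and their bias-adjusted yields.
    """
    bias = estimate_year_bias(X, y, years)
    dists = np.linalg.norm(X - x_query, axis=1)
    idx = np.argsort(dists, kind="stable")[:k]  # stable sort breaks ties by index
    year_offsets = np.array([bias[int(t)] for t in years[idx]])
    refined = y[idx] - year_offsets
    return idx, refined


if __name__ == "__main__":
    # Toy data: yield = 2 * feature, plus a +10 shift in 2020 that the
    # feature cannot explain (a synthetic stand-in for a cross-year effect).
    X = np.array([[0.0], [1.0], [2.0], [0.0], [1.0], [2.0]])
    years = np.array([2019, 2019, 2019, 2020, 2020, 2020])
    y = 2 * X[:, 0] + np.where(years == 2020, 10.0, 0.0)

    idx, refined = retrieve_and_refine(np.array([1.0]), X, y, years, k=2)
    # The raw yields of the two retrieved samples differ by 10; after
    # refinement both adjusted yields are close to 7.0.
    print(idx, refined)
```

In this toy case, the two retrieved neighbors have identical features but raw yields of 2 and 12, because one comes from the shifted year. Subtracting each sample's estimated year bias makes them agree, which illustrates why removing unexplained cross-year variation before using retrieved samples can help. How the refined samples are then consumed by the backbone model is not described in the abstract and is left out here.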