Predicting crop yields with little ground truth: A simple statistical model for in-season forecasting
It provides a practical tool for agricultural forecasting in data-scarce regions, although the contribution is incremental, since it builds on existing satellite-based methods.
The paper tackles in-season crop yield prediction with limited ground-truth data by combining satellite data with a simple regression model, achieving RMSEs of 5%-10% for forecasts made 9 months into the year and 7%-14% for forecasts made 3 months into the year, across 10 crop-country pairs.
We present a fully automated model for in-season crop yield prediction, designed to work where sub-national "ground truth" information is scarce. Our approach relies primarily on satellite data and combines careful feature engineering with a simple regression model. As a result, it can be applied almost anywhere in the world. Applying it to 10 crop-country pairs (five cereals: corn, wheat, sorghum, barley and millet; two countries: Ethiopia and Kenya), we achieve RMSEs of 5%-10% for predictions made 9 months into the year, and 7%-14% for predictions made 3 months into the year. The model outputs daily forecasts of the current year's final yield. It is trained on approximately 4 million data points for each crop-country pair. These comprise historical country-level annual yields, crop calendars, crop cover, NDVI, temperature, rainfall, and evapotranspiration.
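The abstract does not specify the engineered features or the exact regression form, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes (1) per-pixel daily NDVI and rainfall series are aggregated "season to date" and weighted by each pixel's crop-cover fraction, (2) the "simple regression model" is ordinary least squares on those features, trained on past seasons truncated at the same day of the year, and (3) the reported "RMSE of 5%-10%" is read as the RMSE of relative errors. Every function name here (`season_to_date_features`, `fit_ols`, `forecast_on_day`, `percent_rmse`) is hypothetical and not taken from the paper.

```python
from statistics import mean


def season_to_date_features(ndvi, rain, cover, day):
    """Hypothetical feature engineering: aggregate daily series up to `day`.

    ndvi[p][d], rain[p][d]: value for pixel p on day d (assumed layout).
    cover[p]: fraction of pixel p under the crop, used as a weight.
    Returns [cover-weighted mean NDVI, cover-weighted cumulative rainfall].
    """
    total_cover = sum(cover)
    w_ndvi = sum(c * mean(s[: day + 1]) for c, s in zip(cover, ndvi))
    w_rain = sum(c * sum(s[: day + 1]) for c, s in zip(cover, rain))
    return [w_ndvi / total_cover, w_rain / total_cover]


def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) coef = X^T y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef


def predict(coef, x):
    """Apply fitted coefficients (intercept first) to one feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def forecast_on_day(past_seasons, past_yields, current_season, day):
    """One daily forecast: train on past seasons truncated at `day`,
    then predict the current season's final yield from its data so far.
    Each season is a tuple (ndvi, rain, cover)."""
    X = [season_to_date_features(*s, day) for s in past_seasons]
    coef = fit_ols(X, past_yields)
    return predict(coef, season_to_date_features(*current_season, day))


def percent_rmse(y_true, y_pred):
    """RMSE of relative errors, in percent (one assumed reading of the metric)."""
    return 100 * mean(((p - t) / t) ** 2 for t, p in zip(y_true, y_pred)) ** 0.5
```

Calling `forecast_on_day` once per calendar day yields the kind of daily forecast series the abstract describes; the truncation at `day` ensures the model is trained only on information that would have been available at that point in past seasons. A real system would add more inputs (temperature, evapotranspiration, crop calendars) and some regularization, given how few annual country-level yields are available for training.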