CY-Bench: A comprehensive benchmark dataset for sub-national crop yield forecasting
Abstract
In-season, pre-harvest crop yield forecasts are essential for enhancing transparency in commodity markets and improving food security. They play a key role in increasing resilience to climate change and extreme events and thus contribute to the United Nations’ Sustainable Development Goal 2 of zero hunger. Pre-harvest crop yield forecasting is a complex task, as several interacting factors contribute to yield formation, including in-season weather variability, extreme events, long-term climate change, soil, pests, diseases and farm management decisions. Several modeling approaches have been employed to capture the complex interactions among such predictors and crop yields. Prior research on in-season, pre-harvest crop yield forecasting has primarily been case-study based, which makes it difficult to compare modeling approaches and measure progress systematically. To address this gap, we introduce CY-Bench (Crop Yield Benchmark), a comprehensive dataset and benchmark to forecast maize and wheat yields at a global scale. CY-Bench was conceptualized and developed within the Machine Learning team of the Agricultural Model Intercomparison and Improvement Project (AgML) in collaboration with agronomists, climate scientists, and machine learning researchers. It features publicly available sub-national yield statistics and relevant predictors (such as weather data, soil characteristics, and remote sensing indicators) that have been pre-processed, standardized, and harmonized across spatio-temporal scales.
With CY-Bench, we aim to: (i) establish a standardized framework for developing and evaluating data-driven models across diverse farming systems in more than 25 countries across six continents; (ii) enable robust and reproducible model comparisons that address real-world operational challenges; and (iii) provide an openly accessible dataset to the earth system science and machine learning communities, facilitating research on time series forecasting, domain adaptation, and online learning. The dataset and accompanying code are openly available to support the continued development of advanced data-driven models for crop yield forecasting and to enhance decision-making on food security.
Published as:
D. Paudel, M. Kallenberg, S. Ofori-Ampofo, H. Baja, R. van Bree, A. Potze, P. Poudel, A. Saleh, W. Anderson, M. von Bloh, A. Castellano, O. Ennaji, R. Hamed, R. Laudien, D. Lee, I. Luna, M. Meroni, J. M. Mutuku, S. Mkuhlani, J. Richetti, A. C. Ruane, R. Sahajpal, G. Shai, V. Sitokonstantinou, R. de Souza Nóia Júnior, A. K. Srivastava, R. Strong, L.-B. Sweet, P. Vojnovic, I. N. Athanasiadis, CY-Bench: A comprehensive benchmark dataset for sub-national crop yield forecasting, 2025.