
Ioannis Athanasiadis

Professor and Chair of Artificial Intelligence
Wageningen University & Research


Efficient and scalable crop growth simulations using standard big data and distributed computing technologies

R. Knapen, A. de Wit, E. Buyukkaya, P. Petrou, D. Paudel, S. Janssen, I. Athanasiadis

Abstract

The digitization of agriculture has led to an explosion in the volume of highly detailed data, offering opportunities to further optimize resource use in food production systems. However, managing and processing these growing data volumes presents significant challenges. This study investigates the suitability of standard big data and distributed computing technologies through a crop yield forecasting case study, and benchmarks the performance and scalability of storage and compute. To that end, a prototype system leveraging the Apache Spark big data analytics framework and the WISS-WOFOST crop growth simulation model is assembled and evaluated for its efficiency and scalability when running large numbers of simulations using distributed computing on commonly available infrastructure. Existing data for maize and winter wheat, as typical summer and winter crops, are prepared for distributed storage and processing and used to measure the performance of the system on clusters of increasing size, from small Kubernetes cloud deployments to large HPC configurations. Specific attention is paid to the aggregation of the grid-based simulation results to larger administrative regions for follow-up analysis and reporting. Our results demonstrate that the selected standard big data and distributed computing technologies simplify the application of distributed processing and storage, making the related trade-off between runtime and costs more attainable. By increasing the distribution of our system 64 times and the total number of cores used 45 times compared to the baseline, we obtained a 99% reduction in simulation processing time and a 95% decrease in the aggregation time of the simulation results, making detailed forecasting for large areas more tractable. However, distributed implementations remain inherently more complex than conventional ones. As such, the construction and use of distributed systems will continue to be a challenge for agronomists and agricultural data scientists.
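The aggregation of grid-based results to administrative regions follows a map/reduce pattern that distributed frameworks such as Apache Spark parallelize well: each partition computes partial sums, and the partial sums are then merged. The sketch below illustrates that pattern in plain Python rather than Spark. It assumes, hypothetically, that each grid record carries a region identifier, a crop area in hectares and a simulated yield, and that regional yield is an area-weighted mean; the record layout, the region names and the weighting scheme are illustrative assumptions, not the method described in the paper.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple

# Hypothetical grid record: (region_id, crop_area_ha, simulated_yield_t_per_ha)
Record = Tuple[str, float, float]
# Per-region partial result: (sum of area * yield, sum of area)
Partial = Dict[str, Tuple[float, float]]


def partial_sums(partition: Iterable[Record]) -> Partial:
    """Map step: accumulate area-weighted sums within one partition."""
    acc: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for region, area, yld in partition:
        acc[region][0] += area * yld
        acc[region][1] += area
    return {r: (s[0], s[1]) for r, s in acc.items()}


def merge(a: Partial, b: Partial) -> Partial:
    """Reduce step: combine the partial sums of two partitions."""
    out = dict(a)
    for region, (wsum, area) in b.items():
        prev_w, prev_a = out.get(region, (0.0, 0.0))
        out[region] = (prev_w + wsum, prev_a + area)
    return out


def regional_yield(partitions: List[List[Record]]) -> Dict[str, float]:
    """Area-weighted mean yield per region, combined across all partitions."""
    totals = reduce(merge, (partial_sums(p) for p in partitions), {})
    return {r: w / a for r, (w, a) in totals.items() if a > 0}


if __name__ == "__main__":
    # Two hypothetical partitions of grid-cell results.
    partitions = [
        [("R1", 100.0, 8.0), ("R2", 50.0, 6.0)],
        [("R1", 300.0, 9.0), ("R2", 50.0, 7.0)],
    ]
    print(regional_yield(partitions))  # {'R1': 8.75, 'R2': 6.5}
```

Because the partial results are plain sums, partitions can be processed independently and merged in any order. This property is what allows aggregation time to shrink as more workers are added.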


Published as:
R. Knapen, A. de Wit, E. Buyukkaya, P. Petrou, D. Paudel, S. Janssen, I. Athanasiadis, Efficient and scalable crop growth simulations using standard big data and distributed computing technologies, Computers and Electronics in Agriculture, 236:110392, 2025, Elsevier BV, doi:10.1016/j.compag.2025.110392.

