Using a data lake stack in Animal Sciences
In the livestock domain, Big Data is becoming more common and is increasingly anchored in the mindset of researchers. With the growing availability of large amounts of data of varying nature comes the challenge of how to store, combine, and analyze these data efficiently. In this study, we explored the possibility of using a data lake for storing and analyzing sensor data, with an animal experiment as the use case, to improve scalability and interoperability. The use case was an experiment within Breed4Food (a public-private partnership) in which the gait score of 200 turkeys was determined. In the experiment, a gait score was assigned to each animal in the traditional way, by a highly skilled person who visually inspected the animal walking. In addition, a set of sensor data streams was recorded for each animal, specifically from inertial measurement units (IMUs), a 3D video camera, and a force plate, with the ambition to explore the effectiveness of these data streams as predictors of the gait score. The resulting sensor output, i.e. the raw data, was successfully stored in its original format in the data lake. Subsequently, for each sensor output we performed extract, transform, and load (ETL) activities by executing custom-made scripts that generate tab- or comma-separated files. Lastly, Apache Spark made it possible to process the data in parallel, allowing for fast computing. In conclusion, we managed to set up a data lake, load animal experimental data, and run preliminary analyses. The data lake allowed for easy scale-up of both data loading and analyses, which is desirable for dynamic analysis pipelines, especially when more data are collected in the future.
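The extract, transform, and load step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' script: the raw IMU layout (`#` comment lines followed by semicolon-separated records), the field names, the animal identifiers, and the helper functions `extract`, `transform`, `load`, and `mean_ax_per_animal` are all hypothetical, chosen only to show how a raw sensor export might be turned into a comma-separated file and then summarized per animal.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical raw IMU export: '#' comment lines, then semicolon-separated
# records of animal id, timestamp (ms), and three acceleration axes.
# The real sensor formats used in the study are not given in the abstract.
RAW_IMU = """\
# device: imu-01 (illustrative)
T001;0;0.10;0.02;9.81
T001;10;0.12;0.01;9.79
T002;0;0.30;0.05;9.80
T002;10;bad;0.04;9.82
"""

FIELDS = ["animal_id", "timestamp_ms", "ax", "ay", "az"]


def extract(raw_text):
    """Yield the split fields of every non-comment, non-empty line."""
    for line in raw_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield line.split(";")


def transform(records):
    """Convert fields to typed rows, skipping malformed records."""
    for rec in records:
        if len(rec) != len(FIELDS):
            continue
        try:
            yield {
                "animal_id": rec[0],
                "timestamp_ms": int(rec[1]),
                "ax": float(rec[2]),
                "ay": float(rec[3]),
                "az": float(rec[4]),
            }
        except ValueError:
            continue  # e.g. a non-numeric sensor reading


def load(rows, out_file):
    """Write rows as a comma-separated file with a header; return row count."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def mean_ax_per_animal(csv_text):
    """Group the tidy CSV by animal and average the x-axis acceleration."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["animal_id"]].append(float(row["ax"]))
    return {animal: mean(values) for animal, values in groups.items()}


# Run the pipeline in memory; a real script would read and write files.
buffer = io.StringIO()
n_rows = load(transform(extract(RAW_IMU)), buffer)
csv_text = buffer.getvalue()
summary = mean_ax_per_animal(csv_text)
```

The sketch uses only the Python standard library so that it stays self-contained. In a Spark-based setup such as the one the abstract describes, the per-animal grouping would typically be expressed as a distributed group-by over the generated files rather than an in-memory dictionary.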
D. Schokker, I.N. Athanasiadis, B. Visser, R.F. Veerkamp, C. Kamphuis, Using a data lake stack in Animal Sciences, Proceedings of the 9th European Conference on Precision Livestock Farming, pp. 140-144, 2019.