A template framework for environmental timeseries data acquisition
Environmental timeseries data variety is exploding in the Internet of Things era, making data reuse a very demanding task. Data acquisition and integration remain a laborious step of the environmental data lifecycle. Environmental data heterogeneity is a persistent issue, as data are becoming available through different protocols and stored under diverse, custom formats. In this work, we deal with syntactic heterogeneity in environmental timeseries data. Our approach is based on describing different dataset syntaxes using abstract representations, called templates. We designed and implemented EDAM (Environmental Data Acquisition Module), a template framework that facilitates timeseries data acquisition and integration. EDAM templates are written using programming-language-agnostic semantics, and can be reused both for input and output, thus enabling data reuse via transformations across different formats. We demonstrate EDAM generality in seven case studies, which involve scraping online data, extracting observations from a relational database, or aggregating historical timeseries stored in local files. Case studies span different environmental sciences domains, including meteorology, agriculture, urban air quality and hydrology. We also demonstrate EDAM for data dissemination, as instructed by output templates. Through the case studies, we identified several syntactic interoperability challenges, including differences in the formatting of observables, temporal and spatial references, and metadata documentation, and addressed them with EDAM. The EDAM implementation has been released under an open-source license.
A. Samourkasidis, E. Papoutsoglou, I. N. Athanasiadis, A template framework for environmental timeseries data acquisition, Environmental Modelling and Software, 117:237-249, 2019, doi:10.1016/j.envsoft.2018.10.009.