Gaëtan Heidsieck is one of the #DigitAg co-funded PhD students.
He defended his thesis on Wednesday, 9 December at 10 am, at LIRMM in Montpellier.
Distributed Management of Scientific Workflows for High-Throughput Plant Phenotyping
- Start Date: November 2017
- University: University of Montpellier / MUSE Montpellier University of Excellence
- PhD School: I2S, Montpellier
- Field(s): Computer Science
- Doctoral Thesis Advisors: Esther Pacitti (Zenith team, University of Montpellier and Inria, LIRMM), François Tardieu (UMR LEPSE, INRA)
- Co-supervisor: Christophe Pradal (UMR AGAP, CIRAD)
- Funding: #DigitAg – Inria
- #DigitAg: Co-funded PhD – Axis 4 – Challenge 2
Keywords: Scientific Workflow, Distributed Computing, Cloud & Grid Computing, Phenotyping, Computer Vision, Reproducibility
Abstract: High-throughput plant phenotyping aims to capture the genetic variability of plant responses to environmental factors for thousands of plants, in order to identify heritable traits for genomic selection and predict the genetic values of allelic combinations in different environments. This requires automating the measurement of a large number of traits (to characterize plant growth, development and functioning), which can now be performed using phenotyping platforms (such as the seven facilities of the French plant phenotyping network Phénome). These platforms produce massive data sets (plant imaging and climatic information, amounting to 150 to 200 terabytes of data per year in Phénome) that must be analyzed to understand interactions between genes and environment (GxE) and possibly predict crop performance.
Thus, it has become critical to couple the design of flexible, adaptable, and dynamically evolving plant response models with these massive data sets. The challenge lies in efficiently analyzing huge and complex data sets while keeping such in-silico experiments reproducible. To deal with massive data sets, we need to exploit powerful computing environments in a way that is easy for users, without requiring advanced technical knowledge from them. The goal of this thesis is to address two critical issues in the management of plant phenotyping experiments: (i) scheduling distributed computations under many constraints and (ii) allowing biologists to reuse and reproduce experiments. These methodological developments will be applied to scientific workflows for 3D shoot architecture reconstruction, implemented in the OpenAlea platform, and used in the Phénome project.
Contact: gaetan.heidsieck [AT] inria.fr – Tel. 04 67 14 97 27
Christophe Pradal (Cirad), Sarah Cohen-Boulakia (Univ. Paris-Saclay), Gaëtan Heidsieck (Inria), Esther Pacitti (Univ. Montpellier), François Tardieu (INRA) and Patrick Valduriez (Inria) (2018). Distributed Management of Scientific Workflows for High-Throughput Plant Phenotyping. ERCIM News 113, April 2018, Special theme: Smart Farming (short article) – https://hal.inria.fr/hal-01948568
G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez. Execution of Scientific Workflows in the Cloud Through Adaptive Caching. Transactions on Large-Scale Data- and Knowledge-Centered Systems (pp. 41-66), 2020
Communications at international conferences
G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez. Distributed Caching of Scientific Workflows in Multisite Cloud. DEXA 2020: International Conference on Database and Expert Systems Applications (pp. 51-65) – https://dx.doi.org/10.1007/978-3-030-59051-2_4
G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez. Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud. DEXA 2019: International Conference on Database and Expert Systems Applications (pp. 452-466) – https://agritrop.cirad.fr/593357/
G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez. Cache-aware Scheduling of Scientific Workflows in Multisite Cloud. Gestion de Données – Principes, Technologies et Applications (under revision)
G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez. Efficient Execution of Scientific Workflows in the Cloud Through Adaptive Caching. BDA 2019: Gestion de Données – Principes, Technologies et Applications (pp. 41-66) – https://www.springerprofessional.de/en/efficient-execution-of-scientific-workflows-in-the-cloud-through/18363540