[Defended thesis] Gaëtan Heidsieck: Distributed Management of Scientific Workflows for High-Throughput Plant Phenotyping

Gaëtan defended his PhD on 9 December at Lirmm, Montpellier.

My name is Gaëtan Heidsieck and I am working on my PhD in Inria's Zenith team in Montpellier, within Lirmm. I trained as a computer engineer, specializing in statistical learning, at the École des Mines de Saint-Étienne. During my training, I became particularly interested in data management and distributed computing, so I decided to deepen this knowledge by applying it to a field that is currently of key importance.

  • Starting date: November 2017
  • University: Université de Montpellier / MUSE Université d’Excellence
  • PhD school: I2S, Montpellier
  • Scientific field: IT
  • Thesis directors: Esther Pacitti (Université de Montpellier, Zenith team, LIRMM), François Tardieu (INRA, UMR LEPSE)
  • Thesis supervisor: Christophe Pradal (Cirad, UMR AGAP)
  • Funding: #DigitAg – Inria
  • #DigitAg: Co-funded PhD – Axis 4 – Challenge 2

Keywords: Scientific workflow, distributed computing, cloud/grid computing, phenotyping, reproducibility

Abstract: High-throughput plant phenotyping aims at capturing the genetic variability of plant responses to environmental factors for thousands of plants, in order to identify heritable traits for genomic selection and to predict the genetic values of allelic combinations in different environments. This requires automating the measurement of a large number of traits (to characterize plant growth, development and functioning), which can now be performed using phenotyping platforms (such as the seven facilities of the French plant phenotyping network Phénome). These platforms produce massive data sets (plant imaging and climatic information, amounting to 150 to 200 terabytes of data per year in Phénome) that must be analyzed to understand interactions between genes and environment (GxE) and possibly to predict crop performance. It has therefore become critical to couple the design of flexible, adaptable, and dynamically evolving plant response models with these massive data sets. The challenge lies in efficiently analyzing huge and complex data sets while keeping such in-silico experiments reproducible. To deal with massive data sets, we need to exploit powerful computing environments in a way that is easy for users and does not require advanced technical knowledge from them. The goal of this thesis is to address two critical issues in the management of plant phenotyping experiments: (i) scheduling distributed computation while considering many constraints and (ii) enabling biologists to reuse and reproduce experiments. These methodological developments will be applied to scientific workflows for 3D shoot architecture reconstruction, implemented in the OpenAlea platform and used in the Phénome project.

Contact: Esther.Pacitti [AT] lirmm.fr

Social network: LinkedIn

Download the thesis manuscript

Papers in international journals

Christophe Pradal (Cirad), Sarah Cohen-Boulakia (Univ. Paris-Saclay), Gaëtan Heidsieck (Inria), Esther Pacitti (Univ. Montpellier), François Tardieu (INRA) and Patrick Valduriez (Inria) (2018). Distributed Management of Scientific Workflows for High-Throughput Plant Phenotyping. ERCIM News 113, April 2018, Special theme: Smart Farming (short article) – https://hal.inria.fr/hal-01948568

G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez, Efficient Execution of Scientific Workflows in the Cloud Through Adaptive Caching. Transactions on Large-Scale Data- and Knowledge-Centered Systems (pp. 41-66), 2020

Heidsieck G., de Oliveira D., Pacitti E., Pradal C., Tardieu F., Valduriez P. (2020) Efficient Execution of Scientific Workflows in the Cloud Through Adaptive Caching, Lecture Notes in Computer Science

Papers in international conferences

G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez, Distributed Caching of Scientific Workflows in Multisite Cloud. DEXA 2020: International Conference on Database and Expert Systems Applications (pp. 51-65) – https://dx.doi.org/10.1007/978-3-030-59051-2_4

G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez, Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud. DEXA 2019: International Conference on Database and Expert Systems Applications (pp. 452-466) – https://agritrop.cirad.fr/593357/

G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez, Cache-aware scheduling of scientific workflows in multisite cloud. Gestion de Données – Principes, Technologies et Applications (under revision)

G. Heidsieck, D. de Oliveira, E. Pacitti, C. Pradal, F. Tardieu, P. Valduriez, Efficient Execution of Scientific Workflows in the Cloud Through Adaptive Caching. BDA 2019: Gestion de Données – Principes, Technologies et Applications (pp. 41-66) – https://www.springerprofessional.de/en/efficient-execution-of-scientific-workflows-in-the-cloud-through/18363540