[Defended thesis] Luis-Felipe Vargas-Rojas

[Defended thesis] Luis-Felipe Vargas-Rojas: Semantic Representation and Computation of Mathematical Formulas for Plant Phenomics Data Exploitation

Felipe defended his PhD on 15 December 2023 at 9 AM at Institut Agro Montpellier (Amphi 206, Building 9 - Coeur d'école, level 2).

Photo: Felipe Vargas-Rojas © #DigitAg

Hello, my name is Felipe Vargas and I'm from Colombia. I trained as a computer engineer at the Universidad del Valle in Cali, Colombia, and specialized in data science and Big Data applied to agricultural issues. I worked for three years at CIAT (International Center for Tropical Agriculture) for the HarvestPlus program. For the six months before starting my thesis, I worked as a research engineer at INRAE, at LEPSE.
My thesis is at the interface between two INRAE laboratories: UMR LEPSE, a pioneer in phenotyping approaches, and UMR MISTEA, a specialist in data management.
My work focuses on enriching Semantic Web tools for organizing plant phenomics datasets, with the aim of establishing an ontology-based mechanism for expressing numerical equations and rules as relationships between traits at multiple levels (field and greenhouse). In this way, software agents will be able to retrieve data that are generated automatically by inference from formalized knowledge. Until now, theoretical knowledge about the dynamics of plant phenomena has been applied manually to map information across scales, for example from plant level to canopy level. By including this knowledge in the data model, researchers and practitioners will be able to automatically obtain and combine additional data to explore new hypotheses and gain insights at multiple levels.
I have already taken on challenges in this field in two previous experiences:

- During my master's internship, I implemented a complete end-to-end Big Data architecture to run machine learning models on agronomic data. Processes such as data preparation and data cleansing included pipelines to harmonize data recorded at different levels of detail.
- I created an ontology to model ICASA's agronomic vocabulary. I think ontologies offer a robust set of data storage tools coupled with sophisticated metadata and external relationships. In addition, there are reasoners that create new data derived from existing records. However, they only work with logical rules and neglect other types of knowledge sources such as equations, statistical models, machine learning and pipelines. By formalizing this knowledge in terms of ontologies, we could automate several processes and offer a more complete framework for the community.

  • Starting date: December 2019
  • University: MUSE – L'Institut Agro
  • PhD school:  GAIA (Biodiversité, Agriculture, Alimentation, Environnement, Terre, Eau) – ED 584
  • Scientific field: IT, data sciences
  • Thesis director: Pierre Martre, LEPSE, INRAE
  • Thesis supervisors: Llorenç Cabrera-Bosquet, LEPSE, INRAE & Danai Symeonidou, UMR MISTEA, INRAE
  • Funding: #DigitAg – INRAE
  • #DigitAg : Cofunded thesis – Axes 4, 6 – Challenge 2

Keywords: Ontologies, plant phenomics, modeling

Abstract:  Knowledge Graphs (KGs) have become central to managing diverse datasets across fields like agriculture, biomedical, environmental, and social sciences. Semantic Web (SW) technologies excel at representing taxonomic knowledge in these KGs. However, scenarios involving numerical relationships with algebraic operations or unit conversions are less addressed, yet hold potential to enhance KG data. For instance, consider the Body Mass Index (BMI) computation, which relies on weight and height properties. This can enrich a KG with derived data. Similarly, deriving the Vapour Pressure Deficit (VPD) from air temperature and relative humidity is valuable. While experts understand these formulas, they are often implemented in ad-hoc programming languages, limiting reuse and reproducibility. This thesis explores Semantic Web approaches to represent and compute these numerical relationships. We identify current limitations in representation, computational methods, and expressivity. To tackle these challenges, we propose a Semantic Web-based framework with the following goals: (i) Represent mathematical formulas in line with Linked Open Data (LOD) and FAIR principles, to enhance adoption and reproducibility. (ii) Enable on-demand execution of numerical relationships, recognising that materialising results is infeasible for large and diverse KGs. (iii) Express mathematical formulas using KG data in the form of quantity values, leveraging semantic resources and metadata like unit ontologies. (iv) Facilitate aggregations within mathematical formulas, acknowledging that much of this numerical data operates on multiple scales. We evaluate this framework on KGs from the agriculture and plant phenomics domain, the focus of this thesis, as well as on more established Semantic Web KGs like DBpedia.
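
As a rough illustration of the two formulas named in the abstract, the Python sketch below computes BMI and VPD from quantity values that carry explicit units, and averages VPD over several readings as a simple aggregation. It is not the framework developed in the thesis: the function names, the small unit table, and the choice of the Tetens approximation (with the FAO-56 coefficients) for saturation vapour pressure are assumptions made for this example.

```python
import math
from statistics import mean

# Hypothetical conversion factors to SI base units (illustrative only;
# a real system would take these from a unit ontology).
TO_SI = {"kg": 1.0, "g": 0.001, "m": 1.0, "cm": 0.01}


def to_si(value, unit):
    """Convert a (value, unit) quantity to its SI base unit."""
    return value * TO_SI[unit]


def bmi(weight, height):
    """Body Mass Index in kg/m^2 from (value, unit) quantities."""
    return to_si(*weight) / to_si(*height) ** 2


def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure in kPa (Tetens approximation, assumed here)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd(temp_c, rh_percent):
    """Vapour Pressure Deficit in kPa from air temperature (degC) and relative humidity (%)."""
    return saturation_vapour_pressure(temp_c) * (1.0 - rh_percent / 100.0)


def mean_vpd(readings):
    """Aggregate step: mean VPD over (temperature, humidity) readings."""
    return mean(vpd(t, rh) for t, rh in readings)


if __name__ == "__main__":
    # Height is given in centimetres and converted before the formula is applied.
    print(bmi((70.0, "kg"), (175.0, "cm")))
    print(vpd(25.0, 50.0))
    print(mean_vpd([(25.0, 50.0), (22.0, 65.0)]))
```

Keeping the unit with each value is what lets the conversion happen before the formula is evaluated. Hard-coding such formulas in a script like this is the ad-hoc practice the abstract says limits reuse and reproducibility.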

Jury composition:

Pierre MARTRE, Research Director, INRAE, Thesis director
Philippe VISMARA, Professor, Institut Agro Montpellier, Chair
Fatiha SAÏS, Professor, Université Paris Saclay, Reviewer
François PINET, Research Director, INRAE, Reviewer
Isabelle MOUGENOT, Professor, Université de Montpellier, Examiner
Christophe PRADAL, Scientist, CIRAD, Invited member
Danai SYMEONIDOU, Research Scientist, INRAE, Co-supervisor
Llorenç CABRERA-BOSQUET, Research Engineer, INRAE, Co-supervisor

Contact: luis-felipe.vargas-rojas [AT] inrae.fr

Social network: LinkedIn

Papers and Communications
International Journals

  • Felipe Vargas-Rojas, Llorenç Cabrera-Bosquet, Danai Symeonidou (2023), QAVAN: Query-answering approach for actionable numerical relationships over Knowledge Graphs, Knowledge-Based Systems Journal, JCR: Q1, I.F.: 8.8

International Conferences and Workshops

  • Vargas-Rojas, Felipe, Axel Polleres, Llorenç Cabrera-Bosquet, and Danai Symeonidou (2023), "PhyQus: Automatic Unit Conversions for Wikidata Physical Quantities", In 4th Wikidata Workshop (co-located with ISWC 2023)
  • Vargas-Rojas, Felipe (2021), "Ontological formalisation of mathematical equations for phenomic data exploitation", In European Semantic Web Conference, pp. 176-185, Cham: Springer International Publishing