[Defended thesis] Maxime Metz

[Defended thesis] Maxime Metz: Application of big data methods for the improvement of local PLS algorithms in chemometrics

Maxime defended his PhD on 26 November 2021 in the Louis Malassis amphitheatre, Agropolis International.

INRAE PhD student, hosted at UMR ITAP (Information-Technologies-Environmental Analysis-Agricultural Processes) in Montpellier.

My career path is as linear as a quadratic model. First, I completed a DUT (two-year technical diploma) and then a vocational degree in chemistry, on a sandwich course in Saint-Avold (Lorraine). From this experience, I concluded that a technician's job did not suit my personality, but I had already acquired a taste for chemometric approaches such as experimental design.

I continued my studies with a Master 1 in Chemistry in Nice, then naturally headed to Brest for the Master 2 in Chemistry specializing in OPEX (Analytical Chemistry, Chemometrics, Quality: Optimization of Experimental Processes), once again on a sandwich course, at Continental in Lorraine. I still remember what I said at the beginning of that year: "a thesis? Not for me." Thanks to this Master's degree, I learned chemometrics, i.e. the mathematical processing of chemical data, and applied the concepts learned in class directly to industrial problems. It was at this point that I decided to do a thesis, having understood the importance of developing chemometric methods that can deal with big data.

My thesis addresses a major problem: how can local regression and classification models be built rapidly from large quantities of data? For example, how can we quickly predict the sugar content of an apple, or the protein content of a food product, using millions of reference samples?

  • Starting date: October 2018
  • University: MUSE Montpellier Université d’Excellence – Institut Agro
  • Doctoral school: GAIA
  • Scientific field: Chemometrics
  • Thesis directors: Jean-Michel Roger (ITAP, INRAE) and Matthieu Lesnoff (SELMET, CIRAD)
  • Thesis supervisors: Nathalie Gorretta (ITAP, INRAE), Matthieu Lesnoff (SELMET, CIRAD), Florent Masseglia (Inria Zenith)
  • Funding: #DigitAg, INRAE
  • #DigitAg: co-funded PhD, Axis 5, Challenges: cross-cutting topic

Keywords: Partial least squares regression (PLSR), local methods, indexing, infrared spectroscopy, chemometrics, large databases (Big data)

Abstract: Near-infrared (NIR) spectrometry can provide huge amounts of data to digital agriculture. The main chemometric tool used to analyze NIR spectra is Partial Least Squares (PLS) regression. PLS can build efficient predictive models from a large number of variables, even when these variables are highly correlated. The method has proved its relevance for small, homogeneous databases. Its extension to medium-sized databases (< 10,000 individuals) is “local-PLS”: it determines a neighborhood of the individual to be predicted, then fits a standard PLS model on this neighborhood. This method combines the power of the k-nearest neighbors (k-NN) method and of PLS. However, it cannot process large databases (e.g. > 50,000 individuals), let alone the databases of more than 1 million individuals that digital agriculture will produce in the near future. Current local-PLS algorithms all rely on sequential k-NN algorithms whose computation times become unrealistic at this scale, so other algorithms must be considered. Paradoxically, very little research has addressed this challenge in chemometrics. Our idea is that indexing algorithms used in big data, once integrated into the local-PLS method, could remove this methodological bottleneck. We propose to consider two dimension-reduction and fast neighborhood-search algorithms used by the Zenith team at LIRMM (Montpellier) to process large time-series data sets, whose data structure is similar to that of NIR spectra: hashing (computation of sketches) and iSAX (indexable Symbolic Aggregate approXimation). The work will consist of two steps: (1) a “business as usual” integration of the two algorithms into the local-PLS algorithm, and (2) an optimisation of the algorithms that takes into account the chemometric specificities of NIR spectra.
The new algorithms developed in this thesis will improve the ability to predict physico-chemical variables from large, heterogeneous NIRS databases, and will find direct applications in many domains (plants, feed, soils, etc.).
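The local-PLS principle described in the abstract (find the k nearest neighbors of the sample to be predicted, then fit an ordinary PLS on that neighborhood only) can be sketched in a few lines. This is a minimal illustration, not the thesis code: it assumes Python with NumPy, Euclidean distance on raw spectra, a single-response PLS (PLS1) fitted with the NIPALS algorithm, and hypothetical function names (`pls1_fit`, `local_pls_predict`). The exhaustive neighbor search inside the loop scans the whole database for every prediction; this is the step whose cost becomes prohibitive for very large databases and that the thesis proposes to replace with indexing methods.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS (PLS1) model with the NIPALS algorithm.

    Returns (x_mean, y_mean, B) such that y_hat = y_mean + (x - x_mean) @ B.
    Assumes n_components does not exceed the rank of the centred X.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean        # centred data, deflated at each step
    p = X.shape[1]
    W = np.zeros((p, n_components))        # X weights
    P = np.zeros((p, n_components))        # X loadings
    q = np.zeros(n_components)             # y loadings
    for a in range(n_components):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)
        t = Xa @ w                          # scores of component a
        tt = t @ t
        P[:, a] = Xa.T @ t / tt
        q[a] = ya @ t / tt
        W[:, a] = w
        Xa = Xa - np.outer(t, P[:, a])      # deflate X and y
        ya = ya - q[a] * t
    B = W @ np.linalg.solve(P.T @ W, q)    # B = W (P'W)^-1 q
    return x_mean, y_mean, B


def local_pls_predict(X_train, y_train, X_new, k=50, n_components=5):
    """Predict each row of X_new with a PLS fitted on its k nearest neighbours."""
    preds = np.empty(len(X_new))
    for i, x in enumerate(X_new):
        # Exhaustive k-NN search: one distance per database row, for every query.
        d = np.sum((X_train - x) ** 2, axis=1)
        idx = np.argpartition(d, k - 1)[:k]
        x_mean, y_mean, B = pls1_fit(X_train[idx], y_train[idx], n_components)
        preds[i] = y_mean + (x - x_mean) @ B
    return preds


# Usage on hypothetical "spectra": random walks give smooth, highly correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 100)).cumsum(axis=1)
y = X[:, 20] - 0.5 * X[:, 70] + rng.normal(scale=0.1, size=2000)
pred = local_pls_predict(X[:1900], y[:1900], X[1900:], k=100, n_components=5)
print("RMSEP:", np.sqrt(np.mean((pred - y[1900:]) ** 2)))
```

The number of neighbors `k` and the number of latent variables are tuning parameters that would normally be chosen by validation. Because each local PLS is fitted on only `k` samples, the per-query cost in this sketch is dominated by the distance computation over the full database once that database is large.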
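iSAX, one of the two indexing methods named in the abstract, is built on the SAX representation: each series is z-normalized, averaged over equal-length segments (Piecewise Aggregate Approximation, PAA), and each segment mean is mapped to a symbol using breakpoints that split the standard normal distribution into equiprobable regions. The Python sketch below shows only this basic SAX word and the MINDIST lower bound between two words; it assumes the spectrum length is a multiple of the number of segments, uses hypothetical function names, and does not implement the iSAX tree index, its variable-cardinality symbols, or the NIR-specific adaptations studied in the thesis. Note that per-spectrum z-normalization is essentially the SNV (Standard Normal Variate) pre-processing already common in chemometrics.

```python
from statistics import NormalDist

import numpy as np


def znorm(x):
    """Z-normalise one series (per-spectrum centring and scaling, as in SNV)."""
    return (x - x.mean()) / x.std()


def paa(x, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    if len(x) % n_segments:
        raise ValueError("this sketch assumes len(x) is a multiple of n_segments")
    return x.reshape(n_segments, -1).mean(axis=1)


def breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into alphabet_size equiprobable regions."""
    nd = NormalDist()
    return np.array([nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)])


def sax_word(x, n_segments, alphabet_size):
    """SAX word of a series, as integer symbols 0 .. alphabet_size - 1."""
    return np.searchsorted(breakpoints(alphabet_size), paa(znorm(x), n_segments))


def mindist(word_a, word_b, series_length, alphabet_size):
    """Lower bound of the Euclidean distance between the z-normalised series."""
    beta = breakpoints(alphabet_size)
    lo, hi = np.minimum(word_a, word_b), np.maximum(word_a, word_b)
    # Indices are clipped so that both branches of np.where stay in bounds.
    upper = beta[np.clip(hi - 1, 0, len(beta) - 1)]
    lower = beta[np.clip(lo, 0, len(beta) - 1)]
    cell = np.where(hi - lo > 1, upper - lower, 0.0)  # adjacent symbols cost 0
    return np.sqrt(series_length / len(word_a)) * np.sqrt(np.sum(cell ** 2))


def to_letters(word):
    return "".join(chr(ord("a") + int(s)) for s in word)


# Usage on two hypothetical smooth curves of length 128 standing in for spectra.
rng = np.random.default_rng(0)
s1 = rng.normal(size=128).cumsum()
s2 = s1 + rng.normal(scale=0.5, size=128).cumsum()
w1, w2 = sax_word(s1, 16, 8), sax_word(s2, 16, 8)
print(to_letters(w1), to_letters(w2))
print("MINDIST:", mindist(w1, w2, 128, 8),
      "Euclidean:", np.linalg.norm(znorm(s1) - znorm(s2)))
```

Because MINDIST never exceeds the Euclidean distance between the normalized series, an index built on SAX words can discard candidates whose MINDIST already exceeds the best distance found so far, without missing a true nearest neighbor.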

Jury members:

  • Gilbert Saporta, Professor Emeritus, CNAM, France
  • Douglas Rutledge, Professor Emeritus, AgroParisTech, France
  • Florent Masseglia, Research Director, INRIA, France
  • Federico Marini, Professor, University of Rome, Italy
  • Marina Cocchi, Associate Professor, University of Modena and Reggio Emilia, Italy
  • Jean-Michel Roger, Research Engineer, INRAE-ITAP, France
  • Matthieu Lesnoff, Researcher, CIRAD, France
  • Reza Akbarinia, Research Scientist, INRIA, France

Contact:  
nathalie.gorretta [AT] inrae.fr
jean-michel.roger [AT] inrae.fr

Social network: LinkedIn

Communications & papers

Download the thesis manuscript

Metz M., Abdelghafour F., Roger J.-M., Lesnoff M. (2021) A novel robust PLS regression method inspired from boosting principles: RoBoost-PLSR, Analytica Chimica Acta

Metz M., Lesnoff M., Abdelghafour F., Akbarinia R., Masseglia F., Roger J.-M. (2020) A “big-data” algorithm for KNN-PLS, Chemometrics and Intelligent Laboratory Systems

Metz M., Biancolillo A., Lesnoff M., Roger J.-M. (2020) A note on spectral data simulation, Chemometrics and Intelligent Laboratory Systems

Metz M., Roger J.-M., Lesnoff M., Akbarinia R., Masseglia F. (2019) Adaptation of two Big-data indexing algorithms for LOCAL-PLS. Chimiométrie 2019 conference, Montpellier, 30 January to 1 February 2019. Best poster award.

Modification date: 10 October 2023 | Publication date: 23 August 2022 | Editor: ZM