Every year, #DigitAg grants Master 2 internships to French and foreign students.
In 2021, subjects are offered in different disciplines: Human & Economic Sciences, Environment & Life Sciences, as well as Engineering and Mathematics.
- To apply, please send your CV and application letter to the relevant contact from the list below.
- Allowance: #DigitAg gives internship grants to the research units; please ask the contact person for the amount of your allowance.
|Collection and analysis of data from beekeepers’ connected hives
Beekeeping has faced many challenges in recent decades, the causes and consequences of which have plunged beekeepers and the economy of the sector into a situation that is still uncertain. In response to the decline of bees, the decline in French honey production and strong environmental constraints, beekeepers are increasingly expressing the need for technical support and to base their strategies on references and tools for decision aids.
One proposal is to study the behavior of honey bee colonies, their dynamics over a period and in a given territory and their performance in terms of honey production. For this, beekeepers have been using electronic and connected scales for the past fifteen years, measuring the weight of the hives in real time. With a few thousand scales now in operation with beekeepers and hourly information being reported, this constitutes a mass of data that is still undervalued to this day. In addition, the ability of a colony to exploit nectar resources is influenced by biotic (colony development status, floral resources) and abiotic (meteorology) factors which may be considered depending on the availability of these data. By crossing these different data sources and implementing data science methods, this offers great potential for building the prediction services of tomorrow.
A thesis led by ITSAP in collaboration with INRIA on this topic was accepted by DigitAg in 2018, but its implementation was thwarted by the onset of a financial crisis at ITSAP. The financial stability of the Institute is now secure and the subject remains just as important for the beekeeping sector, so we wish to relaunch this scientific dynamic in 2021 through this internship. A thesis subject continuing this internship will also be submitted.
In addition, this internship will build on the momentum of the Occitanum Apiculture open lab, which also plans actions on these connected scale technologies to develop decision support tools for beekeepers.
The intern’s work will aim to
– Approach beekeepers in order to obtain their consent to collect digital data from the scales (a minimum of one hundred) and complete the contextual data essential to interpreting the weight curves generated by the scales (i.e. geolocation, dates and changes of location during transhumance of beehives, activity / honey targeted by the beekeeper). This task will be carried out directly by the trainee, or the trainee will support advisers from beekeeping development associations in charge of this data collection.
– Validate the quality of the data collected according to the specifications of the format required for the “MIELLEES Computer System” database, already operational and developed by INRAE Genphyse and ITSAP-Institut de l’Abeille in 2019-2020.
– Apply a “cleaning” routine to the weight time series, using R code developed and tested in 2020 on a restricted experimental dataset (identification of the type of data, validation of the time step, management of missing data, validation of the variation in weight, correction of weights, possibly segmentation).
– Identify subsets of consistent data from apidological criteria (honey production basin, period, target honey flow, density of scales identified per unit area, etc.).
– Develop and test models of the temporal dynamics of hive weight (the variable of interest) to predict the colony’s ability to exploit the nectar resources available within a given foraging radius and in relation to climatic factors. Different machine learning methods will be tested.
– Offer beekeepers operational, simple and educational outputs based on the results obtained from coherent data subsets: visualization and mapping of productions, average evolution of weights, positioning of the activity of a colony in relation to other colonies, identification of favorable days for foraging according to weather conditions.
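The cleaning steps listed above (validation of the time step, management of missing data, validation of the variation in weight) can be illustrated with a minimal sketch. The internship itself relies on existing R code; the Python below is only an illustration, and the function names, the one-hour step, and the 5 kg jump threshold are assumptions, not values from the project.

```python
from datetime import datetime, timedelta

def regularize(records, step=timedelta(hours=1)):
    """Place readings on a regular time grid; empty slots get None.

    Assumes `records` holds only actual readings (no None weights) whose
    timestamps already fall on the grid; off-grid readings are ignored.
    """
    by_time = dict(records)
    start, end = min(by_time), max(by_time)
    grid, t = [], start
    while t <= end:
        grid.append((t, by_time.get(t)))
        t += step
    return grid

def interpolate_gaps(grid):
    """Fill None values by linear interpolation between the nearest readings."""
    weights = [w for _, w in grid]
    known = [i for i, w in enumerate(weights) if w is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            frac = (i - a) / (b - a)
            weights[i] = weights[a] + frac * (weights[b] - weights[a])
    return [(t, w) for (t, _), w in zip(grid, weights)]

def flag_jumps(series, max_jump_kg=5.0):
    """Return timestamps where the weight changes by more than max_jump_kg
    in a single step (hypothetical threshold, e.g. a super added or removed)."""
    return [t2 for (_, w1), (t2, w2) in zip(series, series[1:])
            if abs(w2 - w1) > max_jump_kg]

# Example with made-up readings: one missing hour and one abrupt jump.
t0 = datetime(2021, 5, 1, 0)
raw = [(t0, 40.0), (t0 + timedelta(hours=1), 40.2),
       (t0 + timedelta(hours=3), 40.6), (t0 + timedelta(hours=4), 52.0)]
series = interpolate_gaps(regularize(raw))
suspect = flag_jumps(series)
```

Flagged timestamps are only candidates for correction; deciding whether a jump is a beekeeper intervention or a sensor fault would need the contextual data collected in the first task.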
|Phenotyping mixed crops with proximal sensors
Mixed crops have many agronomic advantages: better use of resources, better tolerance to stress. However, to be beneficial, they require adequate cultivation, adapted to local pedo-climatic conditions.
In order to better help producers choose the species and manage their crops, measurements are taken during the season to monitor the development of the main crop and associated cover. These data are classically obtained from visual field observations and destructive measurements. However, these methods are complex, tedious and often not very repeatable because the covers are spatially very heterogeneous. Therefore, their automation by sensors is a major challenge for accelerating acquisitions and improving their repeatability.
In 2021, a field trial comparing different modalities of associated crops will be set up in Oraison (04). It will be monitored by classical methods (visual, destructive) and by a set of largely automated phenotyping techniques (drones, portable system). Support activity with the technical team will be necessary to ensure the smooth running of the season. Once acquired and structured, this dataset will be used for the development of methods for characterizing the mixed covers as well as for their validation, in connection with experts from UMT CAPTE. Several approaches will be implemented and evaluated: quantification of coverage fractions by deep learning (semantic segmentation), characterization of the crops’ structure by depth image analysis (stereovision). Finally, a comparison of the performance of the different methods and tools will be established.
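As an illustration of the quantification of coverage fractions mentioned above, the sketch below computes the share of each class in a label mask such as a semantic segmentation network would output. The class names and the tiny mask are hypothetical; the segmentation model itself is outside the scope of this example.

```python
from collections import Counter

def cover_fractions(mask):
    """Return {label: fraction of pixels} for a 2D list of per-pixel labels."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical 2 x 4 mask with three classes.
mask = [
    ["soil", "crop", "crop", "companion"],
    ["soil", "crop", "companion", "companion"],
]
fractions = cover_fractions(mask)
```

The same function can be applied to masks produced by different methods on the same plot, which gives a simple basis for comparing them with the visual and destructive reference measurements.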
Integrated within the INRAE – ARVALIS team in Avignon, the intern will oversee a complete data project: from data acquisition to analysis in the experiments set up. The main steps are as follows:
ARVALIS – Institut du végétal is a leading applied research organization in the analysis of sensor data for agriculture. Within the UMT CAPTE, in cooperation with INRAE of Avignon and the company HIPHEN, we develop tools and methods using sensors (lidars, cameras, spectrometers, etc.) integrated on different systems (robots, drones, wireless sensors, …).
|DeepBeesAlert: towards a system of sustainable management and protection of pollination resources
The honey bee, Apis mellifera, an essential link in biodiversity and the main pollinator of many food crops in Europe, is experiencing an unprecedented decline. Pesticides, parasites and climate change are contributing to this decline, as is the Asian hornet, whose main prey is the honey bee. Accidentally introduced in France in 2004, this hornet is gradually invading Europe, ultimately threatening agriculture and our food security. Although research into controlling this predator is abundant and the methods not very ecological, the hornet’s behavior in front of hives has never been studied in detail and its impact on bee populations has never been precisely quantified. Only the automatic processing of behavioral data acquired en masse on a national scale will allow us to evaluate the economic cost of this predator, to identify the types of apiaries that are most affected and to warn of the moments or conditions favourable for intervention or trapping.
This project aims to develop a reliable and automatable methodology for counting predation events at the entrance of hives in order to quantify the hornet’s impact on apiaries and analyze its evolution over time and the factors likely to influence it. This will involve studying the behavior of hornets and using video surveillance data from the hives to measure the catch/attack ratio on bees over time, in order to highlight possible correlations between the predation rate and the seasons, the hours of the day and the weather conditions. From 2016 to 2020, hives coupled with weather stations were equipped with cameras of different resolutions filming continuously one day per week between July and November (about 450 hours of acquisition).
The internship consists in setting up and validating automatic processing solutions based on deep neural networks to detect and track fast-moving, erratic objects in videos with variable acquisition frequency (from 25 to 240 fps).
The student will carry out bibliographical research covering recent significant advances in video processing (road traffic, bird flight, etc.), then deploy and evaluate a convolutional neural network adapted to the specific detection of hornets and captures. Object “tracking” scripts will complete the solution in order to correctly count the objects of interest in each video. Finally, the tracking trajectories of each object will be analyzed to address the behavioral study (visits, attacks and captures) of hornets. The student will evaluate the results produced by the selected solution and analyze the measured counting errors in order to improve the proposed system and thus make it possible to initiate large-scale statistical studies aimed at establishing correlations between predation rates and environmental factors.
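To make the role of the tracking step concrete, here is a minimal sketch of how detections could be linked from frame to frame so that each object is counted once. It uses a greedy intersection-over-union (IoU) match, which is only one simple option and not the method chosen by the project; the box format `(x1, y1, x2, y2)`, the 0.3 threshold and the sample detections are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def count_tracks(frames, min_iou=0.3):
    """Count distinct objects across frames.

    `frames` is a list of per-frame detection lists. A detection continues
    the best-overlapping track from the previous frame if IoU >= min_iou,
    otherwise it starts a new track. Tracks unmatched in a frame are dropped.
    """
    active = {}          # track id -> last known box
    next_id = 0
    for detections in frames:
        free = dict(active)   # tracks not yet matched in this frame
        updated = {}
        for box in detections:
            best = max(free, key=lambda tid: iou(free[tid], box), default=None)
            if best is not None and iou(free[best], box) >= min_iou:
                updated[best] = box
                del free[best]
            else:
                updated[next_id] = box
                next_id += 1
        active = updated
    return next_id

# Made-up detections: one object drifting right, a second one appearing later.
frames = [
    [(0, 0, 10, 10)],
    [(2, 0, 12, 10)],
    [(4, 0, 14, 10), (50, 50, 60, 60)],
    [(52, 50, 62, 60)],
]
n_objects = count_tracks(frames)
```

With fast and erratic flight, overlap between consecutive frames depends strongly on the acquisition frequency, which is one reason the counting errors need to be analyzed as described above.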
|Design and development of a recommendation Web service of relevant semantic resources in agriculture
The master internship is part of the D2KAB project (www.d2kab.org), a research project started in 2019, which aims to help developers and users of agricultural data to transform their data into actionable knowledge. The D2KAB project develops and maintains AgroPortal, a public platform for semantic resources (thesauri, terminologies, vocabularies, and ontologies), specialized in the agriculture and agronomy domain. AgroPortal is founded and maintained by the LIRMM laboratory in collaboration with INRAE. It hosts around 126 semantic resources and offers its users a variety of semantic web services (search, annotation, alignment, etc.).
Currently, we are implementing a new Web service that makes AgroPortal semantic resources easily interoperable and reusable. More concretely, we adopt open science principles, mainly the FAIR principles (Findable, Accessible, Interoperable, and Reusable), through the use of automatic techniques. In this context, we are offering an internship that aims to design and implement a recommendation system for semantic resources (described in OWL, OBO, and SKOS) within our platform. We hope that the recommendation system developed by the future intern will support and enhance, on an international scale, AgroPortal’s FAIR vision.
The main objective of the master internship is to use artificial intelligence techniques for the realization of a recommendation Web service of semantic resources. More precisely, the main missions of the internship concern:
Completing the internship assignments will require motivation to learn semantic Web technologies, a good knowledge of existing artificial intelligence techniques, and a technical background in at least one scripting language such as R or Python.
Profile of the desired candidate:
– Training in computer science, data science or equivalent
– Strong skills in Web development and in the C or Java language
– Motivated, autonomous and rigorous
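To give an idea of what a recommendation of semantic resources can look like at its simplest, the sketch below ranks resources by the share of input keywords that appear among their labels. This is a deliberately naive baseline, not the approach planned for AgroPortal; the catalog, the resource names and the scoring rule are all hypothetical, and no AgroPortal API is used.

```python
def recommend(keywords, catalog, top_n=3):
    """Rank resources by the fraction of keywords found in their labels.

    `catalog` maps a resource name to an iterable of labels.
    Returns up to `top_n` (name, score) pairs with a non-zero score.
    """
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return []
    scored = []
    for name, labels in catalog.items():
        covered = wanted & {label.lower() for label in labels}
        scored.append((len(covered) / len(wanted), name))
    # Highest coverage first; ties broken alphabetically for stable output.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(name, score) for score, name in scored[:top_n] if score > 0]

# Hypothetical catalog of three resources and their labels.
catalog = {
    "resource_a": ["wheat", "soil", "yield"],
    "resource_b": ["wheat", "pest"],
    "resource_c": ["irrigation"],
}
ranking = recommend(["Wheat", "pest", "soil"], catalog)
```

A realistic service would go beyond exact label matching, for example by exploiting synonyms, hierarchy and alignments between resources.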
|Deep Learning for detection, delineation, and differentiation of mango trees and mango-based orchards from very high spatial resolution images
In West Africa, information acquisition in the fruit sector is hampered by the lack of suitable methods and tools for characterizing fruit-tree-based systems, which are often complex (e.g. agroforestry systems). In this context, the PixFruit project (UPR HortSys) aims at acquiring data on mango production at both the tree and orchard scales, for the calibration of regional production models intended to provide accurate and reliable statistics to the actors of the sector. In order to extrapolate mango production to the scale of a region from data collected in the field with PixFruit tools (PixFruit smartapp), it is necessary to delimit and classify the trees and orchards to provide additional input data (cultivated area, planting density, cultivar composition, etc.) to regional models.
We thus propose to explore and evaluate the potential of deep learning (neural network classification and segmentation methods) for producing this information from very high spatial resolution multispectral satellite imagery (Pleiades).
Our first objective is to develop, on the basis of this innovative methodology, tools making it possible to detect, identify and delineate orchards in a production region, and then to discriminate those containing mango trees. Our second objective is to identify and delineate the mango trees themselves, as individual trees, whether isolated or in orchards. It will therefore be necessary to obtain better results in segmentation (orchard delineation, individual tree crown extraction) and classification (orchard recognition according to its majority species, orchard classification according to structure and cultivar composition, tree-by-tree species identification) than with the tools more conventionally used in remote sensing (SVM, RF). In this work, we will evaluate the potential of the two most widely used types of neural networks, on several data architectures, to delimit and classify orchards: convolutional networks (CNN) on two Pleiades images acquired in March and July 2017, and recurrent networks (RNN) on the association of these Pleiades images and a Sentinel-2 time series. Finally, we will analyze the performance of Mask R-CNN (Region-based Convolutional Neural Network) in correctly identifying and segmenting trees.
The study area will be the Niayes in Senegal (503 km²), relevant for the diversity of its cropping systems, which cover different levels of complexity and density (fruit monocultures, extensive systems and agroforestry systems), and the presence of many cultivated species (e.g. mango, citrus, cashew, neem, etc.). In addition, this area benefits from a large field database (11,300 mango trees and 12,211 orchards, resulting from Julien Sarron’s #DigitAg thesis) and the agronomic expertise gained within the framework of the PixFruit project, which will make this study technically feasible.
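Comparing deep learning with conventional classifiers (SVM, RF) requires a common score computed against the field database. The sketch below computes a per-class F1 score from reference and predicted labels; the metric choice and the sample labels are assumptions for illustration, since the evaluation protocol is not specified here.

```python
def f1_per_class(y_true, y_pred):
    """Return {class: F1 score} from two equally long label sequences."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

# Made-up majority-species labels for four orchards.
reference = ["mango", "mango", "citrus", "cashew"]
predicted = ["mango", "citrus", "citrus", "cashew"]
scores = f1_per_class(reference, predicted)
```

Running the same function on the outputs of each model (CNN, RNN, SVM, RF) over the same validation orchards gives directly comparable per-species scores.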
|Characterising architecture and production on an apple tree core-collection based on airborne Lidar scans
In the current context of climate change and input reduction (fertilizers, water, pesticides, etc.), selecting new high-performing cultivars of fruit trees is becoming essential. Architectural traits (3D tree structure and leaf distribution) have to be considered to evaluate the intrinsic genotypic potential for fruit production, interactions with the environment (light, rain, insects, etc.) and the ease of tree training. To evaluate such traits with precision and at high throughput, we are developing an approach based on airborne Lidar scans, which allows rapid 3D data acquisition of a whole orchard. This choice was made after a first set of experiments in which trees were scanned during the vegetation period (with leaves and fruits) by terrestrial Lidar. Although this acquisition procedure was tedious and time-consuming, it allowed us to deliver a first architectural characterisation of the trees of the collection (Coupel et al., 2019) and a first estimation of their production (Artzet et al., 2020). Part of the data analysis methods are based on machine and deep learning, especially using networks for point cloud processing, such as RandLA-Net (Hu et al., 2019). Recently, we decided to revisit the data acquisition procedure and performed new, faster scans with airborne Lidar. However, both resolution and points of view differ between terrestrial and airborne Lidar, making it necessary to adapt the methods for the characterisation and identification of organs (stems, leaves and fruits). On whole-orchard 3D scans, identifying individual trees (each of which corresponds to a genotype) and the organs that belong to each of them is a real challenge and is key to automating our methodology.
This Master internship will aim at formalizing a pipeline for point cloud analysis that will allow the identification of each tree instance and the characterisation of its shape, foliar density and production, by adapting and testing the estimation methods as well as the indicators associated with these new data. To do this, it will be necessary to revisit the various stages of the pipelines implemented previously, including the creation of a learning database for which simulated data can be used (MappleT model for generating virtual 3D architectural models of apple trees (Costes et al., 2008)). Virtual scanning methods simulating drone acquisition will have to be implemented, as well as transfer learning methods to reproduce the noise observed in the real data. For production estimation, points corresponding to apples could be identified with methods such as RandLA-Net. Nevertheless, the subsequent clustering used to identify individual apple instances will have to be robust to low point sampling.
Objective indicators on tree development and production that could be used for genetic analyses, such as GWAS, are expected as the outcome of this work.
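The clustering of points labelled as apples into individual fruit instances can be sketched as follows. This is a plain distance-threshold (single-linkage) grouping with no minimum cluster size, chosen here because it still returns a cluster when a fruit is hit by only one or two points; it is an illustrative assumption, not the method used in the cited pipelines, and the 5 cm radius and the coordinates are made up.

```python
from math import dist

def cluster_points(points, radius):
    """Group point indices so that any two points closer than `radius`
    (directly or through a chain of neighbours) share a cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unvisited
                    if dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)

# Made-up coordinates in metres: two sparse groups of apple-labelled points.
apple_points = [
    (0.00, 0.00, 0.00), (0.03, 0.00, 0.00), (0.05, 0.02, 0.00),
    (1.00, 1.00, 1.00), (1.02, 1.00, 1.00),
]
instances = cluster_points(apple_points, radius=0.05)
```

The pairwise search is quadratic in the number of points, so a spatial index would be needed on real orchard scans; touching fruits would also merge into one cluster with this simple rule.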
|Analysis of simplified architectural 3D representations for improving crop model simulations – application to grapevine
Aerial architecture management is of major importance for optimizing crop functioning since architecture directly determines the amount of intercepted light and the microclimate within the canopy, thus modifying many physiological plant functions (photosynthesis, transpiration, water use efficiency). The challenge of optimizing canopy structure is critical for grapevine, as shown by all the training practices used to modify it (winter and summer pruning, trellising…). Functional structural plant models (FSPM) have been developed with the aim of simulating the spatial variability of ecophysiological processes on these 3D architectures. Nevertheless, their use is made difficult by the large amount of data needed for building an exhaustive description of the input architectures. Meanwhile, new high-throughput phenotyping methods (lidar, airborne imaging) are under rapid development and can be used to get new indicators on architectural characteristics of the canopy. Moreover, crop models allow simulating crop yield under different environments, but they do not integrate detailed descriptions of plant architecture and of the resulting within-plant variability in functional traits. The objective of this project on grapevine is to determine to what extent simplified representations of aerial architectures could be coupled to crop models in order to better simulate some variables that present large within-plant spatial heterogeneity (light interception, photosynthesis, transpiration). The final aim is to suggest and test new high-throughput phenotyping protocols for the architectural features needed to reconstruct these simplified 3D mockups. The research activities will be carried out at UMR LEPSE in Montpellier using existing plant models available under the open-source OpenAlea platform (TopVine, Louarn et al. 2008; RATP, Sinoquet et al. 2001; Hydroshoot, Albasha et al. 2019).
The first step will be to simulate, from already existing 3D data, simplified representations similar to what could be acquired under field conditions (lidar, airborne imaging). Simulations of ecophysiological processes will be performed on these simplified 3D structures using different levels and methods for aggregating foliage characteristics (entire plant, voxels…). Results will be compared to simulations performed on detailed 3D representations (considered as references) in order to determine the best trade-off between (1) global error at the plant scale, (2) ability of the method to account for the spatial variability, (3) experimental time needed to get the input data, and (4) computation time. This simplified architecture representation and the associated computational methods could then be used as inputs of crop models in order to test the impact of contrasted training systems on integrative plant variables.
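One of the aggregation levels mentioned above, voxels, can be illustrated with a short sketch that turns a detailed list of leaves into a leaf area density per voxel. It does not use the OpenAlea models named above; the leaf tuple layout `(x, y, z, area)`, the 0.5 m voxel size and the sample values are assumptions.

```python
from collections import defaultdict
from math import floor

def leaf_area_density(leaves, voxel_size):
    """Aggregate leaves given as (x, y, z, area_m2) into cubic voxels.

    Returns {(i, j, k): leaf area density in m2 of leaf per m3 of voxel}.
    """
    area = defaultdict(float)
    for x, y, z, a in leaves:
        key = (floor(x / voxel_size),
               floor(y / voxel_size),
               floor(z / voxel_size))
        area[key] += a
    volume = voxel_size ** 3
    return {key: total / volume for key, total in area.items()}

# Made-up leaves: two fall in voxel (0, 0, 0), one in voxel (1, 0, 0).
leaves = [
    (0.1, 0.1, 0.1, 0.01),
    (0.2, 0.3, 0.4, 0.02),
    (0.6, 0.1, 0.1, 0.01),
]
density = leaf_area_density(leaves, voxel_size=0.5)
```

Varying `voxel_size` from a few centimetres up to the size of the whole plant reproduces the range of aggregation levels to be compared against the detailed reference.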
|Characterizing grazed natural environments by remote sensing to assess the sustainability of pastoral livestock farming in the PACA region
Interactions between pastoral livestock farming and “natural environments” are an important component of the resilience of socio-ecological systems in the hinterland of the Provence-Alpes-Côte d’Azur region. These spaces occupy more than 30% of the land area and livestock farming contributes to their ecological and social dynamics. This activity is strongly intertwined with the other uses of these spaces, which jointly define the identity of the territories. But the spatial footprint of grazing, as well as its relation to the dynamics of land use, is difficult to determine with precision, unlike that of other agricultural practices (ploughing, mowing).
However, the intersection between land use and grazing is at the center of concerns both in terms of the capacity to adapt to and mitigate climate change, the control of ecological dynamics with regard to the preservation of biodiversity and the prevention of forest fires. The sustainability of livestock farming itself is directly questioned by the “closure” of pastoral environments that grazing practices are struggling to curb. The medium-term renewal of the fodder resource for grazing herds is threatened as well as the maintenance of these areas in the eligibility criteria for support under the first pillar of the CAP. The characterization of the land use of these environments and their ecological dynamics are thus at the heart of the analysis of the sustainability of the region’s pastoral livestock activity.
The available land-use maps (such as CORINE Land Cover …) do not allow these data to be cross-referenced satisfactorily. We have therefore undertaken work on:
– On the one hand, a detailed characterization of the uses of pastoral areas based on the data from the pastoral survey carried out by the actors of livestock farming on the scale of the Alps and on farmers’ declarations to the CAP’s graphical parcel register.
– On the other hand, the characterization of land use in these areas using a supervised classification (Bayes) of 5 environmental closure classes based on SPOT 6 images and multi-resolution segmentation (Baatz and Schäpe).
The proposed internship aims at adapting this land use characterization methodology to the whole region. This work will produce an easily updatable map allowing the identification of particularly exposed areas and supporting the decisions of livestock industry actors within the framework of public actions aimed at securing the future of the activity.
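The supervised Bayes classification of image segments mentioned above can be sketched with a small Gaussian naive Bayes classifier. The exact Bayes variant used in the project is not stated, so the Gaussian naive form is an assumption, as are the two features per segment, the two class names and the training values; the real work uses 5 closure classes derived from SPOT 6 imagery.

```python
from collections import defaultdict
from math import log, pi

def fit(samples, labels):
    """Estimate per-class log prior, feature means and variances."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    model = {}
    for y, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-6
                     for c, m in zip(cols, means)]
        model[y] = (log(len(rows) / len(samples)), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest posterior log probability."""
    def score(y):
        prior, means, variances = model[y]
        return prior + sum(
            -0.5 * log(2 * pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=score)

# Hypothetical training segments: (vegetation index, texture) per segment.
train_x = [(0.20, 0.10), (0.25, 0.15), (0.80, 0.70), (0.75, 0.65)]
train_y = ["open", "open", "closed", "closed"]
model = fit(train_x, train_y)
label = predict(model, (0.22, 0.12))
```

Extending the approach to the whole region mainly raises the question of whether training segments from one area remain representative elsewhere, which a held-out validation set per area would help to check.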
|Development of an optical sensor for the monitoring of leaf biochemical content: contribution to decision making for the monitoring of the physiological state & early detection of plant pathologies
Climate change has a direct influence on crops, due to its impact on temperatures and water cycle. Monitoring of foliar chemical composition is relevant to study plant physiology, phenology and to identify pathogens in the context of climate change.
Recent advances in both the physical modeling of vegetation at the leaf scale and optical instrumentation now make it possible to accurately measure leaf chemical content using portable and low-cost sensors. The PROSPECT model allows precise estimation of the main leaf chemical constituents, including chlorophyll, carotenoids, anthocyanins, water and dry matter content. The monitoring of this biochemical signature then makes it possible to diagnose the state of the crops and to detect stress and the possible presence of certain pathogens at an early stage. Mapping leaf chlorophyll content in the field using image analysis from remote sensing data (drone, airplane, satellites) has been used operationally for several years to support decision-making tools that help farmers manage their technical itinerary and to forecast crop production.
Recent research results show that it is possible to achieve a fine quantification of foliar chemistry using modeling tools and by acquiring a parsimonious set of leaf optical properties in the optical domain. This opens perspectives for the development of a portable leaf-clip sensor using multispectral light emission based on laser diodes and multi-carrier synchronous detection (technology patented by INRAE).
The objective of this internship is to participate in the development of a prototype to quantify foliar chemistry. During the first stage of the internship, the student will have to understand the different optical phenomena involved in the interactions between light and leaf material; he/she will also learn to use the modeling tools. He/she will then determine the combination of wavelengths that allows optimal estimation of foliar chemistry, combining simulated and already available experimental databases. Finally, based on these results, he/she will work on the development of a laboratory prototype. This system will be characterized and validated with experimental data from different crop types. This work will add value to the research results of the TETIS and ITAP joint research units, and may lead to a patent application.
|Visualisation and Navigation in agro-environmental spatio-temporal data classified by relational concept analysis
With the rise of digital technology, agricultural research has produced numerous datasets on agriculture and on the environment to be mobilized to develop decision-making tools for populations from the North and the South. Among these datasets, there is one on the watercourses of two French watersheds developed by the Fresqueau project (http://dataqual.engees.unistra.fr/fresqueau_presentation_gb) which is spatio-temporal and another one on the uses of plants with pesticidal and antibiotic effect developed by the Knomana project (https://agris.fao.org/agris-search/search.do?recordID=FR2019109314) for animal, plant, human and public health whose data model has a ternary structure.
To develop decision support tools, these projects use Relational Concept Analysis (RCA) as their classification method, which can model temporality and ternary relationships. Using logical quantifiers, RCA groups and classifies sets of entities sharing common properties and relationships, supporting for example reasoning by exploring properties and similarities, reasoning by abduction to create hypotheses, and the search for alternative solutions in the neighborhood of known solutions. To avoid calculating the complete classification when navigating and exploring the dataset step by step, an on-demand calculation method has been developed. The problem faced by the team carrying out these projects, i.e. LIRMM, UPR AIDA, UMR IPME and ENGEES, is the lack of a tool for visualizing and navigating through the data classified by RCA.
Furthermore, LIRMM conducts research in visual analytics (Keim et al. 2008). This field focuses on the study of interactive visual interfaces enabling the exploration of complex and heterogeneous datasets in order to facilitate analytical reasoning on the data and thus derive knowledge from them (see for example (Accorsi et al. 2014) developed within the Fresqueau project).
The objective of this internship is to develop a software prototype for the visualization of data sets, including spatial and/or temporal data, classified by RCA. More precisely, the trainee will build an interactive visualization that drives the on-demand RCA calculations and displays the results incrementally. Several visual approaches will be combined in order to give the user an overview of the extracted knowledge space and, as requested by the user, a detailed view of subsets of the classification calculated on the fly. Different interaction methods (Munzner 2014, chapters 11-14) and different graph visualization techniques (Tamassia 2013) will be used. The trainee will follow the design steps described by Sedlmair et al. 2012, i.e. literature review, definition of the need expressed as a visual problem, proposal of a model, development, deployment, validation.
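To illustrate what computing one part of a classification on demand can mean, the sketch below uses plain Formal Concept Analysis, the single-context building block that RCA extends with relations between contexts. Given a set of attributes selected by the user, it computes only the corresponding concept (its objects and their full shared attribute set) instead of the whole lattice. This is not the team's on-demand algorithm, and the context with its object and attribute names is hypothetical.

```python
def extent(context, attrs):
    """Objects that have every attribute in `attrs`."""
    return {obj for obj, props in context.items() if attrs <= props}

def intent(context, objs):
    """Attributes shared by every object in `objs`."""
    if not objs:
        # By convention, the empty set of objects shares all attributes.
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objs))

def concept_of(context, attrs):
    """Return the formal concept (extent, intent) generated by `attrs`."""
    objs = extent(context, set(attrs))
    return objs, intent(context, objs)

# Hypothetical binary context: object -> set of attributes.
context = {
    "plant_a": {"repellent", "leaf_extract"},
    "plant_b": {"repellent", "leaf_extract", "antibiotic"},
    "plant_c": {"antibiotic"},
}
objs, attrs = concept_of(context, {"repellent"})
```

In an interactive view, each user selection could trigger one such call, and the returned concept could be added to the displayed graph incrementally.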
|Distributed plant simulations: Application to agroecology
To meet societal demands for a more sustainable and ecological agriculture, the scientific community is developing functional structural plant models (FSPM) that simulate plant growth and functioning. Within the OpenAlea modeling platform, we have been developing different simulation formalisms for several years (Pradal et al., 2008; Boudon et al., 2012). In particular, formal grammars such as L-systems, which allow the efficient rewriting of trees, and methods for rewriting multi-scale graphs (MTG) are available and have been used to model a wide variety of plants (apple, mango, palm, maize, sorghum, etc.).
FSPMs allow the study and analysis of plant-plant interactions in complex canopies of associated crops (Gaudio et al., 2019, Braghiere et al., 2020). They make it possible to simulate aerial and root phenotypic plasticity by taking into account the competition for light and resource acquisition in a mechanistic way. For this, it is however necessary to simulate, at the organ scale and in 3D, the development and functioning of a large number of interacting plants within the same canopy. To do this in a reasonable time, it would be necessary to distribute the simulation calculations over large computing infrastructures (cluster, cloud). However, there is currently neither a formalism nor a technology to automatically distribute the 3D simulation of interacting heterogeneous plants.
The challenge we are trying to address is therefore to efficiently simulate a set of plants in spatial (competition for resources acquisition) and temporal (feedback between structure and function) interaction. The objective of this project is to analyze different parallelization strategies to simulate in 3D the growth and functioning of plants and stands on shared memory architectures and in distributed environments (Pradal et al., 2017; Heidsieck et al., 2020). One of the challenges is to define design patterns for distributed computations at different granularities (parallel simulation of an isolated plant, distributed computation of a large number of interacting plants) using current technologies (OpenMP, Spark, Dask). An important issue is to take into account the dependencies between the computations made on the structures when they are rewritten according to the strategies used (in place or by copy).
One application of this work will be the simulation of an agroforestry system mixing palm trees and rice for which pre-existing models (VPalm and Cereals projects) will be reused.
The student’s work will consist of:
– Definition of a protocol for spatial information exchange and synchronization between simulators.
– Formalization of a strategy for distributing simulations on several machines or clusters.
– Application to the creation of a Palm-Rice agroforestry system model with characterization of the dynamics of light distribution during a growth cycle.
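The synchronization issue described above (dependencies between computations when structures are rewritten in place or by copy) can be shown on a toy example. In the sketch below, each time step freezes a snapshot of all plants, updates every plant in parallel from that snapshot only, and then swaps in the new state, which corresponds to a by-copy strategy. The 1D plant model, the shading rule and all parameter values are invented for illustration, and a thread pool stands in for the distributed back ends named above (OpenMP, Spark, Dask), so no speed-up should be expected from it.

```python
from concurrent.futures import ThreadPoolExecutor

def grow(plant, snapshot, radius=1.0, rate=0.1):
    """One growth step for a plant given as (position, height).

    Reads only the frozen snapshot: growth slows with the number of
    taller neighbours within `radius` (toy competition for light).
    """
    x, height = plant
    shade = sum(1 for nx, nh in snapshot
                if nx != x and abs(nx - x) <= radius and nh > height)
    return (x, height + rate / (1 + shade))

def simulate(plants, steps, workers=4):
    """Run `steps` synchronized steps, updating all plants in parallel."""
    state = list(plants)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            snapshot = tuple(state)   # by-copy: all tasks see the same state
            state = list(pool.map(lambda p: grow(p, snapshot), state))
    return state

# Three made-up plants: two close neighbours and one isolated.
stand = [(0.0, 1.0), (0.5, 2.0), (5.0, 1.0)]
after_one_step = simulate(stand, steps=1)
```

Because no task writes to the snapshot, the result does not depend on execution order; an in-place strategy would save memory but would require ordering or locking between neighbouring plants.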