[PhD’s Corner] Romain Gautron: Reinforcement learning for developing country agriculture: a virtual personal adviser for risk mitigation and multi-objectives optimization

Romain is one of the #DigitAg labelled PhDs

Reinforcement learning for developing country agriculture: a virtual personal adviser for risk mitigation and multi-objectives optimization

  • Start Date: January 2019
  • University: Montpellier MUSE / Montpellier SupAgro
  • PhD School: GAIA
  • Field(s): Agronomy, Computer science
  • Doctoral Thesis Advisors: Marc Corbeels, Cirad AIDA / Philippe Preux, Inria Lille, Sequel
  • Co-supervisors: Eric Malezieux, Cirad HortSys / Odalric Ambrym Maillard, Inria Lille, Sequel
  • Funding: Cirad – CGIAR
  • #DigitAg: Labelled PhD – Axis 1: ICT and rural societies, Challenge 5: ICT and new farm advisory services

Keywords: artificial intelligence; reinforcement learning; small farmers; Southern countries; developing countries; recommendation system; crop management sequences; interactive voice response; smartphone; crowdsourcing


Reinforcement learning (RL) has made spectacular progress in recent years, especially when combined with deep learning for game playing, as with DeepMind’s AlphaGo (https://deepmind.com/blog/alphago-zero-learning-scratch/). While RL has been applied in marketing and pharmaceutical contexts, agriculture remains largely unexplored ground for RL applications. Agriculture is fundamentally about sequential decision making under uncertainty, which makes RL techniques particularly appropriate. With growing data collection at different scales through technologies such as remote sensing, ground-based sensors and real-time customer feedback via smartphone applications or interactive voice response (IVR) systems, a new framework of active and online learning is emerging in which RL appears increasingly relevant.

Current decision support systems for agriculture rely on static if-then-else rules, built on statistical models complemented with agronomic and farm data, remote sensing, and limited use of machine learning outputs. These rules tend to be defined for specific management tasks (e.g. irrigation or fertilization) and usually do not cover the whole sequence of decisions, the profile of the farmer making the decision, optimization for multiple objectives (e.g. economic, environmental, social), or stochasticity (risk quantification). All of these are important for achieving relevant, reliable and personalized agronomic recommendations. Recent advances in RL make it possible to address these needs.

RL techniques make it possible to account for the uncertain impact of choices (such as crop management) and for stochastic events (such as pests or weather). With model-free methods, problems that are too complex to be defined explicitly (in our case, whole-farm planning) can be learned directly by interacting with the environment, i.e. by trying in the real world the action currently estimated to be the best. This is a trial-and-error learning framework: a recommendation is given, its impact (called the ‘reward’) is assessed, and future recommendations are adjusted according to past experience. These techniques can also learn from historical data (e.g. soil, crop, climate data) by linking them to what is called a ‘context’, which makes learning faster and richer while quantifying the uncertainty attached to a recommendation.
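The recommend, observe reward, adjust loop described above can be illustrated with a toy contextual bandit. This is only a minimal sketch, not the algorithm developed in the PhD: the epsilon-greedy strategy, the class name `EpsilonGreedyAdviser`, the contexts (soil types), the actions (planting windows) and the simulated yield function are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


class EpsilonGreedyAdviser:
    """Toy contextual bandit: one running-mean reward per (context, action)."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)   # (context, action) -> number of trials
        self.means = defaultdict(float)  # (context, action) -> mean observed reward

    def best_action(self, context):
        # Action with the highest estimated reward for this context.
        return max(self.actions, key=lambda a: self.means[(context, a)])

    def recommend(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(context)

    def update(self, context, action, reward):
        # Incremental mean update once the reward of a recommendation is observed.
        key = (context, action)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]


def simulated_yield(context, action, rng):
    # Hypothetical environment: the best planting window depends on the soil type.
    best = {"sandy": "early", "clay": "late"}[context]
    base = 3.0 if action == best else 2.0
    return base + rng.gauss(0.0, 0.5)  # noisy reward (stochastic weather, pests, ...)


def run(n_rounds=2000, seed=1):
    env_rng = random.Random(seed)
    adviser = EpsilonGreedyAdviser(["early", "late"], epsilon=0.1, seed=seed + 1)
    for _ in range(n_rounds):
        context = env_rng.choice(["sandy", "clay"])          # farmer's context
        action = adviser.recommend(context)                  # recommendation
        reward = simulated_yield(context, action, env_rng)   # observed feedback
        adviser.update(context, action, reward)              # adjust estimates
    return adviser


if __name__ == "__main__":
    adviser = run()
    for soil in ("sandy", "clay"):
        print(soil, "->", adviser.best_action(soil))
```

In this sketch the context is a single discrete label; a realistic system would use richer contexts (soil, climate, farmer profile) and more data-efficient exploration strategies than epsilon-greedy.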

First, this PhD aims to design RL algorithms tailored to agriculture, using state-of-the-art techniques to give continuously improving advice to farmers, especially in a developing country context with limited data. By combining RL expertise from INRIA (SEQUEL Research team) with agronomists’ knowledge from CIAT, CIRAD and other research-for-agricultural-development institutes, we aim to build a novel multi-objective (economic, social and environmental) RL system for personalized, real-time recommendations on agronomic practices, with a risk quantification component. This system will be operationalized as a personal virtual adviser that can be queried via IVR systems or smartphone applications. The virtual adviser learns from earlier experience (past recommendations followed by feedback from farmers) while also relying on existing knowledge (historical data). Recommendations are provided according to space, time and the individual farmer’s context. For instance, a recommendation could take the form: “The choice expected to maximize your objectives is planting maize variety x in the yth week of August with a density of z plants per hectare”.
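One possible way to combine several objectives with a risk measure is sketched below. It is an illustrative assumption, not the method of the PhD: the objectives are merged with a weighted sum, and risk is summarized by the mean of the worst outcomes (a lower-tail conditional value at risk, CVaR). The function names, weights, option names and outcome samples are all hypothetical.

```python
def scalarize(outcome, weights):
    # Weighted sum of the objectives (hypothetical choice of aggregation).
    return sum(weights[name] * outcome[name] for name in weights)


def cvar(values, alpha=0.2):
    # Mean of the worst alpha-fraction of outcomes (lower tail), at least one value.
    ordered = sorted(values)
    k = max(1, int(len(ordered) * alpha))
    return sum(ordered[:k]) / k


def score_options(samples, weights, alpha=0.2):
    # For each option, report the expected utility and its lower-tail risk.
    scores = {}
    for option, outcomes in samples.items():
        utilities = [scalarize(o, weights) for o in outcomes]
        scores[option] = {
            "mean": sum(utilities) / len(utilities),
            "cvar": cvar(utilities, alpha),
        }
    return scores


# Hypothetical farmer preferences and simulated outcomes per option.
WEIGHTS = {"profit": 0.7, "environment": 0.3}
SAMPLES = {
    # High average profit, but one very bad season.
    "variety_A": [{"profit": p, "environment": 2} for p in (10, 9, 8, 0, 11)],
    # Lower average profit, but stable across seasons.
    "variety_B": [{"profit": p, "environment": 4} for p in (6, 6, 5, 6, 7)],
}

if __name__ == "__main__":
    for option, score in score_options(SAMPLES, WEIGHTS).items():
        print(option, score)
```

With these made-up numbers, the option with the best average utility is not the one with the best worst-case utility, which is the kind of trade-off a risk quantification component is meant to expose to the farmer.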

Second, once the algorithms have been tested in silico and are ready for use, case studies will be conducted with smallholder farmers in Malawi. Available datasets consist of multi-dimensional data integrating location, normalized difference vegetation index, precipitation, temperature, soils, land use, and historical crop productivity, as well as DigitalGlobe archives of sub-meter resolution satellite images, to delineate favorable areas and key limiting factors for maize production across the whole national territory of Malawi. Testing will be conducted through the AirTel/Viamo M’Chikumbe interactive voice response-based advisory service, which already has 726,000 registered farmers in Malawi. It offers a powerful way to interact directly with individual farmers via their mobile phones and to provide tailored guidance across the entire country. If the experiments are successful, the methods and techniques developed in this PhD are expected to be extendable to other project sites, e.g. in Colombia, Nigeria, Mexico and India, and thus to millions of smallholder farmers.

Contact: r.gautron [AT] cgiar.org

Social Networks: LinkedIn