OpenSILEX & Akene Services: Managing and sharing heterogeneous big data from different sources

OpenSILEX is an open source software suite for building ontology-driven information systems, developed at INRA by computer scientists in the MISTEA research unit in collaboration with agronomists and biologists. Formalised, shared knowledge is used to structure agricultural and environmental research data from different sources. Several applications have been developed to date, one of which, PHIS, is a pilot for high-throughput phenotyping. OpenSILEX is currently in the maturing phase. The team is building its community internationally and is branching out with a startup project. An interview with Anne Tireau, Head of Development, Pascal Neveu, Senior Scientist, and Alice Boizet and Morgane Vidal, initiators of the Akene Services project.

Sharing and co-constructing knowledge requires processing many types of big data to make them more understandable and user-friendly. OpenSILEX is a one-of-a-kind digital tool that provides methods and tools for collecting, structuring and using big data from agricultural and environmental sources. The tool is easy to adapt and handles both public and private research data.

What is the origin of OpenSILEX?

[Anne Tireau]: Several information system projects were developed in the unit from 2004 onwards, with successive rounds of funding. All of the resulting methods, expertise and tools were brought together in SILEX (Information System for EXperiment), a collaborative meta-project led by MISTEA. Its goal was to develop information systems for thematic experiments, in collaboration with other INRA units (LEPSE for plant phenotyping, SPO for oenology, LBE for bioprocesses, etc.) and with other institutes such as LIRMM. Then, to address high-throughput phenotyping and big data, the team developed PHIS. Because PHIS was designed for several experimental platforms in the PHENOME infrastructure (Investissements d'Avenir Programme), it had to be adaptable to different experimental installations. This software is far more integrated, with tailored ontologies. In the same spirit, everything is reusable for other applications, and this is the goal of OpenSILEX, which, thanks to PHIS, benefits from developments in the Semantic Web.

How does it work?

[Anne Tireau]: Where research data is concerned, OpenSILEX contains everything needed to describe and formalise an experiment: the plant material, the equipment, all of the characteristics measured, etc. The goal is to obtain shared data that can be reused for future research or other purposes. To achieve this, the data must be described, contextualised, annotated and linked, in compliance with standards. This is what we call FAIR data (Findable, Accessible, Interoperable and Reusable). Everything is available in cloud-based storage and managed through an open source software "toolbox".

[Alice Boizet]: In practice, we register an experiment, associate its objects with it (field or greenhouse, plants, sensors, experiment reference) and describe them. We can include any data measured on these objects by sensors or manually, such as hyperspectral images taken from drones or phenomobiles. Examples include leaf area, the number of leaves (developmental stage), vegetation indices such as NDVI, and any environmental data (temperature, humidity, etc.). We can also add event data describing a technical or environmental problem: a camera malfunction, a pest attack on crops, a storm and its effect on the plants. We can then rework this data: annotate it, remove anomalies before analysis, etc. Previously, all of this was written in laboratory notebooks and, for lack of time, the annotations were not always taken into account when interpreting results. Now, algorithms and semantics link objects to complementary information, including comments and documents, so users can visualise and better understand the data or images associated with events or annotations, and take them into account in their analyses.
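The workflow described above (register an experiment, attach objects, record measurements and events, then use events to screen out anomalies) can be sketched as a small data model. This is a minimal illustration under stated assumptions: the class names (`Experiment`, `ScientificObject`, `Observation`, `Event`), their fields and the rule "exclude any observation taken on an object while an event was recorded for it" are hypothetical and do not reproduce the OpenSILEX schema or API. The identifiers and values in the demo are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical, simplified model for illustration only: class and field
# names are assumptions, not the actual OpenSILEX schema.


@dataclass
class ScientificObject:
    uri: str   # unique identifier of a plant, plot, greenhouse, sensor...
    kind: str  # e.g. "plant", "plot", "sensor"


@dataclass
class Observation:
    target: str        # URI of the scientific object that was measured
    variable: str      # e.g. "leaf_area", "leaf_count", "ndvi"
    value: float
    timestamp: datetime


@dataclass
class Event:
    targets: list[str]  # URIs of the objects concerned
    kind: str           # e.g. "camera_malfunction", "pest_attack", "storm"
    start: datetime
    end: datetime


@dataclass
class Experiment:
    name: str
    objects: dict[str, ScientificObject] = field(default_factory=dict)
    observations: list[Observation] = field(default_factory=list)
    events: list[Event] = field(default_factory=list)

    def add_object(self, obj: ScientificObject) -> None:
        self.objects[obj.uri] = obj

    def events_affecting(self, obs: Observation) -> list[Event]:
        """Events recorded on the observation's target during its time window."""
        return [
            e for e in self.events
            if obs.target in e.targets and e.start <= obs.timestamp <= e.end
        ]

    def clean_observations(self) -> list[Observation]:
        """Observations not covered by any recorded event."""
        return [o for o in self.observations if not self.events_affecting(o)]


if __name__ == "__main__":
    exp = Experiment("maize_field_demo")
    exp.add_object(ScientificObject("plant:001", "plant"))
    exp.add_object(ScientificObject("plant:002", "plant"))
    t = datetime(2018, 6, 1, 12)
    exp.observations += [
        Observation("plant:001", "ndvi", 0.71, t),
        Observation("plant:002", "ndvi", 0.12, t),
    ]
    # A camera malfunction is logged for plant:002 on 1 June
    exp.events.append(Event(["plant:002"], "camera_malfunction",
                            datetime(2018, 6, 1), datetime(2018, 6, 2)))
    print([o.target for o in exp.clean_observations()])  # ['plant:001']
```

Keeping events as structured records linked to object identifiers, rather than as free-text notes, is what lets an analysis step find them automatically; a real system might flag the affected measurements for review rather than discard them.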

[Morgane Vidal]: At the moment, I manage data from four sources, namely four high-throughput phenotyping platforms, each with its own configuration. These platforms correspond to INRA experimental sites across France, in Mauguio, Toulouse, Clermont-Ferrand and Dijon (INRA-Terres Innovia). The crops are field-grown cereals, but the same tools are used in greenhouses for tropical crops. We could also process private data, for example from seed companies, or, as in a recent project by the team, data from Sun'Agri for the startup Sun'R, in which equipment data from photovoltaic panels were also taken into account.

 

How much data does an experiment produce?

Reanalysing datasets involves tracking information on thousands of plants, sensors and events.
For example, two experiments on 59 maize hybrids, one conducted in the field and one in a greenhouse, produced the following phenotyping data:

  • Field: 10,000 scientific objects, 178 sensors, 10,000 images, 70,000 phenotypic observations, 20,000 annotations and 500,000 environmental measurements
  • Greenhouse: 2,204 scientific objects, 242 sensors, around 2 million images, 10 million phenotypic observations, 15,000 annotations and more than 4 million environmental measurements

Data management and semantics, the core of the tool

[Anne Tireau]: Data management and semantics are central to our work. The ontology-based architecture of OpenSILEX makes it possible to integrate data from multiple experiments and platforms, which are unambiguously identified, retrieved and linked. We formalise specific application ontologies and connect them to existing reference resources. The formats are standardised and open, so the data in an OpenSILEX information system can be exchanged and used by people outside the experiment team, and can therefore be combined with data from other sources. The tool also connects to external resources through Web services (Web API), for example to export data to other systems, modelling platforms or external databases.
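To make the idea of linking data through ontologies concrete, here is a toy sketch in plain Python. It assumes two platforms that name the same trait differently and an application ontology that maps both local terms to one reference term; a query on the reference term then retrieves observations from both platforms. The prefixes (`platA:`, `platB:`, `ref:`, `ex:`), the `ex:equivalentTo` predicate and the values are hypothetical, and an actual Semantic Web system would typically store RDF triples and query them with SPARQL rather than with hand-written loops.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all identifiers are illustrative.
Triple = tuple[str, str, str]

triples: set[Triple] = {
    # Platform A records leaf area under its own local variable name
    ("platA:obs1", "ex:variable", "platA:LeafArea"),
    ("platA:obs1", "ex:value", "152.0"),
    # Platform B uses a different local name for the same trait
    ("platB:obs7", "ex:variable", "platB:leaf_surface"),
    ("platB:obs7", "ex:value", "148.5"),
    # Application ontology: both local terms map to one reference term
    ("platA:LeafArea", "ex:equivalentTo", "ref:leaf_area"),
    ("platB:leaf_surface", "ex:equivalentTo", "ref:leaf_area"),
}


def index_by_subject(triples: set[Triple]) -> dict[str, dict[str, list[str]]]:
    """Group objects by subject, then by predicate."""
    index: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        index[s][p].append(o)
    return index


def observations_for(triples: set[Triple], reference_term: str) -> dict[str, float]:
    """Return {observation: value} for every observation whose variable is the
    reference term or a local term declared equivalent to it."""
    local_terms = {s for s, p, o in triples
                   if p == "ex:equivalentTo" and o == reference_term}
    local_terms.add(reference_term)
    results: dict[str, float] = {}
    for subject, props in index_by_subject(triples).items():
        if any(v in local_terms for v in props.get("ex:variable", [])):
            results[subject] = float(props["ex:value"][0])
    return results


if __name__ == "__main__":
    print(sorted(observations_for(triples, "ref:leaf_area").items()))
    # [('platA:obs1', 152.0), ('platB:obs7', 148.5)]
```

In this sketch, integration happens through the mapping in the ontology rather than by renaming data at each source: adding a third platform only requires declaring its local term as equivalent to the reference term.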

 

The OpenSILEX software suite includes:

  • A user interface (Web)
  • A data and knowledge storage layer (environmental and crop data) based on cloud storage databases
  • A Web services layer (simple / complex queries), compatible with community exchange standards (BrAPI)
  • An “intelligent” layer (standard inference engine, expert rules, general and application ontologies, metadata)
  • Connections to a scientific computing and workflow layer

What are the future challenges?

[Pascal Neveu]: The volume of data to be managed is considerable, and there are few systems of this type; OpenSILEX is in fact one of a kind. There is high demand for our tool and our training. We receive requests from abroad, from Australia, the Netherlands (Wageningen University and Research) and Japan (University of Tokyo). The transparency of science is also an important issue, in other words making research data open and reusable for future research and other purposes such as teaching and innovation. And, of course, we want to expand the community of developers.

This year we have set up two training courses. The first, an international course, hosted 14 people from seven countries: Japan, the Czech Republic, the Netherlands, Spain, India, Thailand and France. The second is planned with participants from the United Kingdom, Belgium and China.

There is also room for a startup. The tool is mature enough to support bespoke or off-the-shelf services, but providing them is not the role of a research team. Now is the time to hand over the reins and concentrate on developing new applications, such as the French-Chinese ANR project ANSWER, with the Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences (NIGLAS).

[Morgane Vidal]: There are now a number of existing instances to maintain, new ones to set up, and bespoke developments planned.

Akene Services, a startup in pre-incubation

Akene, or "Application with Knowledge ENhancement for Experimental data", could soon offer bespoke services based on the OpenSILEX toolbox. Its name, a reference to plants and the dispersal of their fruit, is a nod to the dissemination of OpenSILEX.

This startup project brings together Alice Boizet, an agronomist, and Morgane Vidal, a computer scientist, both research engineers on fixed-term contracts at INRA in Montpellier. They are experts in big data sharing. Morgane was recruited as Information System Administrator when the OpenSILEX project was launched three years ago. After her AgroTIC specialisation, Alice joined MISTEA to work on OpenSILEX and the European project AGINFRA+. As part of this thematic hub, she was responsible for setting up a Virtual Research Environment (VRE) to facilitate collaboration between researchers in the high-throughput phenotyping community. Once the OpenSILEX project entered the pre-maturing stage, their plan to create a startup took shape and took them to the final of the Graines d'Agro 2019 competition.

OpenSILEX was initially designed for experimental crop data, yet "OpenSILEX can be adapted to other types of data, whether experimental or not, and this is a strong point, since the tool has no equivalent in terms of adaptability or price". Morgane adds that "this would be part of our offering, according to the needs of clients, proposing bespoke adaptations, whether simple or more complex with specific developments and functions". Alice and Morgane are planning a service package, to be developed based on the recurring needs of their first clients. Their services would include case studies, installation support, data analysis and migration, training for users and developers, development of specific modules, and maintenance.

The first potential clients identified are those with large volumes of research data, whether public or private. These include French and foreign research institutes and technology transfer organisations. In the private sector, experimental research and development data also need to be processed, such as those of agricultural and environmental advisory companies, AgTech startups, etc.

Why create this startup? “We learnt a lot from the Graines d’Agro competition”, says Alice, who stresses that there is a real opportunity with few risks. “We are young, there are possibilities for funding and we also have a personal contribution, so the time is right. We have support from the research team, we have the product, a tool with no equivalent on the market, which can easily be adapted, and the beginnings of a client base, which is reassuring. The OpenSILEX suite can also be adapted to other fields and the market can be expanded in the second phase”.

Alice and Morgane are in the early stages of their startup project: they are coming to the end of their fixed-term contracts while working on their business plan, aiming for incubation in autumn.

 

OpenSILEX in a nutshell

Storing, organising, managing and sharing heterogeneous multidimensional data from numerous sources

Contacts

  • Senior Scientist and Project Manager: pascal neveu [AT] inra.fr and anne.tireau [AT] inra.fr
  • Akene Services (Alice Boizet and Morgane Vidal): akeneservices [AT] gmail.com

Find out more

  • Site: opensilex.org; test instance: sandbox (Bac à sable)
  • Examples of applications in agriculture:
    • PHIS – High-throughput phenotyping of plants, in field and greenhouse (experimental data)
    • Emphasis – European infrastructure for high-throughput phenotyping of cultivated plants
    • Sun’Agri – Partnership with the startup Sun’R (agrivoltaic vineyard)
  • GitHub community
  • Twitter account
  • Publication: Pascal Neveu, Anne Tireau, Nadine Hilgert, Vincent Nègre, Jonathan Mineau‐Cesari, Nicolas Brichet, Romain Chapuis, Isabelle Sanchez, Cyril Pommier, Brigitte Charnomordic, François Tardieu, Llorenç Cabrera‐Bosquet (2018). Dealing with multi‐source and multi‐scale information in plant phenomics: the ontology‐driven Phenotyping Hybrid Information System. New Phytologist, 221(1), 588-601. https://doi.org/10.1111/nph.15385