PhenoHarmonIS Workshop 2024 highlights: Data & Dialogue: Agrifood semantics in the Age of Artificial Intelligence

From 27 May 2024 to 30 May 2024

Institut Agro Montpellier, France

The workshop, held from 27 to 30 May 2024 at Institut Agro Montpellier, France, brought together approximately 85 participants working in 16 countries across Africa, Asia, Europe, Latin America, North America and the Pacific, and representing 29 institutions or companies. The program featured 45 speakers, 12 tool demonstrations and 5 poster displays, and gave participants a venue to establish and define collaborations.
Presentations of the workshop are available here

In its previous editions, the PhenoHarmonIS workshop focused on the role of agricultural semantics in the FAIR principles. With rapid advances in natural language processing (NLP), this focus now extends to innovative Artificial Intelligence (AI)-driven digital tools. AI technologies often rely on extensive data encapsulated in large language models (LLMs), which may lack precision in specialized areas such as agri-food research.

After success stories in the application of semantics and FAIR principles to agricultural data had been shared, keynote speakers introduced perspectives on machine learning in agriculture and examples from the human health domain. In plenary sessions, participants presented their work on AI-based technologies and the use of semantics in various domains of food and agriculture research.

The topics presented collectively highlighted the importance of semantic resources, FAIR data practices, and responsible data linkage in agriculture research, with a focus on the successful integration and application of AI technologies. They also pointed to a shift towards data-centric AI in agriculture, which requires high-quality data for machine learning, and to the innovative use of multimodal frameworks and spatio-temporal data to drive new applications in the field.

The integration of Knowledge Graphs (KG) and Machine Learning (ML) appeared key to improving data and semantic understanding and to enhancing predictive capabilities, with specific applications in healthcare and agriculture. Semantics can improve the results of decision-making tools by reducing inaccurate answers and AI hallucinations, alleviating missing data, and avoiding bias. It was therefore important to discuss responsible and ethical practices, with an emphasis on the social and epistemological conditions under which data standards are produced and applied, and on the value of a process-sensitive approach to designing and maintaining semantic standards.

The workshop highlighted various existing semantic resources (e.g. thesauri like Agrovoc, and ontologies dedicated to crops, agronomy, fisheries, food product qualities, food packaging, etc.) and tools/apps (e.g. OntoGPT, Agroportal, NLP for creating Knowledge Graphs, LLM-supported data management, plant disease detection apps, climate models, etc.) dedicated to data in Agriculture and Food Systems. Together they provide a solid foundation for creating efficient dialogue between users and data. AI technologies are also being tested to create and maintain ontologies (e.g. DRAGONAI).

One objective of the organizers was to collect speakers' answers to two main questions:

What is your short definition of AI-ready data in the context of your project?
How could semantics help make AI-ready data?

Artificial Intelligence-based technologies confront what humans face when looking at datasets: ambiguity as a barrier to comparability and interpretation. To succeed, AI must have access to contextual information about the question, including language and provenance: how the data came to be, its method of creation or collection, its precision, and its time frame. All of these are required to make data comparable.

To be ready for AI consumption, data needs to be open and FAIR (findable, accessible, interoperable, reusable), and in particular of high quality to strengthen Machine Learning models. According to the speakers’ answers, AI-ready data should ideally comply with the following criteria:

Pre-processed and organized
Accurate and consistent
Responsibly sourced
Traceable
Standardized and interoperable
Machine-readable and actionable
Contextualized
Unambiguous so it can be compared without the need for translation
Validated - Ready for analysis and use by models
Representative of real life (external validity)
Trustworthy and ethical, respecting the CARE principles
Representing stakeholders’ interests and values with governance

The role of semantics, from vocabularies to semantically organized ontologies, was highlighted as crucial to achieving this quality level. Semantics enable the development of machine-readable knowledge graphs integrating human knowledge representations and context, thus supporting AI deductive inference and reasoning. Semantics can include stakeholders' interests and values in the management and structure of data/metadata.
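As a minimal sketch of the deductive inference mentioned above, the following Python snippet stores a handful of triples and deduces facts that are not stated explicitly. The triples, the predicate names `is_a` and `grows`, and the identifier `plot_17` are all hypothetical and chosen for illustration; they do not come from any ontology presented at the workshop.

```python
from collections import defaultdict

# Hypothetical knowledge graph as (subject, predicate, object) triples.
# All labels are illustrative assumptions, not taken from a real ontology.
TRIPLES = [
    ("durum wheat", "is_a", "wheat"),
    ("wheat", "is_a", "cereal"),
    ("cereal", "is_a", "crop"),
    ("plot_17", "grows", "durum wheat"),
]


def ancestors(term, triples):
    """Return every class reachable from `term` by following is_a edges."""
    parents = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "is_a":
            parents[subject].add(obj)
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def grows_kind(plot, kind, triples):
    """Deduce whether `plot` grows `kind` or any subclass of `kind`."""
    for subject, predicate, obj in triples:
        if subject == plot and predicate == "grows":
            if obj == kind or kind in ancestors(obj, triples):
                return True
    return False
```

Here the graph never states that `plot_17` grows a cereal, yet `grows_kind("plot_17", "cereal", TRIPLES)` can be derived from the `is_a` chain. A production setting would more likely rely on an RDF store and a reasoner than on hand-written traversal.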

Large Language Models can determine the correct meaning of a term based on the context of the question and data, thereby improving the accuracy of the answer. Ontologies help disambiguate terms that have multiple meanings. They can also be used to validate the results generated by LLMs, by checking the answers against ontological knowledge and correcting them if they deviate from established facts and relationships. Therefore, semantic resources should be (a) embedded into the AI learning process and (b) used to validate the results.
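One way to picture step (b), validating results against ontological knowledge, is the short Python sketch below. It assumes an LLM has already produced a record with a trait name and a unit. The term identifiers (`EX:0001`, `EX:0002`), the synonym table, and the allowed units are hypothetical placeholders rather than entries from a real ontology.

```python
# Hypothetical ontology fragment; identifiers and labels are illustrative only.
ONTOLOGY_TERMS = {
    "plant height": "EX:0001",
    "grain yield": "EX:0002",
}
# Synonyms map ambiguous or abbreviated labels to a preferred label.
SYNONYMS = {"yield": "grain yield", "height": "plant height"}
# Units the (assumed) ontology accepts for each term.
ALLOWED_UNITS = {
    "EX:0001": {"cm", "m"},
    "EX:0002": {"kg/ha", "t/ha"},
}


def validate_answer(answer):
    """Check one LLM-extracted record against the ontology fragment."""
    label = answer["trait"].strip().lower()
    label = SYNONYMS.get(label, label)  # resolve synonyms to the preferred label
    term_id = ONTOLOGY_TERMS.get(label)
    if term_id is None:
        return {"valid": False, "reason": "unknown trait"}
    if answer["unit"] not in ALLOWED_UNITS[term_id]:
        return {
            "valid": False,
            "term_id": term_id,
            "reason": "unit not allowed for trait",
        }
    return {"valid": True, "term_id": term_id, "label": label}
```

For example, `validate_answer({"trait": "Yield", "unit": "cm"})` resolves the synonym to the grain yield term and then rejects the record because the unit does not match, which is the kind of deviation from established relationships that the paragraph above describes.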

AI-ready data is data that conveys enough information about itself (metadata) such that a machine, without any a priori knowledge of data structures and content, can make useful decisions on what to do with the data (inference on content) with the least ambiguity. Early knowledge integration can guide the model's learning stage, and enables (a) discarding useless information, and (b) identifying inconsistencies to guide data cleaning.
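The idea of data that describes itself can be sketched in Python as follows. The dataset layout, the list of required metadata fields, and the term identifier `EX:0001` are assumptions made for this example; the workshop did not prescribe a specific schema.

```python
# Metadata fields each column is assumed to need; this list is illustrative.
REQUIRED_FIELDS = ("description", "unit", "method", "term_id")

# Hypothetical self-describing dataset: values travel with their metadata.
dataset = {
    "columns": {
        "plant_height": {
            "description": "Height of the main stem at maturity",
            "unit": "cm",
            "method": "measured with a ruler",
            "term_id": "EX:0001",  # placeholder ontology identifier
        },
        "yld": {
            "unit": "kg/ha",  # description, method and term_id are missing
        },
    },
    "rows": [
        {"plant_height": 92.0, "yld": 4100.0},
        {"plant_height": 88.5, "yld": 3950.0},
    ],
}


def missing_metadata(data):
    """Map each under-documented column to the metadata fields it lacks."""
    report = {}
    for name, meta in data["columns"].items():
        missing = [field for field in REQUIRED_FIELDS if field not in meta]
        if missing:
            report[name] = missing
    return report
```

Running `missing_metadata(dataset)` flags the `yld` column, whose meaning a machine could not infer without prior knowledge. A report like this could guide data cleaning before any model training starts.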

In the context of agri-food research, where for example digital twins are developed to model real situations, an ecosystem of ontologies that define relationships between different crop types, crop traits, soil conditions, field and water management practices, farmers’ preferences and climate factors allows AI models to better understand and predict crop performance (e.g. yield, marketability) based on a range of interrelated factors, leading to more accurate and actionable insights. Several of these ontologies developed for the Life Sciences and the Agrifood domain are open and FAIR, and can be integrated into models.

By leveraging semantic technologies, researchers can ensure their data is not only AI-ready but also more valuable and insightful for a wide range of applications.

Contact: contact [AT] hdigitag.fr

Modification date: 12 August 2024 | Publication date: 31 January 2024