AISep 29, 2023
Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and OpportunitiesJiaoyan Chen, Hang Dong, Janna Hastings et al.
The term life sciences refers to the disciplines that study living organisms and life processes, and include chemistry, biology, medicine, and a range of other related disciplines. Research efforts in life sciences are heavily data-driven, as they produce and consume vast amounts of scientific data, much of which is intrinsically relational and graph-structured. The volume of data and the complexity of scientific concepts and relations referred to therein promote the application of advanced knowledge-driven technologies for managing and interpreting data, with the ultimate aim to advance scientific discovery. In this survey and position paper, we discuss recent developments and advances in the use of graph-based technologies in life sciences and set out a vision for how these technologies will impact these fields into the future. We focus on three broad topics: the construction and management of Knowledge Graphs (KGs), the use of KGs and associated technologies in the discovery of new knowledge, and the use of KGs in artificial intelligence applications to support explanations (explainable AI). We select a few exemplary use cases for each topic, discuss the challenges and open research questions within these topics, and conclude with a perspective and outlook that summarizes the overarching challenges and their potential solutions as a guide for future research.
68.8AIApr 1Code
IDEA2: Expert-in-the-loop competency question elicitation for collaborative ontology engineeringElliott Watkiss-Leek, Reham Alharbi, Harry Rostron et al.
Competency question (CQ) elicitation represents a critical but resource-intensive bottleneck in ontology engineering. This foundational phase is often hampered by the communication gap between domain experts, who possess the necessary knowledge, and ontology engineers, who formalise it. This paper introduces IDEA2, a novel, semi-automated workflow that integrates Large Language Models (LLMs) within a collaborative, expert-in-the-loop process to address this challenge. The methodology is characterised by a core iterative loop: an initial LLM-based extraction of CQs from requirement documents, a co-creational review and feedback phase by domain experts on an accessible collaborative platform, and an iterative, feedback-driven reformulation of rejected CQs by an LLM until consensus is achieved. To ensure transparency and reproducibility, the entire lifecycle of each CQ is tracked using a provenance model that captures the full lineage of edits, anonymised feedback, and generation parameters. The workflow was validated in 2 real-world scenarios (scientific data, cultural heritage), demonstrating that IDEA2 can accelerate the requirements engineering process, improve the acceptance and relevance of the resulting CQs, and exhibit high usability and effectiveness among domain experts. We release all code and experiments at https://github.com/KE-UniLiv/IDEA2
AIJun 5, 2022
OntoMerger: An Ontology Integration Library for Deduplicating and Connecting Knowledge Graph NodesDavid Geleta, Andriy Nikolov, Mark ODonoghue et al.
Duplication of nodes is a common problem encountered when building knowledge graphs (KGs) from heterogeneous datasets, where it is crucial to be able to merge nodes having the same meaning. OntoMerger is a Python ontology integration library whose functionality is to deduplicate KG nodes. Our approach takes a set of KG nodes, mappings and disconnected hierarchies and generates a set of merged nodes together with a connected hierarchy. In addition, the library provides analytic and data testing functionalities that can be used to fine-tune the inputs, further reducing duplication, and to increase connectivity of the output graph. OntoMerger can be applied to a wide variety of ontologies and KGs. In this paper we introduce OntoMerger and illustrate its functionality on a real-world biomedical KG.
AINov 9, 2023
An Experiment in Retrofitting Competency Questions for Existing OntologiesReham Alharbi, Valentina Tamma, Floriana Grasso et al.
Competency Questions (CQs) are a form of ontology functional requirements expressed as natural language questions. Inspecting CQs together with the axioms in an ontology provides critical insights into the intended scope and applicability of the ontology. CQs also underpin a number of tasks in the development of ontologies e.g. ontology reuse, ontology testing, requirement specification, and the definition of patterns that implement such requirements. Although CQs are integral to the majority of ontology engineering methodologies, the practice of publishing CQs alongside the ontological artefacts is not widely observed by the community. In this context, we present an experiment in retrofitting CQs from existing ontologies. We propose RETROFIT-CQs, a method to extract candidate CQs directly from ontologies using Generative AI. In the paper we present the pipeline that facilitates the extraction of CQs by leveraging Large Language Models (LLMs) and we discuss its application to a number of existing ontologies.
43.0AIMay 21
Knowledge Graph Re-engineering Along the Ontological Continuum (extended version)Enrico Daga, Valentina Tamma, Terry Payne
Knowledge graphs have become the primary vehicle for data integration and are critical to the success of modern AI, but the diversity of KG modelling practices, from lightweight vocabularies to richly axiomatised ontologies, makes integration and reuse expensive and brittle. This challenge is particularly acute in neuro-symbolic AI, where bridging neural and symbolic components depends on the ability to reengineer KGs to fit new requirements; GenAI now offers unprecedented automation capability, but without a principled understanding of the KG space, such automation remains conceptually ungrounded. We introduce the ontological continuum as that missing conceptualisation, a theoretical construct a theoretical construct whose characterisation framework is defined by two orthogonal distinctions: semantics vs pragmatics, and properties vs affordances; together these define a vocabulary to describe, compare, navigate, and transform KGs across the full range of modelling practices. The methodological stance is empirical: rather than prescribing how KGs should be modelled, the continuum aims to define a theory of the existent, derived from observation of real-world KG engineering practices and whose structure can be made formally explicit, for example, through Formal Concept Analysis (FCA). We ground the vision through a case study on provenance knowledge, showing how a single concern manifests differently across the continuum. We articulate five open research challenges and invite the community to develop the ontological continuum as a shared research agenda.
37.4AIApr 2
The AnIML Ontology: Enabling Semantic Interoperability for Large-Scale Experimental Data in Interconnected Scientific LabsWilf Morlidge, Elliott Watkiss-Leek, George Hannah et al.
Achieving semantic interoperability across heterogeneous experimental data systems remains a major barrier to data-driven scientific discovery. The Analytical Information Markup Language (AnIML), a flexible XML-based standard for analytical chemistry and biology, is increasingly used in industrial R&D labs for managing and exchanging experimental data. However, the expressivity of the XML schema permits divergent interpretations across stakeholders, introducing inconsistencies that undermine the interoperability the AnIML schema was designed to support. In this paper, we present the AnIML Ontology, an OWL 2 ontology that formalises the semantics of AnIML and aligns it with the Allotrope Data Format to support future cross-system and cross-lab interoperability. The ontology was developed using an expert-in-the-loop approach combining LLM-assisted requirement elicitation with collaborative ontology engineering. We validate the ontology through a multi-layered approach: data-driven transformation of real-world AnIML files into knowledge graphs, competency question verification via SPARQL, and a novel validation protocol based on adversarial negative competency questions mapped to established ontological anti-patterns and enforced via SHACL constraints.
51.5AIMay 18
Discoverable Agent Knowledge -- A Formal Framework for Agentic KG Affordances (Extended Version)Terry R. Payne, Valentina Tamma, Enrico Daga
Two decades ago, the Semantic Web Services community was asked how agents with different ontological commitments could discover, compose, and invoke web services coherently. The response was OWL-S and WSMO: formally grounded capability descriptions specifying what a service could do, what the agent must already know for invocation to be epistemically sound, and how ontological mismatches could be formally bridged. Current Knowledge Graph (KG) metadata standards such as VoID and DCAT describe what a KG contains yet say nothing about what a specific agent can prove from it, what closure assumptions govern empty results, or whether the agent's task vocabulary is grounded in the schema. Furthermore, in deployed KGs the governing schema DL and the operative entailment regime can diverge: an epistemic failure mode invisible to current metadata. We revisit and extend these insights for the KG setting with a four-dimensional formal framework from which we derive the Agentic Affordance Profile (AAP): a semantic layer above VoID and DCAT enabling principled KG selection, composition, and failure diagnosis at agent planning time. A five-point research agenda identifies the formal, computational, and engineering work needed to realise AAP-based affordance matching at scale.
59.3AIApr 17
Characterising LLM-Generated Competency Questions: a Cross-Domain Empirical Study using Open and Closed ModelsReham Alharbi, Valentina Tamma, Terry R. Payne et al.
Competency Questions (CQs) are a cornerstone of requirement elicitation in ontology engineering. CQs represent requirements as a set of natural language questions that an ontology should satisfy; they are traditionally modelled by ontology engineers together with domain experts as part of a human-centred, manual elicitation process. The use of Generative AI automates CQ creation at scale, therefore democratising the process of generation, widening stakeholder engagement, and ultimately broadening access to ontology engineering. However, given the large and heterogeneous landscape of LLMs, varying in dimensions such as parameter scale, task and domain specialisation, and accessibility, it is crucial to characterise and understand the intrinsic, observable properties of the CQs they produce (e.g., readability, structural complexity) through a systematic, cross-domain analysis. This paper introduces a set of quantitative measures for the systematic comparison of CQs across multiple dimensions. Using CQs generated from well defined use cases and scenarios, we identify their salient properties, including readability, relevance with respect to the input text and structural complexity of the generated questions. We conduct our experiments over a set of use cases and requirements using a range of LLMs, including both open (KimiK2-1T, LLama3.1-8B, LLama3.2-3B) and closed models (Gemini 2.5 Pro, GPT 4.1). Our analysis demonstrates that LLM performance reflects distinct generation profiles shaped by the use case.
AIJul 4, 2025
RELRaE: LLM-Based Relationship Extraction, Labelling, Refinement, and EvaluationGeorge Hannah, Jacopo de Berardinis, Terry R. Payne et al.
A large volume of XML data is produced in experiments carried out by robots in laboratories. In order to support the interoperability of data between labs, there is a motivation to translate the XML data into a knowledge graph. A key stage of this process is the enrichment of the XML schema to lay the foundation of an ontology schema. To achieve this, we present the RELRaE framework, a framework that employs large language models in different stages to extract and accurately label the relationships implicitly present in the XML schema. We investigate the capability of LLMs to accurately generate these labels and then evaluate them. Our work demonstrates that LLMs can be effectively used to support the generation of relationship labels in the context of lab automation, and that they can play a valuable role within semi-automatic ontology generation frameworks more generally.
CLJul 1, 2025
A Comparative Study of Competency Question Elicitation Methods from Ontology RequirementsReham Alharbi, Valentina Tamma, Terry R. Payne et al.
Competency Questions (CQs) are pivotal in knowledge engineering, guiding the design, validation, and testing of ontologies. A number of diverse formulation approaches have been proposed in the literature, ranging from completely manual to Large Language Model (LLM) driven ones. However, attempts to characterise the outputs of these approaches and their systematic comparison are scarce. This paper presents an empirical comparative evaluation of three distinct CQ formulation approaches: manual formulation by ontology engineers, instantiation of CQ patterns, and generation using state of the art LLMs. We generate CQs using each approach from a set of requirements for cultural heritage, and assess them across different dimensions: degree of acceptability, ambiguity, relevance, readability and complexity. Our contribution is twofold: (i) the first multi-annotator dataset of CQs generated from the same source using different methods; and (ii) a systematic comparison of the characteristics of the CQs resulting from each approach. Our study shows that different CQ generation approaches have different characteristics and that LLMs can be used as a way to initially elicit CQs, however these are sensitive to the model used to generate CQs and they generally require a further refinement step before they can be used to model requirements.
CLApr 8, 2025
Evaluating the Fitness of Ontologies for the Task of Question GenerationSamah Alkhuzaey, Floriana Grasso, Terry R. Payne et al.
Ontology-based question generation is an important application of semantic-aware systems that enables the creation of large question banks for diverse learning environments. The effectiveness of these systems, both in terms of the calibre and cognitive difficulty of the resulting questions, depends heavily on the quality and modelling approach of the underlying ontologies, making it crucial to assess their fitness for this task. To date, there has been no comprehensive investigation into the specific ontology aspects or characteristics that affect the question generation process. Therefore, this paper proposes a set of requirements and task-specific metrics for evaluating the fitness of ontologies for question generation tasks in pedagogical settings. Using the ROMEO methodology (a structured framework used for identifying task-specific metrics), a set of evaluation metrics have been derived from an expert assessment of questions generated by a question generation model. To validate the proposed metrics, we apply them to a set of ontologies previously used in question generation to illustrate how the metric scores align with and complement findings reported in earlier studies. The analysis confirms that ontology characteristics significantly impact the effectiveness of question generation, with different ontologies exhibiting varying performance levels. This highlights the importance of assessing ontology quality with respect to Automatic Question Generation (AQG) tasks.
AIOct 17, 2024
A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal EntitiesGianluca Apriceno, Valentina Tamma, Tania Bailoni et al.
The ability to reason with and integrate different sensory inputs is the foundation underpinning human intelligence and it is the reason for the growing interest in modelling multi-modal information within Knowledge Graphs. Multi-Modal Knowledge Graphs extend traditional Knowledge Graphs by associating an entity with its possible modal representations, including text, images, audio, and videos, all of which are used to convey the semantics of the entity. Despite the increasing attention that Multi-Modal Knowledge Graphs have received, there is a lack of consensus about the definitions and modelling of modalities, whose definition is often determined by application domains. In this paper, we propose a novel ontology design pattern that captures the separation of concerns between an entity (and the information it conveys), whose semantics can have different manifestations across different media, and its realisation in terms of a physical information entity. By introducing this abstract model, we aim to facilitate the harmonisation and integration of different existing multi-modal ontologies which is crucial for many intelligent applications across different domains spanning from medicine to digital humanities.
MAFeb 5, 2022
Governance of Autonomous Agents on the Web: Challenges and OpportunitiesTimotheus Kampik, Adnane Mansour, Olivier Boissier et al.
The study of autonomous agents has a long tradition in the Multiagent Systems and the Semantic Web communities, with applications ranging from automating business processes to personal assistants. More recently, the Web of Things (WoT), which is an extension of the Internet of Things (IoT) with metadata expressed in Web standards, and its community provide further motivation for pushing the autonomous agents research agenda forward. Although representing and reasoning about norms, policies and preferences is crucial to ensuring that autonomous agents act in a manner that satisfies stakeholder requirements, normative concepts, policies and preferences have yet to be considered as first-class abstractions in Web-based multiagent systems. Towards this end, this paper motivates the need for alignment and joint research across the Multiagent Systems, Semantic Web, and WoT communities, introduces a conceptual framework for governance of autonomous agents on the Web, and identifies several research challenges and opportunities.