CLSep 6, 2024Code
Column Vocabulary Association (CVA): semantic interpretation of dataless tablesMargherita Martorana, Xueli Pan, Benno Kruit et al.
Traditional Semantic Table Interpretation (STI) methods rely primarily on the underlying table data to create semantic annotations. This year's SemTab challenge introduced the ``Metadata to KG'' track, which focuses on performing STI by using only metadata information, without access to the underlying data. In response to this new challenge, we introduce a new term: Column Vocabulary Association (CVA). This term refers to the task of semantic annotation of column headers solely based on metadata information. In this study, we evaluate the performance of various methods in executing the CVA task, including a Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) approach, as well as a more traditional similarity approach with SemanticBERT. Our methodology uses a zero-shot setting, with no pretraining or examples passed to the Large Language Models (LLMs), as we aim to avoid a domain-specific setting. We investigate a total of 7 different LLMs, of which three commercial GPT models (i.e. gpt-3.5-turbo-0.125, gpt-4o and gpt-4-turbo) and four open source models (i.e. llama3-80b, llama3-7b, gemma-7b and mixtral-8x7b). We integrate this models with RAG systems, and we explore how variations in temperature settings affect performances. Moreover, we continue our investigation by performing the CVA task utilizing SemanticBERT, analyzing how various metadata information influence its performance. Initial findings indicate that LLMs generally perform well at temperatures below 1.0, achieving an accuracy of 100\% in certain cases. Nevertheless, our investigation also reveal that the nature of the data significantly influences CVA task outcomes. In fact, in cases where the input data and glossary are related (for example by being created by the same organizations) traditional methods appear to surpass the performance of LLMs.
AISep 13, 2024
A RAG Approach for Generating Competency Questions in Ontology EngineeringXueli Pan, Jacco van Ossenbruggen, Victor de Boer et al.
Competency question (CQ) formulation is central to several ontology development and evaluation methodologies. Traditionally, the task of crafting these competency questions heavily relies on the effort of domain experts and knowledge engineers which is often time-consuming and labor-intensive. With the emergence of Large Language Models (LLMs), there arises the possibility to automate and enhance this process. Unlike other similar works which use existing ontologies or knowledge graphs as input to LLMs, we present a retrieval-augmented generation (RAG) approach that uses LLMs for the automatic generation of CQs given a set of scientific papers considered to be a domain knowledge base. We investigate its performance and specifically, we study the impact of different number of papers to the RAG and different temperature setting of the LLM. We conduct experiments using GPT-4 on two domain ontology engineering tasks and compare results against ground-truth CQs constructed by domain experts. Empirical assessments on the results, utilizing evaluation metrics (precision and consistency), reveal that compared to zero-shot prompting, adding relevant domain knowledge to the RAG improves the performance of LLMs on generating CQs for concrete ontology engineering tasks.
DLMar 3, 2022
Nanopublication-Based Semantic Publishing and Reviewing: A Field Study with Formalization PapersCristina-Iulia Bucur, Tobias Kuhn, Davide Ceolin et al.
With the rapidly increasing amount of scientific literature,it is getting continuously more difficult for researchers in different disciplines to be updated with the recent findings in their field of study.Processing scientific articles in an automated fashion has been proposed as a solution to this problem,but the accuracy of such processing remains very poor for extraction tasks beyond the basic ones.Few approaches have tried to change how we publish scientific results in the first place,by making articles machine-interpretable by expressing them with formal semantics from the start.In the work presented here,we set out to demonstrate that we can formally publish high-level scientific claims in formal logic,and publish the results in a special issue of an existing journal.We use the concept and technology of nanopublications for this endeavor,and represent not just the submissions and final papers in this RDF-based format,but also the whole process in between,including reviews,responses,and decisions.We do this by performing a field study with what we call formalization papers,which contribute a novel formalization of a previously published claim.We received 15 submissions from 18 authors,who then went through the whole publication process leading to the publication of their contributions in the special issue.Our evaluation shows the technical and practical feasibility of our approach.The participating authors mostly showed high levels of interest and confidence,and mostly experienced the process as not very difficult,despite the technical nature of the current user interfaces.We believe that these results indicate that it is possible to publish scientific results from different fields with machine-interpretable semantics from the start,which in turn opens countless possibilities to radically improve in the future the effectiveness and efficiency of the scientific endeavor as a whole.
IRSep 1, 2022
Hidden Author Bias in Book RecommendationSavvina Daniil, Mirjam Cuper, Cynthia C. S. Liem et al.
Collaborative filtering algorithms have the advantage of not requiring sensitive user or item information to provide recommendations. However, they still suffer from fairness related issues, like popularity bias. In this work, we argue that popularity bias often leads to other biases that are not obvious when additional user or item information is not provided to the researcher. We examine our hypothesis in the book recommendation case on a commonly used dataset with book ratings. We enrich it with author information using publicly available external sources. We find that popular books are mainly written by US citizens in the dataset, and that these books tend to be recommended disproportionally by popular collaborative filtering algorithms compared to the users' profiles. We conclude that the societal implications of popularity bias should be further examined by the scholar community.
CLNov 13, 2023
How Contentious Terms About People and Cultures are Used in Linked Open DataAndrei Nesterov, Laura Hollink, Jacco van Ossenbruggen
Web resources in linked open data (LOD) are comprehensible to humans through literal textual values attached to them, such as labels, notes, or comments. Word choices in literals may not always be neutral. When outdated and culturally stereotyping terminology is used in literals, they may appear as offensive to users in interfaces and propagate stereotypes to algorithms trained on them. We study how frequently and in which literals contentious terms about people and cultures occur in LOD and whether there are attempts to mark the usage of such terms. For our analysis, we reuse English and Dutch terms from a knowledge graph that provides opinions of experts from the cultural heritage domain about terms' contentiousness. We inspect occurrences of these terms in four widely used datasets: Wikidata, The Getty Art & Architecture Thesaurus, Princeton WordNet, and Open Dutch WordNet. Some terms are ambiguous and contentious only in particular senses. Applying word sense disambiguation, we generate a set of literals relevant to our analysis. We found that outdated, derogatory, stereotyping terms frequently appear in descriptive and labelling literals, such as preferred labels that are usually displayed in interfaces and used for indexing. In some cases, LOD contributors mark contentious terms with words and phrases in literals (implicit markers) or properties linked to resources (explicit markers). However, such marking is rare and non-consistent in all datasets. Our quantitative and qualitative insights could be helpful in developing more systematic approaches to address the propagation of stereotypes via LOD.
6.2CVMay 6
From Historical Tabular Image to Knowledge Graphs: A Provenance-Aware Modular PipelineSarah Binta Alam Shoilee, Victor de Boer, Jacco van Ossenbruggen et al.
Handwritten archival tables contain rich historical information, yet transforming them into structured representations, such as Knowledge Graphs, requires integrating table structure recognition, handwriting recognition, and semantic interpretation - a complex multimodal process. End-to-end AI implementations can obscure these steps, resulting in opaque algorithmic operations that hinder human oversight, critical assessment, and trust. To address this, we present a modular, provenance-aware pipeline to convert handwritten tabular images into KGs supporting human-AI collaboration. The pipeline decomposes the workflow into three stages - table reconstruction, information extraction, and KG construction - while exposing intermediate representations for inspection, evaluation, and correction. A key contribution of our approach is the systematic integration of data provenance at every stage, ensuring that all extracted entities and literals remain traceable to their visual and textual origins. The proposed pipeline is demonstrated through a number of experiments on real-world archival material concerning military careers. The results across three different table reconstruction variants highlight the importance of modularisation. By coupling modularity with data provenance, our work advances transparent and collaboratively controllable image-to-KG pipelines for complex historical data.
DBMar 1, 2024
Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata EnrichmentMargherita Martorana, Tobias Kuhn, Lise Stork et al.
Traditional dataset retrieval systems rely on metadata for indexing, rather than on the underlying data values. However, high-quality metadata creation and enrichment often require manual annotations, which is a labour-intensive and challenging process to automate. In this study, we propose a method to support metadata enrichment using topic annotations generated by three Large Language Models (LLMs): ChatGPT-3.5, GoogleBard, and GoogleGemini. Our analysis focuses on classifying column headers based on domain-specific topics from the Consortium of European Social Science Data Archives (CESSDA), a Linked Data controlled vocabulary. Our approach operates in a zero-shot setting, integrating the controlled topic vocabulary directly within the input prompt. This integration serves as a Large Context Windows approach, with the aim of improving the results of the topic classification task. We evaluated the performance of the LLMs in terms of internal consistency, inter-machine alignment, and agreement with human classification. Additionally, we investigate the impact of contextual information (i.e., dataset description) on the classification outcomes. Our findings suggest that ChatGPT and GoogleGemini outperform GoogleBard in terms of internal consistency as well as LLM-human-agreement. Interestingly, we found that contextual information had no significant impact on LLM performance. This work proposes a novel approach that leverages LLMs for topic classification of column headers using a controlled vocabulary, presenting a practical application of LLMs and Large Context Windows within the Semantic Web domain. This approach has the potential to facilitate automated metadata enrichment, thereby enhancing dataset retrieval and the Findability, Accessibility, Interoperability, and Reusability (FAIR) of research data on the Web.
HCMar 20, 2024
Analysing and Organising Human Communications for AI Fairness-Related Decisions: Use Cases from the Public SectorMirthe Dankloff, Vanja Skoric, Giovanni Sileno et al.
AI algorithms used in the public sector, e.g., for allocating social benefits or predicting fraud, often involve multiple public and private stakeholders at various phases of the algorithm's life-cycle. Communication issues between these diverse stakeholders can lead to misinterpretation and misuse of algorithms. We investigate the communication processes for AI fairness-related decisions by conducting interviews with practitioners working on algorithmic systems in the public sector. By applying qualitative coding analysis, we identify key elements of communication processes that underlie fairness-related human decisions. We analyze the division of roles, tasks, skills, and challenges perceived by stakeholders. We formalize the underlying communication issues within a conceptual framework that i. represents the communication patterns ii. outlines missing elements, such as actors who miss skills for their tasks. The framework is used for describing and analyzing key organizational issues for fairness-related decisions. Three general patterns emerge from the analysis: 1. Policy-makers, civil servants, and domain experts are less involved compared to developers throughout a system's life-cycle. This leads to developers taking on extra roles such as advisor, while they potentially miss the required skills and guidance from domain experts. 2. End-users and policy-makers often lack the technical skills to interpret a system's limitations, and rely on developer roles for making decisions concerning fairness issues. 3. Citizens are structurally absent throughout a system's life-cycle, which may lead to decisions that do not include relevant considerations from impacted stakeholders.
AIAug 14, 2025
FIRESPARQL: A LLM-based Framework for SPARQL Query Generation over Scholarly Knowledge GraphsXueli Pan, Victor de Boer, Jacco van Ossenbruggen
Question answering over Scholarly Knowledge Graphs (SKGs) remains a challenging task due to the complexity of scholarly content and the intricate structure of these graphs. Large Language Model (LLM) approaches could be used to translate natural language questions (NLQs) into SPARQL queries; however, these LLM-based approaches struggle with SPARQL query generation due to limited exposure to SKG-specific content and the underlying schema. We identified two main types of errors in the LLM-generated SPARQL queries: (i) structural inconsistencies, such as missing or redundant triples in the queries, and (ii) semantic inaccuracies, where incorrect entities or properties are shown in the queries despite a correct query structure. To address these issues, we propose FIRESPARQL, a modular framework that supports fine-tuned LLMs as a core component, with optional context provided via retrieval-augmented generation (RAG) and a SPARQL query correction layer. We evaluate the framework on the SciQA Benchmark using various configurations (zero-shot, zero-shot with RAG, one-shot, fine-tuning, and fine-tuning with RAG) and compare the performance with baseline and state-of-the-art approaches. We measure query accuracy using BLEU and ROUGE metrics, and query result accuracy using relaxed exact match(RelaxedEM), with respect to the gold standards containing the NLQs, SPARQL queries, and the results of the queries. Experimental results demonstrate that fine-tuning achieves the highest overall performance, reaching 0.90 ROUGE-L for query accuracy and 0.85 RelaxedEM for result accuracy on the test set.
DLSep 27, 2021
Expressing High-Level Scientific Claims with Formal SemanticsCristina-Iulia Bucur, Tobias Kuhn, Davide Ceolin et al.
The use of semantic technologies is gaining significant traction in science communication with a wide array of applications in disciplines including the Life Sciences, Computer Science, and the Social Sciences. Languages like RDF, OWL, and other formalisms based on formal logic are applied to make scientific knowledge accessible not only to human readers but also to automated systems. These approaches have mostly focused on the structure of scientific publications themselves, on the used scientific methods and equipment, or on the structure of the used datasets. The core claims or hypotheses of scientific work have only been covered in a shallow manner, such as by linking mentioned entities to established identifiers. In this research, we therefore want to find out whether we can use existing semantic formalisms to fully express the content of high-level scientific claims using formal semantics in a systematic way. Analyzing the main claims from a sample of scientific articles from all disciplines, we find that their semantics are more complex than what a straight-forward application of formalisms like RDF or OWL account for, but we managed to elicit a clear semantic pattern which we call the 'super-pattern'. We show here how the instantiation of the five slots of this super-pattern leads to a strictly defined statement in higher-order logic. We successfully applied this super-pattern to an enlarged sample of scientific claims. We show that knowledge representation experts, when instructed to independently instantiate the super-pattern with given scientific claims, show a high degree of consistency and convergence given the complexity of the task and the subject. These results therefore open the door for expressing high-level scientific findings in a manner they can be automatically interpreted, which on the longer run can allow us to do automated consistency checking, and much more.
CLOct 25, 2019
Is it a Fruit, an Apple or a Granny Smith? Predicting the Basic Level in a Concept HierarchyLaura Hollink, Aysenur Bilgin, Jacco van Ossenbruggen
The "basic level", according to experiments in cognitive psychology, is the level of abstraction in a hierarchy of concepts at which humans perform tasks quicker and with greater accuracy than at other levels. We argue that applications that use concept hierarchies - such as knowledge graphs, ontologies or taxonomies - could significantly improve their user interfaces if they `knew' which concepts are the basic level concepts. This paper examines to what extent the basic level can be learned from data. We test the utility of three types of concept features, that were inspired by the basic level theory: lexical features, structural features and frequency features. We evaluate our approach on WordNet, and create a training set of manually labelled examples that includes concepts from different domains. Our findings include that the basic level concepts can be accurately identified within one domain. Concepts that are difficult to label for humans are also harder to classify automatically. Our experiments provide insight into how classification performance across domains could be improved, which is necessary for identification of basic level concepts on a larger scale.
CLOct 1, 2018
Utilizing a Transparency-driven Environment toward Trusted Automatic Genre Classification: A Case Study in Journalism HistoryAysenur Bilgin, Laura Hollink, Jacco van Ossenbruggen et al.
With the growing abundance of unlabeled data in real-world tasks, researchers have to rely on the predictions given by black-boxed computational models. However, it is an often neglected fact that these models may be scoring high on accuracy for the wrong reasons. In this paper, we present a practical impact analysis of enabling model transparency by various presentation forms. For this purpose, we developed an environment that empowers non-computer scientists to become practicing data scientists in their own research field. We demonstrate the gradually increasing understanding of journalism historians through a real-world use case study on automatic genre classification of newspaper articles. This study is a first step towards trusted usage of machine learning pipelines in a responsible way.