Lise Stork

AI
h-index: 31
3 papers
48 citations
Novelty: 27%
AI Score: 18

3 Papers

AI · Oct 1, 2023
Knowledge Engineering using Large Language Models

Bradley P. Allen, Lise Stork, Paul Groth

Knowledge engineering is a discipline that focuses on the creation and maintenance of processes that generate and apply knowledge. Traditionally, knowledge engineering approaches have focused on knowledge expressed in formal languages. The emergence of large language models and their capabilities to effectively work with natural language, in its broadest sense, raises questions about the foundations and practice of knowledge engineering. Here, we outline the potential role of LLMs in knowledge engineering, identifying two central directions: 1) creating hybrid neuro-symbolic knowledge systems; and 2) enabling knowledge engineering in natural language. Additionally, we formulate key open research questions to tackle these directions.

DB · Mar 1, 2024
Zero-Shot Topic Classification of Column Headers: Leveraging LLMs for Metadata Enrichment

Margherita Martorana, Tobias Kuhn, Lise Stork et al.

Traditional dataset retrieval systems rely on metadata for indexing, rather than on the underlying data values. However, high-quality metadata creation and enrichment often require manual annotations, which is a labour-intensive and challenging process to automate. In this study, we propose a method to support metadata enrichment using topic annotations generated by three Large Language Models (LLMs): ChatGPT-3.5, GoogleBard, and GoogleGemini. Our analysis focuses on classifying column headers based on domain-specific topics from the Consortium of European Social Science Data Archives (CESSDA), a Linked Data controlled vocabulary. Our approach operates in a zero-shot setting, integrating the controlled topic vocabulary directly within the input prompt. This integration serves as a Large Context Windows approach, with the aim of improving the results of the topic classification task. We evaluated the performance of the LLMs in terms of internal consistency, inter-machine alignment, and agreement with human classification. Additionally, we investigate the impact of contextual information (i.e., dataset description) on the classification outcomes. Our findings suggest that ChatGPT and GoogleGemini outperform GoogleBard in terms of internal consistency as well as LLM-human-agreement. Interestingly, we found that contextual information had no significant impact on LLM performance. This work proposes a novel approach that leverages LLMs for topic classification of column headers using a controlled vocabulary, presenting a practical application of LLMs and Large Context Windows within the Semantic Web domain. This approach has the potential to facilitate automated metadata enrichment, thereby enhancing dataset retrieval and the Findability, Accessibility, Interoperability, and Reusability (FAIR) of research data on the Web.
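
The zero-shot setup described in this abstract, with the controlled vocabulary placed directly in the prompt, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the prompt wording, and the example topic labels are hypothetical, and the two scores are simple stand-ins for "internal consistency" and "agreement", not the metrics used in the paper. No LLM is called; the sketch only builds the prompt and scores labels that are already collected.

```python
from collections import Counter


def build_prompt(header, topics, description=None):
    """Compose a zero-shot prompt that embeds the whole topic vocabulary.

    The wording is hypothetical; the paper's actual prompt is not given here.
    """
    lines = [
        "Classify the column header into exactly one of the topics below.",
        "Topics:",
    ]
    lines += [f"- {topic}" for topic in topics]
    if description:
        # Optional contextual information (the dataset description).
        lines.append(f"Dataset description: {description}")
    lines.append(f"Column header: {header}")
    lines.append("Answer with the topic name only.")
    return "\n".join(lines)


def internal_consistency(runs):
    """Share of repeated runs that return the most common label (assumed metric)."""
    if not runs:
        raise ValueError("runs must not be empty")
    most_common_count = Counter(runs).most_common(1)[0][1]
    return most_common_count / len(runs)


def agreement(labels_a, labels_b):
    """Fraction of items on which two annotators give the same label (assumed metric)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


if __name__ == "__main__":
    # Placeholder topics, not the actual CESSDA vocabulary entries.
    topics = ["Economics", "Health", "Education"]
    print(build_prompt("household_income", topics))
    print(internal_consistency(["Economics", "Economics", "Health", "Economics"]))
    print(agreement(["Economics", "Health"], ["Economics", "Education"]))
```

The same `agreement` function can compare two models (inter-machine alignment) or a model against human annotations; a chance-corrected statistic would be a natural refinement if class frequencies are skewed.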

CY · Apr 15, 2024
Hybrid Intelligence for Digital Humanities

Victor de Boer, Lise Stork

In this paper, we explore the synergies between Digital Humanities (DH) as a discipline and Hybrid Intelligence (HI) as a research paradigm. In DH research, the use of digital methods, and specifically that of Artificial Intelligence, is subject to a set of requirements and constraints. We argue that these are well supported by the capabilities and goals of HI. Our contribution includes the identification of five such DH requirements: successful AI systems need to be able to 1) collaborate with the (human) scholar; 2) support data criticism; 3) support tool criticism; 4) be aware of and cater to various perspectives; and 5) support distant and close reading. We take the CARE principles of Hybrid Intelligence (collaborative, adaptive, responsible and explainable) as a theoretical framework and map these to the DH requirements. In this mapping, we include example research projects. Finally, we address how insights from DH can be applied to HI and discuss open challenges for the combination of the two disciplines.