Tim Wittenborg

DL
h-index 16
5 papers
7 citations
Novelty 40%
AI Score 43

5 Papers

DL Jun 1
Speaker Mining -- FAIR Data on Public Broadcasts for Question Answering

Tim Wittenborg, Omar Imad Remmo, Claudia Frick et al.

Public broadcasts are at the center of civic discourse: traditional television talk shows, alongside emerging podcast and web video formats, capture and guide the attention of our societies, shaping how citizens encounter politics, science, and societal issues. Yet systematic or even simple analyses of these formats face similar challenges: guest and content metadata are scarce, fleeting, fragmented, and not standardized. Research conducted and questions answered are based on extensive, laborious, yet isolated data-curation efforts that capture only a fraction of the relevant landscape. This work seeks to address this issue using a scaling-oriented framework for FAIR data curation in public broadcasting. Evaluated on 15 broadcasting programs, the pipeline aggregates ZDF Archive PDFs, fernsehserien.de, and Wikidata into a unified knowledge graph. Of the 31,817 candidate guest mentions from these three sources, 17,729 could be automatically disambiguated, and a further 5,958 were resolved through 64 hours of manual reconciliation using OpenRefine. Results are published at speakermining.wikibase.cloud and linked to Wikidata, enabling SPARQL-based question answering by gender, age, occupation, or institutional affiliation across 8,436 canonical persons with 23,527 appearances in 6,469 aligned episodes. Our iterative experience reveals that correctly disambiguating and deduplicating speaker data from heterogeneous sources demands dedicated effort on sustainable infrastructure. For scalable and reliable question answering on public broadcasts to be accessible to everyone, we recommend fostering the potential of linked open data: advancing alignment and utilization approaches like this work, particularly towards crowdsourced development and curation, as well as more FAIR data interfaces from public broadcast service providers.

DL Jul 18, 2025
ExtracTable: Human-in-the-Loop Transformation of Scientific Corpora into Structured Knowledge

Lena John, Ahmed Malek Ghanmi, Tim Wittenborg et al.

As the volume of scientific literature grows, efficient knowledge organization becomes increasingly challenging. Traditional approaches to structuring scientific content are time-consuming and require significant domain expertise, highlighting the need for tool support. We present ExtracTable, a Human-in-the-Loop (HITL) workflow and framework that assists researchers in transforming unstructured publications into structured representations. The workflow combines large language models (LLMs) with user-defined schemas and is designed for downstream integration into knowledge graphs (KGs). Developed and evaluated in the context of the Open Research Knowledge Graph (ORKG), ExtracTable automates key steps such as document preprocessing and data extraction while ensuring user oversight through validation. In an evaluation with ORKG community participants following the Quality Improvement Paradigm (QIP), ExtracTable demonstrated high usability and practical value. Participants gave it an average System Usability Scale (SUS) score of 84.17 (A+, the highest rating). The time to progress from a research interest to literature-based insights was reduced from between 4 hours and 2 weeks to an average of 24:40 minutes. By streamlining corpus creation and structured data extraction for knowledge graph integration, ExtracTable leverages LLMs and user models to accelerate literature reviews. However, human validation remains essential to ensure quality, and future work will address improving extraction accuracy and entity linking to existing knowledge resources.

DL May 12, 2025 Code
SciCom Wiki: Fact-Checking and FAIR Knowledge Distribution for Scientific Videos and Podcasts

Tim Wittenborg, Constantin Sebastian Tremel, Niklas Stehr et al.

Democratic societies need accessible, reliable information. Videos and podcasts have established themselves as the medium of choice for civic dissemination, but also as carriers of misinformation. The emerging Science Communication Knowledge Infrastructure (SciCom KI) curating non-textual media is still fragmented and not adequately equipped to scale against the content flood. Our work sets out to support the SciCom KI with a central, collaborative platform, the SciCom Wiki, to facilitate FAIR (findable, accessible, interoperable, reusable) media representation and the fact-checking of their content, particularly for videos and podcasts. Building an open-source service system centered around Wikibase, we survey requirements from 53 stakeholders, refine these in 11 interviews, and evaluate our prototype based on these requirements with another 14 participants. To address the most requested feature, fact-checking, we developed a neurosymbolic computational fact-checking approach, converting heterogeneous media into knowledge graphs. This increases machine-readability and allows comparing statements against equally represented ground truth. Our computational fact-checking tool was iteratively evaluated through 10 expert interviews, and a public user survey with 43 participants verified the necessity and usability of our tool. Overall, our findings identified several needs to systematically support the SciCom KI. The SciCom Wiki, as a FAIR digital library complementing our neurosymbolic computational fact-checking framework, was found suitable to address the raised requirements. Further, we identified that the SciCom KI is severely underdeveloped regarding FAIR knowledge and related systems facilitating its collaborative creation and curation. Our system can provide a central knowledge node, yet a collaborative effort is required to scale against the imminent (mis-)information flood.

DL Mar 6
Fostering Knowledge Infrastructures in Science Communication and Aerospace Engineering

Tim Wittenborg

Knowledge infrastructures are defined as robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge. Yet many domains, such as science communication or aerospace engineering, are fragmented and far from robustly networked. While FAIR (Findable, Accessible, Interoperable, Reusable) data management tools exist, their adoption in these domains is limited. Several challenges inhibit this adoption, from complex heterogeneous data formats to lack of structured support to outright incentives against collaboration or legal barriers. This doctoral work outlines how to foster underdeveloped knowledge infrastructures, using science communication and aerospace engineering as use cases. By analyzing these problems and identifying available solutions, tool-supported workflows towards collaborative infrastructure can be implemented and evaluated. These include human-in-the-loop artificial intelligence (AI)-supported workflows for information extraction and processing, wiki- and knowledge-graph-based digital libraries, and stakeholder-requirement-driven interfaces. While the developed tools for workflow automation and knowledge representation show promise, significant challenges remain. Future work will have to go beyond technical problem-solving and address the societal and legal barriers to unlock these domains. Beyond that, advocates of emerging knowledge infrastructures in any domain are welcome to apply the findings of this work to foster the networking of available knowledge.

CL May 12, 2025
Computational Fact-Checking of Online Discourse: Scoring scientific accuracy in climate change related news articles

Tim Wittenborg, Constantin Sebastian Tremel, Markus Stocker et al.

Democratic societies need reliable information. Misinformation in popular media such as news articles or videos threatens to impair civic discourse. Citizens are, unfortunately, not equipped to verify this flood of content, which is consumed daily at increasing rates. This work aims to semi-automatically quantify the scientific accuracy of online media. By semantifying media of unknown veracity, their statements can be compared against equally processed trusted sources. We implemented a workflow using LLM-based statement extraction and knowledge graph analysis. Our neurosymbolic system demonstrably streamlined state-of-the-art veracity quantification. Evaluated via expert interviews and a user survey, the tool provides a beneficial veracity indication. This indicator, however, is unable to annotate public media at the required granularity and scale. Further work towards a FAIR (Findable, Accessible, Interoperable, Reusable) ground truth and complementary metrics is required to scientifically support civic discourse.