Katharina Simbeck

CL · h-index 5 · 6 papers · 13 citations · Novelty 34% · AI Score 39

6 Papers

CY · Jun 13, 2023
Show me the numbers! -- Student-facing Interventions in Adaptive Learning Environments for German Spelling

Nathalie Rzepka, Katharina Simbeck, Hans-Georg Mueller et al.

Since adaptive learning comes in many shapes and sizes, it is crucial to find out which adaptations are meaningful for which areas of learning. Our work presents the results of an experiment conducted on an online platform for the acquisition of German spelling skills. We compared the traditional online learning platform to three adaptive versions of the platform that implement machine-learning-based, student-facing interventions showing the personalized solution probability. We evaluate the interventions with regard to the error rate, the number of early dropouts, and the users' competency. Our results show that the number of mistakes decreased in comparison to the control group. Additionally, the number of dropouts increased. We did not find any significant effects on the users' competency. We conclude that student-facing adaptive learning environments are effective in reducing a person's error rate and should be chosen wisely to have a motivating impact.

CY · Oct 29, 2024
Assessing the Auditability of AI-integrating Systems: A Framework and Learning Analytics Case Study

Linda Fernsel, Yannick Kalff, Katharina Simbeck

Audits contribute to the trustworthiness of Learning Analytics (LA) systems that integrate Artificial Intelligence (AI) and may be legally required in the future. We argue that the efficacy of an audit depends on the auditability of the audited system. Therefore, systems need to be designed with auditability in mind. We present a framework for assessing the auditability of AI-integrating systems that consists of three parts: (1) verifiable claims about the validity, utility and ethics of the system; (2) evidence on subjects (data, models or the system) of different types (documentation, raw sources and logs) to back or refute claims; and (3) technical means (APIs, monitoring tools, explainable AI, etc.) that make this evidence accessible to auditors. We apply the framework to assess the auditability of Moodle's dropout prediction system and a prototype AI-based LA system. We find that Moodle's auditability is limited by incomplete documentation, insufficient monitoring capabilities and a lack of available test data. The framework supports assessing the auditability of AI-based LA systems in use and improves the design of auditable systems, and thus of audits.
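To make the three-part structure concrete, here is a minimal sketch that encodes claims, evidence, and access means as Python data classes. All class and function names are hypothetical. The rule "a claim is auditable if at least one piece of its evidence is accessible through some technical means" is an assumption for illustration, not the paper's scoring method.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical encoding of the three-part auditability framework; names are illustrative.
class Subject(Enum):
    DATA = "data"
    MODEL = "model"
    SYSTEM = "system"

class EvidenceType(Enum):
    DOCUMENTATION = "documentation"
    RAW_SOURCE = "raw source"
    LOG = "log"

ASPECTS = {"validity", "utility", "ethics"}

@dataclass
class Evidence:
    subject: Subject
    kind: EvidenceType
    # Technical means that give auditors access, e.g. "API", "monitoring tool", "XAI".
    access_means: list[str] = field(default_factory=list)

    @property
    def accessible(self) -> bool:
        return bool(self.access_means)

@dataclass
class Claim:
    aspect: str
    statement: str
    evidence: list[Evidence] = field(default_factory=list)

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")

    def is_auditable(self) -> bool:
        # Assumed rule: at least one piece of evidence for the claim must be accessible.
        return any(e.accessible for e in self.evidence)

def auditability_report(claims: list[Claim]) -> dict[str, bool]:
    """Map each claim statement to whether it can currently be audited."""
    return {c.statement: c.is_auditable() for c in claims}
```

For example, consider a made-up claim backed only by model documentation with no access means. It would be reported as not auditable. A claim backed by system logs exposed through a monitoring tool would be reported as auditable.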

CL · Sep 22, 2025
Investigating Bias: A Multilingual Pipeline for Generating, Solving, and Evaluating Math Problems with LLMs

Mariam Mahran, Katharina Simbeck

Large Language Models (LLMs) are increasingly used for educational support, yet their response quality varies depending on the language of interaction. This paper presents an automated multilingual pipeline for generating, solving, and evaluating math problems aligned with the German K-10 curriculum. We generated 628 math exercises and translated them into English, German, and Arabic. Three commercial LLMs (GPT-4o-mini, Gemini 2.5 Flash, and Qwen-plus) were prompted to produce step-by-step solutions in each language. A held-out panel of LLM judges, including Claude 3.5 Haiku, evaluated solution quality using a comparative framework. Results show a consistent gap: English solutions were rated highest, while Arabic solutions were often ranked lower. These findings highlight persistent linguistic bias and the need for more equitable multilingual AI systems in education.
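The abstract does not specify how the comparative judgments are aggregated. A minimal sketch follows, assuming each judge verdict is a pairwise preference between two language versions of the same solution, and that verdicts are summed into a win rate per language. The tuple format and the function name are hypothetical, and the verdicts are toy data, not results from the paper.

```python
from collections import Counter

def win_rates(judgments):
    """Aggregate pairwise judge verdicts into a win rate per language.

    judgments: iterable of (lang_a, lang_b, winner) tuples, where winner is
    the language whose solution the judge preferred (hypothetical format).
    """
    wins, comparisons = Counter(), Counter()
    for lang_a, lang_b, winner in judgments:
        if winner not in (lang_a, lang_b):
            raise ValueError(f"winner {winner!r} not in pair ({lang_a}, {lang_b})")
        comparisons[lang_a] += 1
        comparisons[lang_b] += 1
        wins[winner] += 1
    return {lang: wins[lang] / comparisons[lang] for lang in comparisons}

# Toy verdicts for illustration only.
verdicts = [("en", "de", "en"), ("en", "ar", "en"), ("de", "ar", "de")]
print(sorted(win_rates(verdicts).items(), key=lambda kv: -kv[1]))
# [('en', 1.0), ('de', 0.5), ('ar', 0.0)]
```

Win rates normalize by the number of comparisons each language took part in. This keeps languages comparable even when they appear in different numbers of pairs.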

LG · Sep 22, 2025
Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models

Katharina Simbeck, Mariam Mahran

Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.
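One simple way to "measure overlap" between the SAE latents activated by two prompt groups is Jaccard similarity over the sets of active feature indices. The abstract does not say which overlap measure was used, so Jaccard is an assumption here. The sketch also assumes the activation vectors were already retrieved (for example from Neuronpedia). It does not call the Neuronpedia API, and the vectors below are toy values.

```python
def active_features(activations, threshold=0.0):
    """Indices of SAE latents whose activation exceeds `threshold`."""
    return {i for i, a in enumerate(activations) if a > threshold}

def jaccard(a, b):
    """Share of features active for both prompt groups (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy activation vectors over four latents, already retrieved from an SAE.
religion = active_features([0.0, 1.2, 0.5, 0.0])
violence = active_features([0.3, 0.9, 0.0, 0.0])
print(jaccard(religion, violence))  # 0.3333333333333333
```

The `threshold` parameter is a hypothetical knob. Raising it restricts the comparison to strongly activated features.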

CL · Sep 24, 2025
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models

Mariam Mahran, Katharina Simbeck

Large Language Models (LLMs) are trained on massive, unstructured corpora, making it unclear which social patterns and biases they absorb and later reproduce. Existing evaluations typically examine outputs or activations, but rarely connect them back to the pre-training data. We introduce a pipeline that couples LLMs with sparse autoencoders (SAEs) to trace how different themes are encoded during training. As a controlled case study, we trained a GPT-style model on 37 nineteenth-century novels by ten female authors, a corpus centered on themes such as gender, marriage, class, and morality. By applying SAEs across layers and probing with eleven social and moral categories, we mapped sparse features to human-interpretable concepts. The analysis revealed stable thematic backbones (most prominently around gender and kinship) and showed how associations expand and entangle with depth. More broadly, we argue that the LLM+SAEs pipeline offers a scalable framework for auditing how cultural assumptions from the data are embedded in model representations.
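To clarify what the SAE in such a pipeline computes, below is a minimal pure-Python sketch of a common sparse autoencoder formulation. It encodes a model activation into an overcomplete set of ReLU features and reconstructs the activation. The loss is squared reconstruction error plus an L1 sparsity penalty. The paper's exact architecture, layer choice, and training procedure are not given in the abstract. The weights here are hand-picked toy values, not trained parameters, and all function names are illustrative.

```python
def relu(v):
    return [max(0.0, z) for z in v]

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * z for w, z in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode activation x (dim d) into m sparse features, then reconstruct it."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    f = relu([z + b for z, b in zip(matvec(W_enc, centered), b_enc)])  # W_enc: m x d
    x_hat = [z + b for z, b in zip(matvec(W_dec, f), b_dec)]           # W_dec: d x m
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff):
    """Squared reconstruction error plus an L1 penalty that keeps f sparse."""
    recon = sum((xi - yi) ** 2 for xi, yi in zip(x, x_hat))
    return recon + l1_coeff * sum(abs(fi) for fi in f)

# Toy example: d = 2 model dimensions, m = 3 (overcomplete) features.
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
f, x_hat = sae_forward([1.0, 0.0], W_enc, [0.0] * 3, W_dec, [0.0, 0.0])
print(f, x_hat, sae_loss([1.0, 0.0], f, x_hat, l1_coeff=0.1))
# [1.0, 0.0, 0.0] [1.0, 0.0] 0.1
```

In a probing setup like the one described, each sparse feature index would then be inspected, for example by the contexts that activate it most, to map it to a human-interpretable concept.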

HC · Sep 8, 2025
Explained, yet misunderstood: How AI Literacy shapes HR Managers' interpretation of User Interfaces in Recruiting Recommender Systems

Yannick Kalff, Katharina Simbeck

AI-based recommender systems increasingly influence recruitment decisions. Thus, transparency and responsible adoption in Human Resource Management (HRM) are critical. This study examines how HR managers' AI literacy influences their subjective perception and objective understanding of explainable AI (XAI) elements in recruiting recommender dashboards. In an online experiment, 410 German-based HR managers compared baseline dashboards to versions enriched with three XAI styles: important features, counterfactuals, and model criteria. Our results show that the dashboards used in practice do not explain AI results and even keep AI elements opaque. However, while adding XAI features improves subjective perceptions of helpfulness and trust among users with moderate or high AI literacy, it does not increase their objective understanding. It may even reduce accurate understanding, especially with complex explanations. Only overlays of important features significantly aided the interpretations of high-literacy users. Our findings highlight that the benefits of XAI in recruitment depend on users' AI literacy, emphasizing the need for tailored explanation strategies and targeted literacy training in HRM to ensure fair, transparent, and effective adoption of AI.