CLAug 8, 2024Code
Dynamic Fog Computing for Enhanced LLM Execution in Medical ApplicationsPhilipp Zagar, Vishnu Ravi, Lauren Aalami et al.
The ability of large language models (LLMs) to transform, interpret, and comprehend vast quantities of heterogeneous data presents a significant opportunity to enhance data-driven care delivery. However, the sensitive nature of protected health information (PHI) raises valid concerns about data privacy and trust in remote LLM platforms. In addition, the cost associated with cloud-based artificial intelligence (AI) services continues to impede widespread adoption. To address these challenges, we propose a shift in the LLM execution environment from opaque, centralized cloud providers to a decentralized and dynamic fog computing architecture. By executing open-weight LLMs in more trusted environments, such as the user's edge device or a fog layer within a local network, we aim to mitigate the privacy, trust, and financial challenges associated with cloud-based LLMs. We further present SpeziLLM, an open-source framework designed to facilitate rapid and seamless leveraging of different LLM execution layers and lowering barriers to LLM integration in digital health applications. We demonstrate SpeziLLM's broad applicability across six digital health applications, showcasing its versatility in various healthcare settings.
21.8CYApr 24
$μ$Ed API: Towards a Shared API for Education MicroservicesMaximillan Sölch, Alexandra Neagu, Marcus Messer et al.
Learning at scale often requires domain-specific automation such as assessment and feedback. An organization locked in to a general learning platform without these specialist automations limits its pedagogical offering. An ecosystem of interoperable, platform-agnostic microservices for domain-specific automation would solve this problem. To develop an effective ecosystem, a standard interface (API) for education microservices is required. We propose an initial specification for a standard, platform-independent API for educational microservices, $μ$Ed. The API integrates functionality from existing systems in use at four institutions, which are adopting the new API. The API is initially specified for automation of feedback, assessment, and educational chatbots, with further service types planned. The API specification provided here enables the development of an ecosystem of education microservices that will facilitate automation in more domains, to more users, providing a richer learning experience in a wide range of disciplines.
AISep 20, 2023
ChatGPT-4 as a Tool for Reviewing Academic Books in SpanishJonnathan Berrezueta-Guzman, Laura Malache-Silva, Stephan Krusche
This study evaluates the potential of ChatGPT-4, an artificial intelligence language model developed by OpenAI, as an editing tool for Spanish literary and academic books. The need for efficient and accessible reviewing and editing processes in the publishing industry has driven the search for automated solutions. ChatGPT-4, being one of the most advanced language models, offers notable capabilities in text comprehension and generation. In this study, the features and capabilities of ChatGPT-4 are analyzed in terms of grammatical correction, stylistic coherence, and linguistic enrichment of texts in Spanish. Tests were conducted with 100 literary and academic texts, where the edits made by ChatGPT-4 were compared to those made by expert human reviewers and editors. The results show that while ChatGPT-4 is capable of making grammatical and orthographic corrections with high accuracy and in a very short time, it still faces challenges in areas such as context sensitivity, bibliometric analysis, deep contextual understanding, and interaction with visual content like graphs and tables. However, it is observed that collaboration between ChatGPT-4 and human reviewers and editors can be a promising strategy for improving efficiency without compromising quality. Furthermore, the authors consider that ChatGPT-4 represents a valuable tool in the editing process, but its use should be complementary to the work of human editors to ensure high-caliber editing in Spanish literary and academic books.
AIJul 27, 2024
Interactive Learning in Computer Science Education Supported by a Discord ChatbotSantiago Berrezueta-Guzman, Ivan Parmacli, Stephan Krusche et al.
Enhancing interaction and feedback collection in a first-semester computer science course poses a significant challenge due to students' diverse needs and engagement levels. To address this issue, we created and integrated a command-based chatbot on the course communication server on Discord. The DiscordBot enables students to provide feedback on course activities through short surveys, such as exercises, quizzes, and lectures, facilitating stress-free communication with instructors. It also supports attendance tracking and introduces lectures before they start. The research demonstrates the effectiveness of the DiscordBot as a communication tool. The ongoing feedback allowed course instructors to dynamically adjust and improve the difficulty level of upcoming activities and promote discussion in subsequent tutor sessions. The data collected reveal that students can accurately perceive the activities' difficulty and expected results, providing insights not possible through traditional end-of-semester surveys. Students reported that interaction with the DiscordBot was easy and expressed a desire to continue using it in future semesters. This responsive approach ensures the course meets the evolving needs of students, thereby enhancing their overall learning experience.
SEApr 3, 2024
AI-Tutoring in Software Engineering EducationEduard Frankford, Clemens Sauerwein, Patrick Bassner et al.
With the rapid advancement of artificial intelligence (AI) in various domains, the education sector is set for transformation. The potential of AI-driven tools in enhancing the learning experience, especially in programming, is immense. However, the scientific evaluation of Large Language Models (LLMs) used in Automated Programming Assessment Systems (APASs) as an AI-Tutor remains largely unexplored. Therefore, there is a need to understand how students interact with such AI-Tutors and to analyze their experiences. In this paper, we conducted an exploratory case study by integrating the GPT-3.5-Turbo model as an AI-Tutor within the APAS Artemis. Through a combination of empirical data collection and an exploratory survey, we identified different user types based on their interaction patterns with the AI-Tutor. Additionally, the findings highlight advantages, such as timely feedback and scalability. However, challenges like generic responses and students' concerns about a learning progress inhibition when using the AI-Tutor were also evident. This research adds to the discourse on AI's role in education.
CYMay 28, 2025
From Coders to Critics: Empowering Students through Peer Assessment in the Age of AI CopilotsSantiago Berrezueta-Guzman, Stephan Krusche, Stefan Wagner
The rapid adoption of AI powered coding assistants like ChatGPT and other coding copilots is transforming programming education, raising questions about assessment practices, academic integrity, and skill development. As educators seek alternatives to traditional grading methods susceptible to AI enabled plagiarism, structured peer assessment could be a promising strategy. This paper presents an empirical study of a rubric based, anonymized peer review process implemented in a large introductory programming course. Students evaluated each other's final projects (2D game), and their assessments were compared to instructor grades using correlation, mean absolute error, and root mean square error (RMSE). Additionally, reflective surveys from 47 teams captured student perceptions of fairness, grading behavior, and preferences regarding grade aggregation. Results show that peer review can approximate instructor evaluation with moderate accuracy and foster student engagement, evaluative thinking, and interest in providing good feedback to their peers. We discuss these findings for designing scalable, trustworthy peer assessment systems to face the age of AI assisted coding.
LGFeb 6, 2025
vCache: Verified Semantic Prompt CachingLuis Gaspar Schroeder, Aditya Desai, Alejandro Cuadron et al.
Semantic caches return cached responses for semantically similar prompts to reduce LLM inference latency and cost. They embed cached prompts and store them alongside their response in a vector database. Embedding similarity metrics assign a numerical score to quantify the similarity between a request and its nearest neighbor prompt from the cache. Existing systems use the same static similarity threshold across all requests to determine whether two prompts can share similar responses. However, we observe that static thresholds do not give formal correctness guarantees, can result in unexpected error rates, and lead to suboptimal cache hit rates. This paper proposes vCache, the first verified semantic cache with user-defined error rate guarantees. It employs an online learning algorithm to estimate an optimal threshold for each cached prompt, enabling reliable cache responses without additional training. Our experiments show that vCache consistently meets the specified error bounds while outperforming state-of-the-art static-threshold and fine-tuned embedding baselines. We release the vCache implementation and three benchmarks to support future research.
AIJun 21, 2024
Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment ParadigmsSantiago Berrezueta-Guzman, Mohanad Kandil, María-Luisa Martín-Ruiz et al.
Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental condition characterized by inattention, hyperactivity, and impulsivity, which can significantly impact an individual's daily functioning and quality of life. Occupational therapy plays a crucial role in managing ADHD by fostering the development of skills needed for daily living and enhancing an individual's ability to participate fully in school, home, and social situations. Recent studies highlight the potential of integrating Large Language Models (LLMs) like ChatGPT and Socially Assistive Robots (SAR) to improve psychological treatments. This integration aims to overcome existing limitations in mental health therapy by providing tailored support and adapting to the unique needs of this sensitive group. However, there remains a significant gap in research exploring the combined use of these advanced technologies in ADHD therapy, suggesting an opportunity for novel therapeutic approaches. Thus, we integrated two advanced language models, ChatGPT-4 Turbo and Claude-3 Opus, into a robotic assistant to explore how well each model performs in robot-assisted interactions. Additionally, we have compared their performance in a simulated therapy scenario to gauge their effectiveness against a clinically validated customized model. The results of this study show that ChatGPT-4 Turbo excelled in performance and responsiveness, making it suitable for time-sensitive applications. Claude-3 Opus, on the other hand, showed strengths in understanding, coherence, and ethical considerations, prioritizing safe and engaging interactions. Both models demonstrated innovation and adaptability, but ChatGPT-4 Turbo offered greater ease of integration and broader language support. The selection between them hinges on the specific demands of ADHD therapy.
HCMay 9, 2024
Iris: An AI-Driven Virtual Tutor For Computer Science EducationPatrick Bassner, Eduard Frankford, Stephan Krusche
Integrating AI-driven tools in higher education is an emerging area with transformative potential. This paper introduces Iris, a chat-based virtual tutor integrated into the interactive learning platform Artemis that offers personalized, context-aware assistance in large-scale educational settings. Iris supports computer science students by guiding them through programming exercises and is designed to act as a tutor in a didactically meaningful way. Its calibrated assistance avoids revealing complete solutions, offering subtle hints or counter-questions to foster independent problem-solving skills. For each question, it issues multiple prompts in a Chain-of-Thought to GPT-3.5-Turbo. The prompts include a tutor role description and examples of meaningful answers through few-shot learning. Iris employs contextual awareness by accessing the problem statement, student code, and automated feedback to provide tailored advice. An empirical evaluation shows that students perceive Iris as effective because it understands their questions, provides relevant support, and contributes to the learning process. While students consider Iris a valuable tool for programming exercises and homework, they also feel confident solving programming tasks in computer-based exams without Iris. The findings underscore students' appreciation for Iris' immediate and personalized support, though students predominantly view it as a complement to, rather than a replacement for, human tutors. Nevertheless, Iris creates a space for students to ask questions without being judged by others.
HCOct 28, 2021
An Analysis of Programming Course Evaluations Before and After the Introduction of an AutograderGerhard Johann Hagerer, Laura Lahesoo, Miriam Anschütz et al.
Commonly, introductory programming courses in higher education institutions have hundreds of participating students eager to learn to program. The manual effort for reviewing the submitted source code and for providing feedback can no longer be managed. Manually reviewing the submitted homework can be subjective and unfair, particularly if many tutors are responsible for grading. Different autograders can help in this situation; however, there is a lack of knowledge about how autograders can impact students' overall perception of programming classes and teaching. This is relevant for course organizers and institutions to keep their programming courses attractive while coping with increasing students. This paper studies the answers to the standardized university evaluation questionnaires of multiple large-scale foundational computer science courses which recently introduced autograding. The differences before and after this intervention are analyzed. By incorporating additional observations, we hypothesize how the autograder might have contributed to the significant changes in the data, such as, improved interactions between tutors and students, improved overall course quality, improved learning success, increased time spent, and reduced difficulty. This qualitative study aims to provide hypotheses for future research to define and conduct quantitative surveys and data analysis. The autograder technology can be validated as a teaching method to improve student satisfaction with programming courses.
SESep 23, 2021
What Makes Agile Software Development Agile?Marco Kuhrmann, Paolo Tell, Regina Hebig et al.
Together with many success stories, promises such as the increase in production speed and the improvement in stakeholders' collaboration have contributed to making agile a transformation in the software industry in which many companies want to take part. However, driven either by a natural and expected evolution or by contextual factors that challenge the adoption of agile methods as prescribed by their creator(s), software processes in practice mutate into hybrids over time. Are these still agile? In this article, we investigate the question: what makes a software development method agile? We present an empirical study grounded in a large-scale international survey that aims to identify software development methods and practices that improve or tame agility. Based on 556 data points, we analyze the perceived degree of agility in the implementation of standard project disciplines and its relation to used development methods and practices. Our findings suggest that only a small number of participants operate their projects in a purely traditional or agile manner (under 15%). That said, most project disciplines and most practices show a clear trend towards increasing degrees of agility. Compared to the methods used to develop software, the selection of practices has a stronger effect on the degree of agility of a given discipline. Finally, there are no methods or practices that explicitly guarantee or prevent agility. We conclude that agility cannot be defined solely at the process level. Additional factors need to be taken into account when trying to implement or improve agility in a software company. Finally, we discuss the field of software process-related research in the light of our findings and present a roadmap for future research.
SEJan 29, 2021
Catching up with Method and Process Practice: An Industry-Informed Baseline for ResearchersJil Klünder, Regina Hebig, Paolo Tell et al.
Software development methods are usually not applied by the book. Companies are under pressure to continuously deploy software products that meet market needs and stakeholders' requests. To implement efficient and effective development processes, companies utilize multiple frameworks, methods and practices, and combine these into hybrid methods. A common combination contains a rich management framework to organize and steer projects complemented with a number of smaller practices providing the development teams with tools to complete their tasks. In this paper, based on 732 data points collected through an international survey, we study the software development process use in practice. Our results show that 76.8% of the companies implement hybrid methods. Company size as well as the strategy in devising and evolving hybrid methods affect the suitability of the chosen process to reach company or project goals. Our findings show that companies that combine planned improvement programs with process evolution can increase their process' suitability by up to 5%.