Krzysztof Sierszecki

CY · Feb 26

GenAI Integration into Engineering Education: A Case Study of an Introductory Undergraduate Engineering Course

Kadir Kozan, Ozgur Keles, Sihan Jian et al.

GenAI has the potential to enhance learning and teaching in engineering education. For instance, GenAI feedback on students' task performance can be effective depending on when it is provided. However, little is known about how engineering faculty and instructors discover this potential within the scope of their instruction when they try out the technology for the first time. To this end, this study aimed to describe an engineering instructor's and seven teaching assistants' initial experiences of integrating GenAI into their undergraduate engineering course and the corresponding changes in students' formative exercise performance. An embedded descriptive single case study design was employed. The research data included four interviews conducted at the beginning, middle, and end of an academic semester, and students' formative exercise performance. Overall, after GenAI integration, students' formative exercise performance increased, and a critical and reflective practice of learning how to integrate GenAI into instruction provided informative insights. Still, technology integration stayed at the level of replacing other instructional methods or increasing the efficiency of solving coding problems. Students found it exciting and surprising to be able to use GenAI in course work, even though their use of the technology declined over time. Our findings suggest that engineering teaching staff's initial experimental experiences with GenAI integration can be informative and provide context-specific practical insights. Therefore, it is reasonable for higher education institutions to encourage such experiences, especially when much remains unknown about an emerging technology.

29.7 · SE · Apr 7

CAKE: Cloud Architecture Knowledge Evaluation of Large Language Models

Tim Lukas Adam, Phongsakon Mark Konrad, Riccardo Terrenzi et al.

In today's software architecture practice, large language models (LLMs) serve as architecture co-pilots. However, no benchmark currently exists to evaluate LLMs' actual understanding of cloud-native software architecture. For this reason, we present a benchmark called CAKE, which consists of 188 expert-validated questions covering four cognitive levels of Bloom's revised taxonomy (recall, analyze, design, and implement) and five cloud-native topics. Evaluation is conducted on 22 model configurations (0.5B to 70B parameters) across four LLM families, using three-run majority voting for multiple-choice questions (MCQs) and LLM-as-a-judge scoring for free-response (FR) questions. Based on this evaluation, four notable findings were identified. First, MCQ accuracy plateaus above 3B parameters, with the best model reaching 99.2%. Second, free-response scores scale steadily across all cognitive levels. Third, the two formats capture different facets of knowledge: MCQ accuracy approaches a ceiling, while free-response scores continue to differentiate models. Finally, reasoning augmentation (+think) improves free-response quality, while tool augmentation (+tool) degrades performance for small models. These results suggest that the evaluation format fundamentally shapes how we measure architectural knowledge in LLMs.
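
The three-run majority voting mentioned in the abstract could look roughly like the sketch below. This is only an illustration, not the CAKE implementation: the function names `majority_vote` and `mcq_accuracy` are hypothetical, and the rule that a question with no strict majority counts as incorrect is an assumption, since the abstract does not say how ties are handled.

```python
from collections import Counter


def majority_vote(answers):
    """Return the answer given in a strict majority of runs, else None.

    Assumption: a three-way split (no strict majority) yields None;
    the abstract does not specify tie handling.
    """
    if not answers:
        return None
    choice, count = Counter(answers).most_common(1)[0]
    return choice if count > len(answers) / 2 else None


def mcq_accuracy(runs_per_question, gold):
    """Fraction of questions whose majority-voted answer matches the gold label."""
    voted = [majority_vote(runs) for runs in runs_per_question]
    correct = sum(v == g for v, g in zip(voted, gold))
    return correct / len(gold)


# Hypothetical data: three runs per question, one gold label per question.
runs = [["A", "A", "B"], ["C", "D", "B"], ["B", "B", "B"]]
gold = ["A", "C", "B"]
print(mcq_accuracy(runs, gold))
```

In this sketch the second question has three different answers, so it gets no voted answer and counts as a miss. A different tie rule (for example, taking the first run) would change the score on such questions.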

2 Papers