SE · Apr 22
Quo Vadis, Code Review? Exploring the Future of Code Review
Michael Dorner, Andreas Bauer, Darja Šmite et al.
Context: Code review has long been a core practice in collaborative software engineering. As automation becomes increasingly embedded in development workflows, the role and functioning of code review are subject to change. Objective: This study explores how professional developers anticipate the evolution of code review and identifies emerging tensions reflected in these expectations. Method: We conducted a cross-sectional survey with 100 developers across five software-driven companies. The survey captured estimates of current review time and reviewed artifacts, as well as anticipated changes over a five-year horizon. Open-ended questions invited reflections on the future of code review. Quantitative responses were analyzed descriptively, and open-ended responses were independently coded by multiple researchers using thematic analysis to identify recurring patterns in participant responses. Results: Practitioners expect code review to remain essential, anticipating stable or increased time investment and a broader range of reviewed artifacts over the next five years. In open-ended responses, many participants explicitly referenced AI and large language models (LLMs), describing increasing automation in both code authoring and reviewing, including scenarios in which automated systems operate in both roles. Conclusion: Our analysis suggests emerging tensions concerning understanding, accountability, and trust in automation-mediated code review. These tensions provide early empirical signals of socio-technical challenges and position code review as a concrete setting for examining the implications of LLM integration in collaborative software engineering.
CV · Dec 25, 2023
Comparative Analysis of Radiomic Features and Gene Expression Profiles in Histopathology Data Using Graph Neural Networks
Luis Carlos Rivera Monroy, Leonhard Rist, Martin Eberhardt et al.
This study leverages graph neural networks to integrate MELC data with Radiomic-extracted features for melanoma classification, focusing on cell-wise analysis. It assesses the effectiveness of gene expression profiles and Radiomic features, revealing that Radiomic features, particularly when combined with UMAP for dimensionality reduction, significantly enhance classification performance. Notably, using Radiomics contributes to increased diagnostic accuracy and computational efficiency, as it allows for the extraction of critical data from fewer stains, thereby reducing operational costs. This methodology marks an advancement in computational dermatology for melanoma cell classification, setting the stage for future research and potential developments.
SE · Oct 29, 2025
Reflections on the Reproducibility of Commercial LLM Performance in Empirical Software Engineering Studies
Florian Angermeir, Maximilian Amougou, Mark Kreitz et al.
Large Language Models (LLMs) have gained remarkable interest in industry and academia. The increasing academic interest in LLMs is also reflected in the number of publications on this topic in recent years. For instance, at ICSE 2024 alone, 78 of the roughly 425 publications performed experiments with LLMs. Conducting empirical studies with LLMs remains challenging and raises questions on how to achieve reproducible results, for both researchers and practitioners. One important step towards excelling in empirical research on LLMs and their applications is to first understand to what extent current research results are reproducible and what factors may impede reproducibility. This investigation is the scope of our work. We contribute an analysis of the reproducibility of LLM-centric studies, provide insights into the factors impeding reproducibility, and discuss suggestions on how to improve the current state. In particular, we studied the 85 articles describing LLM-centric studies published at ICSE 2024 and ASE 2024. Of the 85 articles, 18 provided research artefacts and used OpenAI models. We attempted to replicate those 18 studies. Of the 18 studies, only five were sufficiently complete and executable. We were unable to fully reproduce the results of any of the five studies. Two studies seemed to be partially reproducible, and three studies did not seem to be reproducible. Our results highlight not only the need for stricter research artefact evaluations but also for more robust study designs to ensure the reproducible value of future publications.
SE · Jun 8, 2014
Platform-Centric Android Monitoring---Modular and Efficient
Jan-Christoph Kuester, Andreas Bauer
We present an add-on for the Android platform, capable of intercepting nearly all interactions between apps, or between apps and the platform, including arguments of method invocations in a human-readable format. A preliminary performance evaluation shows that the performance penalty of our solution is roughly comparable with that of similar tools in that area. The advantage of our solution, however, is that it is truly modular in the sense that we do not actually modify the Android platform itself, and can add it even to an already running system. Possible uses of such an add-on are manifold; we discuss one from the area of runtime verification that aims at improving system security.
IR · Apr 16, 2012
Event based classification of Web 2.0 text streams
Andreas Bauer, Christian Wolff
Web 2.0 applications like Twitter or Facebook create a continuous stream of information. This demands new ways of analysis that offer insight into the stream at the moment the information is created, because much of this data is only relevant for a short period of time. To address this problem, real-time search engines have recently received increased attention. They treat the continuous flow of information differently than traditional web search does, incorporating temporal and social features that describe the context of the information during its creation. Standard approaches, where data is first stored and then processed from persistent storage, suffer from latency. We want to address the fluid and rapid nature of text streams by providing an event-based approach that directly analyses the stream of information. In a first step, we define the difference between real-time search and traditional search to clarify the demands of modern text filtering. In a second step, we show how event-based features can be used to support the tasks of real-time search engines. Using the example of Twitter, we present a way to combine an event-based approach with text mining and information filtering concepts in order to classify incoming information based on stream features. We calculate stream-dependent features and feed them into a neural network in order to classify the text streams. We show the separative capabilities of event-based features as the foundation for a real-time search engine.