Peter Mandl

h-index: 9

7 papers

468 citations

Novelty: 25%

AI Score: 34

Ranked #115,557 of 194,257 authors (top 59%); #25,397 in LG (top 63%)

7 Papers

6.7 · SE · Apr 22 · Code

A Ground-Truth-Based Evaluation of Vulnerability Detection Across Multiple Ecosystems

Peter Mandl, Paul Mandl, Martin Häusl et al.

Automated vulnerability detection tools are widely used to identify security vulnerabilities in software dependencies. However, the evaluation of such tools remains challenging due to the heterogeneous structure of vulnerability data sources, inconsistent identifier schemes, and ambiguities in version range specifications. In this paper, we present an empirical evaluation of vulnerability detection across multiple software ecosystems using a curated ground-truth dataset derived from the Open Source Vulnerabilities (OSV) database. The dataset explicitly maps vulnerabilities to concrete package versions and enables a systematic comparison of detection results across different tools and services. Since vulnerability databases such as OSV are continuously updated, the dataset used in this study represents a snapshot of the vulnerability landscape at the time of the evaluation. To support reproducibility and future studies, we provide an open-source tool that automatically reconstructs the dataset from the current OSV database using the methodology described in this paper. Our evaluation highlights systematic differences between vulnerability detection systems and demonstrates the importance of transparent dataset construction for reproducible empirical security research.
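The abstract names ambiguous version range specifications as one obstacle to evaluation. The following is a minimal sketch, assuming plain dotted numeric versions. It evaluates one OSV-style list of range events (`introduced`, `fixed`, `last_affected`, the event names used in the OSV schema) for a single package version. It ignores `limit` events and ecosystem-specific ordering rules, and the function names are hypothetical, not taken from the authors' tool.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    # Assumption: plain dotted numeric versions such as "1.2.10". Real ecosystems
    # (npm semver, PyPI PEP 440, Maven) each need their own ordering rules.
    return tuple(int(part) for part in text.split("."))


def is_affected(version: str, events: List[Dict[str, str]]) -> bool:
    """Hypothetical helper: evaluate one OSV-style range (single-key event dicts) for one version."""
    v = parse_version(version)
    # Walk the events in version order, switching the affected state on and off.
    ordered = sorted(events, key=lambda e: parse_version(next(iter(e.values()))))
    affected = False
    for event in ordered:
        kind, bound_text = next(iter(event.items()))
        bound = parse_version(bound_text)
        if kind == "introduced" and v >= bound:
            affected = True
        elif kind == "fixed" and v >= bound:
            affected = False
        elif kind == "last_affected" and v > bound:
            affected = False
    return affected
```

For example, with the events `[{"introduced": "0"}, {"fixed": "1.4.2"}, {"introduced": "2.0.0"}, {"fixed": "2.1.1"}]`, version `1.4.1` is affected, `1.4.2` is not, `2.0.5` is affected again, and `2.1.1` is not. Expanding ranges into explicit per-version verdicts like this resembles the concrete version mapping the abstract describes.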

7.3 · AI · Jul 16

Contextualized Early Detection of Online Firestorms: A Sequential LLM-Based Approach

Besim Shala, Peter Mandl, Andreas Humpe et al.

Online firestorms are rapid collective escalations of highly negative user-generated content and may cause substantial reputational and economic damage. Existing detectors usually work with volume signals, sentiment scores, or predefined linguistic features. Such signals are useful, but they capture contextual meaning shifts in evolving discussion threads only indirectly. This paper proposes an LLM-based detection system with two operating modes. The first mode classifies complete Reddit threads retrospectively by combining local chunk-level assessments into a thread-level judgment. The second mode processes threads sequentially and issues early warnings when a sliding window exceeds calibrated thresholds. In this mode, the language model estimates three firestorm indicators: negativity share, escalation level, and contributor count. On a balanced Reddit dataset, the global mode achieves strong classification performance, while the early warning mode reaches high recall and detects escalating threads after only a small number of comments and distinct contributors. The results indicate that LLMs can be used not only for static judgment tasks, but also as repeated estimators in context-aware monitoring of social media discourse.
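The early-warning mode can be pictured as a sliding window over a thread. This sketch is illustrative only. The LLM estimator is replaced by a hypothetical `estimate` heuristic over pre-labelled comments. The window size, the threshold values, and the rule that all three indicators must reach their thresholds are also assumptions, not the calibrated settings from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Optional


@dataclass
class Comment:
    author: str
    negative: bool      # hypothetical per-comment label (the paper uses an LLM instead)
    escalation: float   # hypothetical escalation score in [0, 1]


@dataclass
class Indicators:
    negativity_share: float
    escalation_level: float
    contributor_count: int


def estimate(window: List[Comment]) -> Indicators:
    """Hypothetical stand-in for the LLM estimator: simple statistics over the window."""
    return Indicators(
        negativity_share=sum(c.negative for c in window) / len(window),
        escalation_level=sum(c.escalation for c in window) / len(window),
        contributor_count=len({c.author for c in window}),
    )


def first_warning(comments: Iterable[Comment], window_size: int = 5,
                  min_negativity: float = 0.6, min_escalation: float = 0.7,
                  min_contributors: int = 3) -> Optional[int]:
    """Return the index of the comment that triggers the first warning, or None."""
    window: Deque[Comment] = deque(maxlen=window_size)  # oldest comment drops out automatically
    for i, comment in enumerate(comments):
        window.append(comment)
        if len(window) < window_size:
            continue  # wait until the window is full
        ind = estimate(list(window))
        # Assumed rule: warn only when all three indicators reach their thresholds.
        if (ind.negativity_share >= min_negativity
                and ind.escalation_level >= min_escalation
                and ind.contributor_count >= min_contributors):
            return i
    return None
```

The returned index tells how many comments were needed before an alert fired. The contributor threshold keeps a single angry user from triggering a warning on their own.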

14.8 · SE · Jun 13

AI-driven Software Development: A Pragmatic Path to Agentic Development Processes

Peter Mandl, Paul Mandl

Generative AI is transforming software development from localized tool support into development work that is embedded in processes, tools, and organizational structures. Its use now extends beyond code completion to requirements, architecture, implementation, testing, review, operations, and maintenance. Existing research paints a nuanced picture. Productivity gains are possible, but they depend on task type, codebase characteristics, and developers' experience. At the same time, AI-generated artifacts require additional control and governance. Building on these observations, this paper develops a pragmatic organizing framework for the transition toward AI-driven Software Development. It describes a progression from informal and assistive AI use through integrated AI workflows toward controlled agentic development processes. The focus is not on individual tools or models, but on the technical, organizational, and quality-assurance mechanisms needed to embed AI across central software engineering activities. Particular emphasis is placed on a harness that connects project context, tool access, verification, permissions, logging, and human approval. The paper draws on current research, practice-oriented sources, established software engineering practices, and project experience. A mid-sized software company is used as an exploratory case study to assess the plausibility of the framework and to illustrate how prerequisites, governance requirements, design practices, and transformation paths can be shaped in a concrete organizational context. The paper provides a conceptual basis for further scholarly discussion and empirical investigation of AI-driven Software Development.

6.7 · CL · Jan 2, 2025

Digital Guardians: Can GPT-4, Perspective API, and Moderation API reliably detect hate speech in reader comments of German online newspapers?

Manuel Weber, Moritz Huber, Maximilian Auch et al.

In recent years, toxic content and hate speech have become widespread phenomena on the internet. Moderators of online newspapers and forums are now required, partly due to legal regulations, to carefully review and, if necessary, delete reader comments. This is a labor-intensive process. Some providers of large language models already offer solutions for automated hate speech detection or the identification of toxic content. These include OpenAI's GPT-4o, the Perspective API from Jigsaw (Google), and OpenAI's Moderation API. These solutions are compared with each other and against the HOCON34k baseline on the German test dataset HOCON34k, which was specifically created for developing tools to detect hate speech in reader comments of online newspapers. The test dataset contains 1,592 annotated text samples. For GPT-4o, three prompting strategies are used: zero-shot, one-shot, and few-shot. The results show that GPT-4o outperforms both the Perspective API and the Moderation API, and exceeds the HOCON34k baseline by approximately 5 percentage points, as measured by a combined metric of MCC and F2-score.
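Both MCC and the F2-score can be computed from a binary confusion matrix. F2 (β = 2) weights recall more heavily than precision. The abstract does not say how the two scores are combined, so this sketch only computes them individually. The function names are hypothetical.

```python
import math


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score: (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP); beta=2 gives F2."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return 0.0 if denom == 0 else (1 + b2) * tp / denom
```

As an example, 40 true positives, 10 false positives, 20 false negatives, and 130 true negatives give F2 = 200 / 290 ≈ 0.69. In F2, each missed hateful comment (false negative) counts four times as much as a false alarm.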

2.6 · LG · Oct 25, 2024

EnergyPlus Room Simulator

Manuel Weber, Philipp Bogdain, Sophia Viktoria Weißenberger et al.

Research on energy optimization in buildings relies heavily on building-related data such as measured indoor climate factors. Collecting such data is labor- and cost-intensive, whereas simulations are a cheap way to generate datasets of arbitrary size, which is particularly useful for data-intensive deep learning methods. In this paper, we present the tool EnergyPlus Room Simulator, which simulates the indoor climate of a specific room in a building using the simulation software EnergyPlus. It allows users to alter room models and to simulate factors such as temperature, humidity, and CO2 concentration. Compared with working with EnergyPlus manually, the tool simplifies the simulation process by offering a convenient interface, including a user-friendly graphical user interface (GUI) as well as a REST API. The tool is intended to support scientific, building-related tasks such as room-level occupancy detection by providing fast access to simulation data that may, for instance, be used for pre-training machine learning models.

4.0 · IR · Jun 6, 2024

Innovations in Cover Song Detection: A Lyrics-Based Approach

Maximilian Balluff, Peter Mandl, Christian Wolff

Cover songs are alternate versions of a song recorded by a different artist. Long a vital part of the music industry, cover songs significantly influence music culture and are commonly heard in public venues. The rise of online music platforms has further increased their prevalence, often as background music or video soundtracks. While current automatic identification methods work adequately for original songs, they are less effective with cover songs, primarily because cover versions often deviate significantly from the original compositions. In this paper, we propose a novel method for cover song detection that uses the lyrics of a song. We introduce a new dataset of cover songs and their corresponding originals. The dataset contains 5078 cover songs and 2828 original songs. Unlike other cover song datasets, it contains annotated lyrics for both the original song and the cover song. We evaluate our method on this dataset and compare it with multiple baseline approaches. Our results show that our method outperforms the baseline approaches.
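The abstract does not describe the proposed method. As a simple point of comparison, and explicitly not the authors' approach, the sketch below matches a cover's lyrics to the closest original using cosine similarity of word-count vectors, a common baseline-style technique. The tokenizer and the function names are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokens(text: str) -> List[str]:
    # Assumed tokenizer: lowercase words, keeping apostrophes ("i've").
    return re.findall(r"[a-z']+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 0.0 if norm == 0 else dot / norm


def best_match(cover_lyrics: str, originals: Dict[str, str]) -> str:
    """Hypothetical helper: return the title of the original whose lyrics are most similar."""
    query = Counter(tokens(cover_lyrics))
    return max(originals, key=lambda title: cosine(query, Counter(tokens(originals[title]))))
```

A lyrics-based comparison like this ignores melody, key, and tempo. That matches the abstract's motivation that cover versions often deviate musically from the original.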

4.2 · LG · Oct 8, 2020

Towards the Detection of Building Occupancy with Synthetic Environmental Data

Manuel Weber, Christoph Doblander, Peter Mandl

Information about room-level occupancy is crucial to many building-related tasks, such as building automation or energy performance simulation. Current occupancy detection literature focuses on data-driven methods, but is mostly based on small case studies with few rooms. The need to collect room-specific data for each room of interest impedes the practical applicability of machine learning, especially data-intensive deep learning approaches. To derive accurate predictions from less data, we propose knowledge transfer from synthetic data. In this paper, we conduct an experiment with data from a CO2 sensor in an office room and additional synthetic data obtained from a simulation. Our contribution includes (a) a simulation method for CO2 dynamics under randomized occupant behavior, (b) a proof of concept for knowledge transfer from simulated CO2 data, and (c) an outline of future research implications. From our results, we conclude that the transfer approach can effectively reduce the amount of data required for model training.
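The abstract does not give the simulation equations. A common textbook choice, assumed here rather than taken from the paper, is a well-mixed single-zone mass balance, V dC/dt = Q (C_out − C) + n G. Here V is the room volume, Q the ventilation airflow, C_out the outdoor concentration, n the number of occupants, and G the per-person CO2 generation rate. The sketch integrates this balance with explicit Euler steps while hypothetical occupants arrive and leave at random. All parameter values are illustrative.

```python
import random
from typing import List, Tuple


def simulate_co2(steps: int, dt: float = 60.0, volume: float = 50.0,
                 airflow: float = 0.02, outdoor_ppm: float = 420.0,
                 gen_per_person: float = 5e-6, max_occupants: int = 3,
                 p_switch: float = 0.05, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Euler-integrate a single-zone CO2 balance under random occupancy (illustrative sketch).

    Units: dt in s, volume in m^3, airflow in m^3/s, gen_per_person in m^3/s, CO2 in ppm.
    """
    rng = random.Random(seed)
    present = [False] * max_occupants  # everyone starts outside the room
    c = outdoor_ppm                    # indoor CO2 concentration in ppm
    occupancy: List[int] = []
    co2: List[float] = []
    for _ in range(steps):
        # Assumed behavior model: each occupant independently arrives or leaves with probability p_switch.
        present = [not is_in if rng.random() < p_switch else is_in for is_in in present]
        n = sum(present)
        # Well-mixed room: dC/dt = Q/V (C_out - C) + n G / V, scaled by 1e6 to ppm.
        dcdt = airflow / volume * (outdoor_ppm - c) + n * gen_per_person / volume * 1e6
        c += dcdt * dt
        occupancy.append(n)
        co2.append(c)
    return occupancy, co2
```

With these defaults, the concentration for n constant occupants tends toward C_out + n G / Q × 10⁶ ≈ 420 + 250 n ppm. The paired occupancy series could act as the labels for synthetic training data.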