David Bermbach

DC
15 papers
62 citations
Novelty 42%
AI Score 53

15 Papers

7.8 DB May 20
Towards Serverless Processing of Spatiotemporal Big Data Queries

Diana Baumann, Tim C. Rese, David Bermbach

Spatiotemporal data are being produced in continuously growing volumes by a variety of data sources, and a variety of application fields rely on rapid analysis of such data. Existing systems such as PostGIS or MobilityDB usually build on relational database systems and thus inherit their scale-out characteristics. As a consequence, big spatiotemporal data scenarios still have limited support even though many query types can easily be parallelized. In this paper, we propose our vision of a native serverless data processing approach for spatiotemporal data: We break down queries into small subqueries which then leverage the near-instant scaling of Function-as-a-Service platforms to execute them in parallel. With this, we partially solve the scalability needs of big spatiotemporal data processing.
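The idea of breaking a query into small subqueries that run in parallel can be sketched as follows. This is a minimal illustration and not the authors' system: the bounding-box count query, the strip-based partitioning, and the names `split_box`, `count_in_strip`, and `parallel_count` are all assumptions, and a local thread pool merely stands in for Function-as-a-Service invocations. The temporal dimension is carried in the data but not filtered on.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Point = Tuple[float, float, int]         # (x, y, timestamp)
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def split_box(box: Box, n: int) -> List[Box]:
    """Split a bounding box into n vertical strips that share exact edges."""
    min_x, min_y, max_x, max_y = box
    width = (max_x - min_x) / n
    edges = [min_x + i * width for i in range(n)] + [max_x]
    return [(edges[i], min_y, edges[i + 1], max_y) for i in range(n)]


def count_in_strip(points: List[Point], strip: Box, is_last: bool) -> int:
    """Subquery: count points in one strip.

    Strips are half-open on x so that no point is counted twice;
    only the last strip also includes its right edge.
    """
    lo_x, lo_y, hi_x, hi_y = strip
    return sum(
        1
        for x, y, _ in points
        if lo_y <= y <= hi_y and (lo_x <= x < hi_x or (is_last and x == hi_x))
    )


def parallel_count(points: List[Point], box: Box, n: int = 4) -> int:
    """Fan out one subquery per strip and sum the partial results."""
    strips = split_box(box, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [
            pool.submit(count_in_strip, points, strip, i == n - 1)
            for i, strip in enumerate(strips)
        ]
        return sum(f.result() for f in futures)


# Hypothetical sample data: four points inside the box, one outside.
sample_points = [
    (0.0, 0.0, 1),
    (2.5, 1.0, 2),
    (5.0, 5.0, 3),
    (10.0, 10.0, 4),
    (11.0, 1.0, 5),
]
sample_box = (0.0, 0.0, 10.0, 10.0)
total = parallel_count(sample_points, sample_box, n=4)
```

Counting is a convenient example because partial results combine by simple addition; queries whose results span partition boundaries (for example trajectory joins) would need a more careful merge step.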

35.5 DB Mar 10 Code
GeoBenchr: An Application-Centric Benchmarking Suite for Spatiotemporal Database Platforms

Tim C. Rese, Nils Japke, Diana Baumann et al.

The rapid growth of spatiotemporal data volumes needs to be handled by database systems capable of efficiently managing and querying such data. Existing systems such as PostGIS, SpaceTime, and MobilityDB offer partial solutions but differ widely in scope and performance. Also, first spatiotemporal benchmarks provide valuable insights but are limited in scope and, to our knowledge, no application-centric benchmarking suite exists. In this paper, we propose GeoBenchr, an open-source, application-centric benchmarking suite for spatiotemporal platforms. GeoBenchr enables comprehensive evaluation across diverse datasets, query types, and workload patterns, reflecting realistic use cases from domains such as cycling, aviation, and maritime tracking. We use our GeoBenchr prototype to evaluate several system aspects including scalability, configuration impact, and cross-platform performance comparison. Our results highlight the importance of application-centric benchmarking in selecting suitable spatiotemporal database systems for real-world scenarios.

LG Apr 21, 2022
CycleSense: Detecting Near Miss Incidents in Bicycle Traffic from Mobile Motion Sensors

Ahmet-Serdar Karakaya, Thomas Ritter, Felix Biessmann et al.

In cities worldwide, cars cause health and traffic problems which could be partly mitigated through an increased modal share of bicycles. Many people, however, avoid cycling due to a lack of perceived safety. For city planners, addressing this is hard as they lack insights into where cyclists feel safe and where they do not. To gain such insights, we have in previous work proposed the crowdsourcing platform SimRa, which allows cyclists to record their rides and report near miss incidents via a smartphone app. In this paper, we present CycleSense, a combination of signal processing and Machine Learning techniques, which partially automates the detection of near miss incidents, thus making the reporting of near miss incidents easier. Using the SimRa data set, we evaluate CycleSense by comparing it to a baseline method used by SimRa and show that it significantly improves incident detection.

8.3 DC Apr 17
New Kids: An Architecture and Performance Investigation of Second-Generation Serverless Platforms

Trever Schirmer, Aris Wiegand, Lucca di Benedetto et al.

With the ever-increasing usage of serverless computing in both industry and academia, it is essential to understand the mechanisms that power the underlying platforms. As serverless is more than ten years old, there are different platforms with vastly different approaches. We show that, next to the traditional and popular platforms, a second generation of serverless platforms has emerged. While first-generation platforms are based on containerized, centralized execution, the new generation leverages lightweight isolates and edge deployment. This evolution reduces warm request latency from approximately 40 ms to around 10 ms and reduces cold starts to an afterthought, but limits the execution environment. In this paper, we gather and analyze all publicly available information to provide detailed insights into the underlying architecture of seven platforms and then run a microbenchmark-based evaluation totaling more than 38 million function calls to gain a deeper understanding of their performance.

54.3 DC Mar 26
Revealing the influence of participant failures on model quality in cross-silo Federated Learning

Fabian Stricker, David Bermbach, Christian Zirpins

Federated Learning (FL) is a paradigm for training machine learning (ML) models in collaborative settings while preserving participants' privacy by keeping raw data local. A key requirement for the use of FL in production is reliability, as insufficient reliability can compromise the validity, stability, and reproducibility of learning outcomes. FL inherently operates as a distributed system and is therefore susceptible to crash failures, network partitioning, and other fault scenarios. Despite this, the impact of such failures on FL outcomes has not yet been studied systematically. In this paper, we address this gap by investigating the impact of missing participants in FL. To this end, we conduct extensive experiments on image, tabular, and time-series data and analyze how the absence of participants affects model performance, taking into account influencing factors such as data skewness, different availability patterns, and model architectures. Furthermore, we examine scenario-specific aspects, including the utility of the global model for missing participants. Our experiments provide detailed insights into the effects of various influencing factors. In particular, we show that data skewness has a strong impact, often leading to overly optimistic model evaluations and, in some cases, even altering the effects of other influencing factors.

19.9 DB Mar 24
Spatial Analysis on Value-Based Quadtrees of Rasterized Vector Data

Diana Baumann, Nils Japke, Tim C. Rese et al.

Mobility data science offers insights into the complex interconnections of spatial data of moving objects and their surroundings, often based on a combination of vector and raster data. For example, mobility traces are usually in vector format, whereas weather data are often in raster format. Yet, available spatial analysis tools for exploratory data science push data scientists towards one or the other, providing only limited support for the respective other. In this paper, we contribute to this problem space with a value-based quadtree index, which serves as a bridge builder to support joint spatial analysis on vector and raster data leveraging their unique autocorrelation property. We achieve a 90% reduction in median Point-in-Polygon query latency, while keeping the accuracy of query responses at an equal level.
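A rough sketch of how a quadtree can exploit equal values in a rasterized polygon is given below. It is a generic textbook-style construction and not the index from the paper: the `Node` class, the functions `build`, `lookup`, and `count_leaves`, and the 4x4 example grid are assumptions, and the grid is assumed to be square with a power-of-two side length. A region whose cells all carry the same value collapses into a single leaf, so a point lookup on large uniform areas touches only a few nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    x: int                                   # left column of the region
    y: int                                   # top row of the region
    size: int                                # side length of the region
    value: Optional[int] = None              # set only for leaves
    children: Optional[List["Node"]] = None  # set only for inner nodes


def build(grid: List[List[int]], x: int = 0, y: int = 0,
          size: Optional[int] = None) -> Node:
    """Build a quadtree; uniform regions become a single leaf."""
    if size is None:
        size = len(grid)
    first = grid[y][x]
    uniform = all(
        grid[r][c] == first
        for r in range(y, y + size)
        for c in range(x, x + size)
    )
    if uniform:
        return Node(x, y, size, value=first)
    half = size // 2
    # Child order: top-left, top-right, bottom-left, bottom-right.
    kids = [build(grid, x + dx, y + dy, half)
            for dy in (0, half) for dx in (0, half)]
    return Node(x, y, size, children=kids)


def lookup(node: Node, px: int, py: int) -> Optional[int]:
    """Return the raster value at cell (px, py) by descending the tree."""
    while node.children is not None:
        half = node.size // 2
        idx = (2 if py >= node.y + half else 0) + (1 if px >= node.x + half else 0)
        node = node.children[idx]
    return node.value


def count_leaves(node: Node) -> int:
    if node.children is None:
        return 1
    return sum(count_leaves(child) for child in node.children)


# Hypothetical rasterized polygon: 1 = inside, 0 = outside; indexed grid[y][x].
raster = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
tree = build(raster)
inside = lookup(tree, 3, 2) == 1  # point-in-polygon test on the raster
```

In this example, three of the four quadrants are uniform and become single leaves, so the tree stores fewer nodes than the raster has cells while answering every lookup with the same value as the raster itself.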

DC Mar 6
Provuse: Platform-Side Function Fusion for Performance and Efficiency in FaaS Environments

Niklas Kowallik, Natalie Carl, Leon Pöllinger et al.

Function-as-a-Service (FaaS) platforms provide scalable and cost-efficient execution but suffer from increased latency and resource overheads in complex applications comprising multiple functions, particularly due to double billing when functions call each other. This paper presents Provuse, a transparent, platform-side optimization that automatically performs function fusion at runtime for independently deployed functions, thereby eliminating redundant function instances. This approach reduces both cost and latency without requiring users to change any code. Provuse targets provider-managed FaaS platforms that retain control over function entry points and deployment artifacts, enabling transparent, runtime execution consolidation without developer intervention. We provide two implementations for this approach using the tinyFaaS platform as well as Kubernetes, demonstrating compatibility with container orchestration frameworks. An evaluation shows consistent improvements, achieving an average end-to-end latency reduction of 26.33% and a mean RAM usage reduction of 53.57%. These results indicate that automatic function fusion is an effective platform-side strategy for reducing latency and RAM consumption in composed FaaS applications, highlighting the potential of transparent infrastructure-level optimizations in serverless systems.

DC Jun 1, 2023
Predicting Temporal Aspects of Movement for Predictive Replication in Fog Environments

Emil Balitzki, Tobias Pfandzelter, David Bermbach

To fully exploit the benefits of the fog environment, efficient management of data locality is crucial. Blind or reactive data replication falls short in harnessing the potential of fog computing, necessitating more advanced techniques for predicting where and when clients will connect. While spatial prediction has received considerable attention, temporal prediction remains understudied. Our paper addresses this gap by examining the advantages of incorporating temporal prediction into existing spatial prediction models. We also provide a comprehensive analysis of spatio-temporal prediction models, such as Deep Neural Networks and Markov models, in the context of predictive replication. We propose a novel model using Holt-Winters Exponential Smoothing for temporal prediction, leveraging sequential and periodical user movement patterns. In a fog network simulation with real user trajectories, our model achieves a 15% reduction in excess data with a marginal 1% decrease in data availability.
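For readers unfamiliar with the smoothing technique named above, the following is a minimal sketch of the textbook additive Holt-Winters formulation (level, trend, and seasonal components). It is not the authors' model: the function name `holt_winters_additive`, the simple two-season initialization, the smoothing parameters, and the sample series are all assumptions chosen for illustration.

```python
from typing import List


def holt_winters_additive(series: List[float], m: int, alpha: float,
                          beta: float, gamma: float, horizon: int) -> List[float]:
    """Forecast `horizon` steps ahead with additive triple exponential smoothing.

    m is the season length; alpha, beta, gamma smooth the level, trend,
    and seasonal components respectively.
    """
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Simple initialization from the first two seasons.
    first = sum(series[:m]) / m
    second = sum(series[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    season = [series[i] - first for i in range(m)]

    for t, y in enumerate(series):
        prev_level = level
        level = alpha * (y - season[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y - level) + (1 - gamma) * season[t % m]

    n = len(series)
    return [level + h * trend + season[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Hypothetical periodic signal with season length 4 and no trend,
# e.g. some per-interval measure of when a client connects.
history = [10.0, 20.0, 30.0, 40.0] * 3
forecast = holt_winters_additive(history, m=4, alpha=0.5, beta=0.3,
                                 gamma=0.2, horizon=4)
```

On a perfectly periodic input like this one, the forecast reproduces the next period of the pattern; real movement data would be noisier, and the smoothing parameters would typically be fitted rather than fixed by hand.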

79.3 DC Apr 29 Code
FaaSMoE: A Serverless Framework for Multi-Tenant Mixture-of-Experts Serving

Minghe Wang, Trever Schirmer, Mohammadreza Malekabbasi et al.

Mixture-of-Experts (MoE) models offer high capacity with efficient inference cost by activating a small subset of expert models per input. However, deploying MoE models requires all experts to reside in memory, creating a gap between the resources used by activated experts and the provisioned resources. This underutilization is further pronounced in multi-tenant scenarios. In this paper, we propose FaaSMoE, a multi-tenant MoE serving architecture built on Function-as-a-Service (FaaS) platforms. FaaSMoE decouples the control and execution planes of MoE by deploying experts as stateless FaaS functions, enabling on-demand and scale-to-zero expert invocation across tenants. FaaSMoE further supports configurable expert granularity within functions, trading off per-expert elasticity for reduced invocation overhead. We implement a prototype with an open-source edge-oriented FaaS platform and evaluate it using Qwen1.5-moe-2.7B under multi-tenant workloads. Compared to a full-model baseline, FaaSMoE uses less than one third of the resources, demonstrating a practical and resource-efficient path towards scalable MoE serving in a multi-tenant environment.

45.5 DC May 18
Duet instrumentation: An Agentic Approach to Improving Sensitivity in Cloud Service Benchmarking

Sebastian Koch, Nils Japke, David Bermbach

Continuous cloud service performance benchmarking is essential for detecting performance bugs early before deploying them to production. However, detecting performance regressions using application benchmarks, which usually treat the system under test as a black box, is challenging due to variable I/O calls or changing performance characteristics of the underlying cloud infrastructure. Microbenchmarks are often more sensitive and accurate, but also more time-consuming to implement and run. Further, they do not capture the performance of the integrated system as a whole. A comprehensive performance assessment therefore typically requires a combination of both approaches. To address the shortcomings of application benchmarks, we propose duet instrumentation, a novel benchmarking paradigm enabled by recent advancements in large language model (LLM) code understanding. The idea is to analyze code changes between two consecutive application versions and measure performance differences directly at performance-relevant changes during a synchronized benchmark of both application versions, uncovering performance changes with higher sensitivity. We design a system that reliably automates the assessment and instrumentation of performance-relevant code changes between the two application versions. In experiments with a realistic testbed application offering configurable performance regressions, we find that our prototype achieves 58% precision, 93% recall, and 71% specificity (averaged across tasks) when comparing the generated instrumentation against the ideal instrumentation with a line-distance threshold of five. In the downstream application benchmark, we find that our prototype can detect performance regressions at up to 5x lower injected severity compared to a traditional duet application benchmark while preserving similar A/A latency distributions.

DC Nov 27, 2025 Code
DisCEdge: Distributed Context Management for Large Language Models at the Edge

Mohammadreza Malekabbasi, Minghe Wang, David Bermbach

Deploying Large Language Model (LLM) services at the edge benefits latency-sensitive and privacy-aware applications. However, the stateless nature of LLMs makes managing user context (e.g., sessions, preferences) across geo-distributed edge nodes challenging. Existing solutions, such as client-side context storage, often introduce network latency and bandwidth overhead, undermining the advantages of edge deployment. We propose DisCEdge, a distributed context management system that stores and replicates user context in tokenized form across edge nodes. By maintaining context as token sequences rather than raw text, our system avoids redundant computation and enables efficient data replication. We implement and evaluate an open-source prototype in a realistic edge environment with commodity hardware. We show DisCEdge improves median response times by up to 14.46% and lowers median inter-node synchronization overhead by up to 15% compared to a raw-text-based system. It also reduces client request sizes by a median of 90% compared to client-side context management, while guaranteeing data consistency.

16.9 LG May 8
FLAM: Evaluating Model Performance with Aggregatable Measures in Federated Learning

Fabian Stricker, Jose A. Peregrina, David Bermbach et al.

Performance evaluation is essential for assessing the quality of machine learning (ML) models and guiding deployment decisions. In federated learning (FL), assessing the performance is challenging because data are distributed across participants. Consequently, the coordinator must rely on locally computed evaluation metrics and aggregate them to assess the global model. A key challenge is that common aggregation strategies, such as weighted averaging based on the local samples per participant, do not always produce the same results as centralized evaluation. Existing definitions of performance evaluation are largely tailored to accuracy and do not generalize to other metrics, leading to inconsistencies between participant-based and centralized evaluation. However, such discrepancies are inconsistent with the FL objective and lead to a wrong calculation of the metric. To address this issue, we examine the underlying reasons for these discrepancies and propose FLAM, a performance evaluation method based on aggregatable measures that yields the same results as centralized evaluation without the need for a global test dataset.
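The discrepancy between averaged local metrics and centralized evaluation can be illustrated with the F1 score. The sketch below is not the FLAM method itself; it only demonstrates the general observation that sample-weighted averaging of per-participant F1 scores can differ from the F1 score computed over all data, whereas summing the underlying counts (true positives, false positives, false negatives) and computing F1 once gives the same value as evaluating on the pooled data. The `Counts` class, the function names, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Counts:
    tp: int  # true positives on the participant's local test data
    fp: int  # false positives
    fn: int  # false negatives
    n: int   # number of local test samples


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_average_f1(parts: List[Counts]) -> float:
    """Aggregate by averaging local F1 scores, weighted by sample count."""
    total = sum(p.n for p in parts)
    return sum(p.n * f1(p.tp, p.fp, p.fn) for p in parts) / total


def aggregated_f1(parts: List[Counts]) -> float:
    """Aggregate the raw counts first, then compute F1 once."""
    return f1(sum(p.tp for p in parts),
              sum(p.fp for p in parts),
              sum(p.fn for p in parts))


# Hypothetical participants with 20 test samples each and very different error profiles.
participants = [
    Counts(tp=8, fp=2, fn=0, n=20),
    Counts(tp=1, fp=0, fn=9, n=20),
]
averaged = weighted_average_f1(participants)
pooled = aggregated_f1(participants)
```

Because the counts are additive across disjoint test sets, `pooled` matches what a coordinator would obtain with access to all test data, while `averaged` does not; accuracy happens to be a metric where the sample-weighted average and the pooled value coincide.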

DC Nov 18, 2025
Analyzing the Impact of Participant Failures in Cross-Silo Federated Learning

Fabian Stricker, David Bermbach, Christian Zirpins

Federated learning (FL) is a new paradigm for training machine learning (ML) models without sharing data. When applying FL in cross-silo scenarios, where organizations collaborate, it is necessary that the FL system is reliable; however, participants can fail for various reasons (e.g., communication issues or misconfigurations). In order to provide a reliable system, it is necessary to analyze the impact of participant failures. While this problem has received attention in cross-device FL, where mobile devices with limited resources participate, there is comparatively little research in cross-silo FL. Therefore, we conduct an extensive study analyzing the impact of participant failures on the model quality in the context of inter-organizational cross-silo FL with few participants. In our study, we focus on analyzing generally influential factors such as the impact of the timing and the data as well as the impact on the evaluation, which is important for deciding whether the model should be deployed. We show that under high skews the evaluation is optimistic and hides the real impact. Furthermore, we demonstrate that the timing impacts the quality of the trained model. Our results offer insights for researchers and software architects aiming to build robust FL systems.

CY Jun 15, 2020
SimRa: Using Crowdsourcing to Identify Near Miss Hotspots in Bicycle Traffic

Ahmet-Serdar Karakaya, Jonathan Hasenburg, David Bermbach

An increased modal share of bicycle traffic is a key mechanism to reduce emissions and solve traffic-related problems. However, a lack of (perceived) safety keeps people from using their bikes more frequently. To improve safety in bicycle traffic, city planners need an overview of accidents, near miss incidents, and bike routes. Such information, however, is currently not available. In this paper, we describe SimRa, a platform for collecting data on bicycle routes and near miss incidents using smartphone-based crowdsourcing. We also describe how we identify dangerous near miss hotspots based on the collected data and propose a scoring model.

SE Mar 18, 2019
Benchmarking Web API Quality -- Revisited

David Bermbach, Erik Wittern

Modern applications increasingly interact with web APIs -- reusable components, deployed and operated outside the application, and accessed over the network. Their existence, arguably, spurs application innovations, making it easy to integrate data or functionalities. While previous work has analyzed the ecosystem of web APIs and their design, little is known about web API quality at runtime. This gap is critical, as qualities including availability, latency, or provider security preferences can severely impact applications and user experience. In this paper, we revisit a 3-month, geo-distributed benchmark of popular web APIs, originally performed in 2015. We repeat this benchmark in 2018 and compare results from these two benchmarks regarding availability and latency. We furthermore introduce new results from assessing provider security preferences, collected both in 2015 and 2018, and results from our attempts to reach out to API providers with the results from our 2015 experiments. Our extensive experiments show that web API qualities vary 1.) based on the geo-distribution of clients, 2.) during our individual experiments, and 3.) between the two experiments. Our findings provide evidence to foster the discussion around web API quality, and can act as a basis for the creation of tools and approaches to mitigate quality issues.