SE · DC · PF · Sep 1, 2020

Theodolite: Scalability Benchmarking of Distributed Stream Processing Engines in Microservice Architectures

arXiv:2009.00304v3 · 55 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the need for standardized scalability benchmarking in microservice architectures, particularly for Industrial IoT data processing. The contribution is incremental: it applies existing benchmarking concepts within a new framework.

The authors address scalability benchmarking of distributed stream processing engines by introducing Theodolite, a method and framework that defines use cases and workload dimensions and measures how resource demand evolves with increasing workload. They evaluate Kafka Streams and Apache Flink across 4 use cases and 7 workload dimensions.

Distributed stream processing engines are designed with a focus on scalability to process big data volumes in a continuous manner. We present the Theodolite method for benchmarking the scalability of distributed stream processing engines. The core of this method is the definition of use cases that microservices implementing stream processing have to fulfill. For each use case, our method identifies relevant workload dimensions that might affect the scalability of a use case. We propose to design one benchmark per use case and relevant workload dimension. We present a general benchmarking framework, which can be applied to execute the individual benchmarks for a given use case and workload dimension. Our framework executes an implementation of the use case's dataflow architecture for different workloads of the given dimension and various numbers of processing instances. This way, it identifies how resource demand evolves with increasing workloads. Within the scope of this paper, we present 4 identified use cases, derived from processing Industrial Internet of Things data, and 7 corresponding workload dimensions. We provide implementations of 4 benchmarks with Kafka Streams and Apache Flink as well as an implementation of our benchmarking framework to execute scalability benchmarks in cloud environments. We use both for evaluating the Theodolite method and for benchmarking Kafka Streams' and Flink's scalability for different deployment options.
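
The grid of experiments described in the abstract (several workloads, each run with several numbers of processing instances) can be illustrated with a minimal sketch. This is not the Theodolite implementation. The function names, the pass/fail predicate, and the simulated capacity of 1000 messages per second per instance are all assumptions made for illustration, since the abstract does not state how the framework judges whether a deployment copes with a workload.

```python
from typing import Callable, Dict, Iterable, Optional


def resource_demand(
    workloads: Iterable[int],
    instance_counts: Iterable[int],
    run_experiment: Callable[[int, int], bool],
) -> Dict[int, Optional[int]]:
    """Map each workload to the smallest tested instance count that copes with it.

    run_experiment(load, instances) is a hypothetical callback that deploys the
    use case, applies the load, and returns True if the deployment kept up.
    A workload maps to None if no tested instance count was sufficient.
    """
    counts = sorted(instance_counts)
    demand: Dict[int, Optional[int]] = {}
    for load in sorted(workloads):
        # Try instance counts in ascending order and keep the first that succeeds.
        demand[load] = next((n for n in counts if run_experiment(load, n)), None)
    return demand


def simulated_run(load: int, instances: int) -> bool:
    """Stand-in for a real deployment (assumption: 1000 msgs/s per instance)."""
    return instances * 1000 >= load


demand = resource_demand([500, 1500, 3000, 9000], [1, 2, 4], simulated_run)
print(demand)  # {500: 1, 1500: 2, 3000: 4, 9000: None}
```

Plotting the resulting mapping of workload to required instances gives one view of how resource demand evolves with increasing workload. In a real setup, the simulated callback would be replaced by an actual benchmark run against a cluster.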

Code Implementations: 1 repo