DC · SE · Aug 18, 2021

What Distributed Systems Say: A Study of Seven Spark Application Logs

arXiv:2108.08395v1 · 7 citations
Originality: Synthesis-oriented
AI Analysis

The study gives developers and practitioners practical guidance on configuring logging in distributed systems, though its contribution is incremental because it builds on existing log-analysis methods.

The study measured how logging verbosity levels affect execution time and storage cost across seven Spark benchmarks. It found that higher verbosity incurs significant overhead: up to 30% longer execution time and 50% higher storage usage.

Execution logs are a crucial medium because they record the runtime information of software systems. Although extensive logs provide valuable details for identifying the root cause of a failure in postmortem analysis, they may also incur performance overhead and storage cost. In this research, we therefore present the results of an experimental study on seven Spark benchmarks that illustrates the impact of different logging verbosity levels on the execution time and storage cost of distributed software systems. We also evaluate log effectiveness and information gain values, and study how performance and the generated logs change for each benchmark under various types of distributed system failures. Our findings offer developers and practitioners insight into how to set up and use their distributed systems to benefit from execution logs.
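The kind of measurement the abstract describes (execution time and log volume at different verbosity levels) can be illustrated with a small sketch. This is not the authors' setup and does not use Spark. It assumes Python's standard `logging` module as a stand-in, and `run_workload` and `measure` are hypothetical helpers invented for illustration.

```python
import io
import logging
import time


def run_workload(logger, n_items=1000):
    """Hypothetical stand-in for a benchmark job: sums squares and logs as it goes."""
    logger.info("job started with %d items", n_items)
    total = 0
    for i in range(n_items):
        total += i * i
        # Per-item messages are only emitted when the level is DEBUG.
        logger.debug("processed item %d, running total %d", i, total)
    logger.warning("job finished, total=%d", total)
    return total


def measure(level, n_items=1000):
    """Return (elapsed_seconds, log_bytes) for one run at the given verbosity level."""
    buffer = io.StringIO()  # in-memory sink, assumed here instead of a log file
    handler = logging.StreamHandler(buffer)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    logger = logging.getLogger(f"sketch.{logging.getLevelName(level)}")
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep output out of the root logger

    start = time.perf_counter()
    run_workload(logger, n_items)
    elapsed = time.perf_counter() - start
    return elapsed, len(buffer.getvalue().encode("utf-8"))


if __name__ == "__main__":
    for level in (logging.WARNING, logging.INFO, logging.DEBUG):
        elapsed, size = measure(level)
        print(f"{logging.getLevelName(level):8s} time={elapsed:.4f}s log_bytes={size}")
```

Log size grows monotonically with verbosity in this sketch because each lower level adds messages. Timings vary from run to run, so a real comparison would repeat each configuration several times.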

