SE AI Feb 17, 2021

FIXME: Enhance Software Reliability with Hybrid Approaches in Cloud

arXiv:2102.09336v1 13 citations
Originality Synthesis-oriented
AI Analysis

This paper addresses the problem of reducing mean time to recover (MTTR) for site reliability engineers and software engineers in cloud-based enterprises, representing an incremental improvement over existing approaches.

The paper tackles the challenge of identifying root causes of reliability issues in complex cloud environments by introducing FIXME, a hybrid diagnosis approach that combines multiple methods to correlate information from different data sources, achieving about 17% better precision in evaluations.

With the promise of reliability in the cloud, more enterprises are migrating to it. Continuous integration/deployment (CI/CD) in the cloud connects developers, who need to deliver value faster and more transparently, with site reliability engineers (SREs), who need to manage applications reliably. SREs feed development issues back to developers, and developers commit fixes and trigger CI/CD to redeploy. The release cycle is more continuous than ever, so the path from code to production is faster and more automated. To provide this higher level of agility, cloud platforms have become more complex, adding deeper layers of virtualization for the sake of flexibility. However, reliability does not come for free with all this complexity. Software engineers and SREs need to deal with a wider spectrum of information from the virtualized layers. Therefore, providing correlated information with true-positive evidence is critical to identifying the root cause of issues quickly and thereby reducing mean time to recover (MTTR), a performance metric for SREs. Similarity-, knowledge-, or statistics-driven approaches have been effective, but with increasing data volume and variety, any individual approach is limited in its ability to correlate the semantic relations among different data sources. In this paper, we introduce FIXME to enhance software reliability with hybrid diagnosis approaches for enterprises. Our evaluation results show that the hybrid diagnosis approach is about 17% better in precision. The results are helpful for both practitioners and researchers developing hybrid diagnosis in the highly dynamic cloud environment.
