SE · DC · Mar 1, 2018

Localizing Faults in Cloud Systems

Leonardo Mariani, Cristina Monni, Mauro Pezzé, Oliviero Riganelli, Rui Xin

arXiv:1803.00356v119.392 citations

Originality: Incremental advance

AI Analysis

This paper addresses reliability issues for software applications in cloud systems. It offers a more practical solution than existing methods that depend on fault injection, though its combination of machine learning and graph theory appears to be an incremental advance.

The paper tackles the problem of fault localization in cloud systems, which is challenging due to the lack of control over execution environments and the limitations of existing heavyweight methods. It proposes LOUD, a lightweight approach using positive training only, achieving high precision in fault localization.

By leveraging large clusters of commodity hardware, the Cloud offers great opportunities to optimize the operating costs of software systems, but it significantly affects the reliability of software applications. Applications have little control over Cloud execution environments, which largely limits the applicability of state-of-the-art approaches that address reliability issues by relying on heavyweight training with injected faults. In this paper, we propose \emph{LOUD}, a lightweight fault localization approach that relies on positive training only, and can thus operate within the constraints of Cloud systems. \emph{LOUD} relies on machine learning and graph theory. It trains machine learning models with correct executions only, and compensates for the inaccuracy that derives from training with positive samples by processing the output of the machine learning techniques with graph theory algorithms. The experimental results reported in this paper confirm that \emph{LOUD} can localize faults with high precision while relying only on lightweight positive training.
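The two-stage idea in the abstract (learn from correct executions only, then refine the result with a graph algorithm) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the models or the graph algorithms, so the z-score baseline, the out-degree scoring, the KPI names, the dependency edges, and the helper functions `fit_baseline`, `anomalous_kpis`, and `rank_resources` are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def fit_baseline(normal_samples):
    """Positive training: learn per-KPI mean and std from correct executions only."""
    return {kpi: (mean(vals), stdev(vals)) for kpi, vals in normal_samples.items()}

def anomalous_kpis(baseline, observation, threshold=3.0):
    """Flag KPIs whose z-score against the baseline exceeds the threshold (assumed detector)."""
    flagged = set()
    for kpi, value in observation.items():
        mu, sigma = baseline[kpi]
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.add(kpi)
    return flagged

def rank_resources(flagged, edges, kpi_to_resource):
    """Graph step (assumed): score each resource by how many anomalous KPIs
    its own anomalous KPIs point to, i.e. out-degree in the anomaly subgraph."""
    scores = {kpi_to_resource[k]: 0 for k in flagged}
    for src, dst in edges:
        if src in flagged and dst in flagged:
            scores[kpi_to_resource[src]] += 1
    # Highest score first; ties broken by resource name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical KPI samples collected during correct executions.
normal = {
    "db.cpu": [20, 22, 21, 19, 20],
    "db.latency": [5, 6, 5, 5, 6],
    "web.latency": [50, 52, 49, 51, 50],
    "web.cpu": [30, 31, 29, 30, 32],
}
# Hypothetical observation taken while a failure is in progress.
observation = {"db.cpu": 95, "db.latency": 40, "web.latency": 300, "web.cpu": 31}
# Hypothetical dependency edges: (influencing KPI, influenced KPI).
edges = [
    ("db.cpu", "db.latency"),
    ("db.latency", "web.latency"),
    ("web.cpu", "web.latency"),
]
kpi_to_resource = {
    "db.cpu": "db", "db.latency": "db",
    "web.latency": "web", "web.cpu": "web",
}

baseline = fit_baseline(normal)
flagged = anomalous_kpis(baseline, observation)
ranking = rank_resources(flagged, edges, kpi_to_resource)
print(ranking)
```

In this toy setup the detector alone flags KPIs on both `db` and `web`. The graph step is what separates the likely source (`db`, whose anomalies propagate to others) from a resource that only shows symptoms, which is the kind of compensation for positive-only training that the abstract describes.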
