Autonomous Fault Detection in Self-Healing Systems using Restricted Boltzmann Machines
This work aims to reduce the operational complexity and cost that system administrators face in managing computing environments, though the contribution appears incremental because it extends the authors' previous work.
The paper tackles the problem of autonomously detecting and recovering from faults in computing systems by applying Restricted Boltzmann Machines and contrastive divergence learning to historical feature data. The claimed improvement to the state of the art is heuristic prediction of feature data across entire sequences.
Autonomously detecting and recovering from faults is one approach to reducing the operational complexity and costs associated with managing computing environments. We present a novel methodology for autonomously generating investigation leads that help identify system faults. It extends our previous work in this area by leveraging Restricted Boltzmann Machines (RBMs) and contrastive divergence learning to analyse changes in historical feature data. This allows us to heuristically identify the root cause of a fault, and we demonstrate an improvement to the state of the art by showing that feature data can be predicted heuristically beyond a single instance to include entire sequences of information.
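The abstract does not describe how the RBM is configured or trained, so the following is only a generic sketch of one-step contrastive divergence (CD-1) for a binary RBM, not the authors' implementation. It assumes feature data has already been encoded as binary vectors; the class name TinyRBM, the learning rate, and the example vectors are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    # Draw a binary state for each unit from its activation probability.
    return [1 if rng.random() < p else 0 for p in probs]


class TinyRBM:
    """Hypothetical minimal binary RBM trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # Small random weights; visible and hidden biases start at zero.
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_visible
        self.c = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data vector.
        ph0 = self.hidden_probs(v0)
        h0 = sample(ph0, self.rng)
        # Negative phase: one Gibbs step to obtain a reconstruction.
        v1 = sample(self.visible_probs(h0), self.rng)
        ph1 = self.hidden_probs(v1)
        # Move parameters toward the data statistics and away from
        # the reconstruction statistics.
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])

    def reconstruct(self, v):
        # Probability of each visible unit being on, given the input vector.
        return self.visible_probs(self.hidden_probs(v))


# Hypothetical binary feature snapshots (assumed encoding, not from the paper).
history = [[1, 0, 1, 0], [1, 0, 1, 1], [1, 0, 1, 0]]
rbm = TinyRBM(n_visible=4, n_hidden=2)
for _ in range(50):
    for snapshot in history:
        rbm.cd1_update(snapshot)
reconstruction = rbm.reconstruct([1, 0, 1, 0])
```

In a setting like the one described, a large gap between an observed feature vector and its reconstruction could plausibly be treated as an investigation lead, but that scoring rule is an assumption here and is not taken from the paper.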