FRAPpuccino: Fault-detection through Runtime Analysis of Provenance
This addresses fault detection for PaaS users running applications on large clusters, representing an incremental improvement in anomaly detection methods.
The paper tackles the problem of detecting faults in Platform as a Service (PaaS) applications by modeling legitimate behavior using provenance graphs and comparing new instances with a dynamic sliding window algorithm, resulting in accurate anomaly detection as shown in experimental results.
We present FRAPpuccino (or FRAP), a provenance-based fault detection mechanism for Platform as a Service (PaaS) users, who run many instances of an application on a large cluster of machines. FRAP models, records, and analyzes the behavior of an application and its impact on the system as a directed acyclic provenance graph. It assumes that most instances behave normally and uses their behavior to construct a model of legitimate behavior. Given a model of legitimate behavior, FRAP uses a dynamic sliding window algorithm to compare a new instance's execution to that of the model. Any instance that does not conform to the model is identified as an anomaly. We present the FRAP prototype and experimental results showing that it can accurately detect application anomalies.