Gongqi Huang

60.8SEMar 30

Wherefore Art Thou? Provenance-Guided Automatic Online Debugging with Lumos

Jingyuan Chen, Lei Zhang, Leon Schuermann et al.

Debugging distributed systems in-production is inevitable and hard. Myriad interactions between concurrent components in modern, complex and large-scale systems cause non-deterministic bugs that offline testing and verification fail to capture. When bugs surface at runtime, their root causes may be far removed from their symptoms. To identify a root cause, developers often need evidence scattered across multiple components and traces. Unfortunately, existing tools fail to quickly and automatically record useful provenance information at low overheads, leaving developers to manually perform the onerous evidence collection task. Lumos is an online debugging framework that exposes application-level bug provenances--the computational history linking symptoms of an incident to their root causes. Lumos leverages dependency-guided instrumentation powered by static analysis to identify program state related to a bug's provenance, and exposes them via lightweight on-demand recording. Lumos provides developers with enough evidence to identify a bug's root cause, while incurring low runtime overhead, and given only a few occurrences of a bug.

SEOct 5, 2020

Automated Reasoning and Detection of Specious Configuration in Large Systems with Symbolic Execution

Yigong Hu, Gongqi Huang, Peng Huang

Misconfiguration is a major cause of system failures. Prior solutions focus on detecting invalid settings that are introduced by user mistakes. But another type of misconfiguration that continues to haunt production services is specious configuration--settings that are valid but lead to unexpectedly poor performance in production. Such misconfigurations are subtle, so even careful administrators may fail to foresee them. We propose a tool called Violet to detect such misconfiguration. We realize the crux of specious configuration is that it causes some slow code path to be executed, but the bad performance effect cannot always be triggered. Violet thus takes a novel approach that uses selective symbolic execution to systematically reason about the performance effect of configuration parameters, their combination effect, and the relationship with input. Violet outputs a performance impact model for the automatic detection of poor configuration settings. We applied Violet on four large systems. To evaluate the effectiveness of Violet, we collect 17 real-world specious configuration cases. Violet detects 15 of them. Violet also identifies 9 unknown specious configurations.

Gongqi Huang

2 Papers