SE | DC | Mar 30

Wherefore Art Thou? Provenance-Guided Automatic Online Debugging with Lumos

arXiv:2603.29013 | 60.8 | h-index: 3
AI Analysis

Lumos addresses the challenge developers face in quickly identifying the root causes of non-deterministic bugs in complex distributed systems. It is a novel method rather than an incremental improvement.

The paper tackles the problem of debugging distributed systems in production by introducing Lumos, a framework that automatically records application-level bug provenance to link symptoms to root causes, achieving low runtime overhead and requiring only a few bug occurrences.

Debugging distributed systems in production is inevitable and hard. Myriad interactions between concurrent components in modern, complex, large-scale systems cause non-deterministic bugs that offline testing and verification fail to capture. When bugs surface at runtime, their root causes may be far removed from their symptoms. To identify a root cause, developers often need evidence scattered across multiple components and traces. Unfortunately, existing tools fail to quickly and automatically record useful provenance information at low overhead, leaving developers to perform the onerous task of evidence collection manually. Lumos is an online debugging framework that exposes application-level bug provenance: the computational history linking the symptoms of an incident to their root causes. Lumos leverages dependency-guided instrumentation, powered by static analysis, to identify program state related to a bug's provenance, and exposes that state via lightweight on-demand recording. Lumos provides developers with enough evidence to identify a bug's root cause while incurring low runtime overhead and requiring only a few occurrences of the bug.
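To make the idea of dependency-guided, on-demand recording concrete, here is a minimal Python sketch. It is not Lumos's implementation: the abstract does not describe the algorithm in detail, so this assumes a precomputed variable-level dependency graph (standing in for the output of static analysis) and uses a plain backward traversal to pick which state to record. The names `backward_slice`, `OnDemandRecorder`, `DEPS`, and all variable names are hypothetical.

```python
from collections import deque


def backward_slice(deps, symptom):
    """Return the symptom plus every variable it transitively depends on.

    `deps` maps a variable name to the set of variables it is computed from.
    This is a stand-in for a static dependency analysis (assumption).
    """
    seen = {symptom}
    queue = deque([symptom])
    while queue:
        var = queue.popleft()
        for parent in deps.get(var, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


class OnDemandRecorder:
    """Records values only for watched variables, and only while enabled."""

    def __init__(self, watched):
        self.watched = set(watched)
        self.enabled = False  # recording stays off until explicitly requested
        self.log = []

    def record(self, var, value):
        if self.enabled and var in self.watched:
            self.log.append((var, value))


# Hypothetical dependency graph for a replicated service (illustrative only).
DEPS = {
    "response_status": {"quorum_ok", "payload"},
    "quorum_ok": {"ack_count", "replica_count"},
    "ack_count": {"network_events"},
    "metrics_counter": {"network_events"},
}

slice_vars = backward_slice(DEPS, "response_status")
recorder = OnDemandRecorder(slice_vars)

recorder.record("ack_count", 1)        # ignored: recording not enabled yet
recorder.enabled = True                # e.g. switched on after a symptom is seen
recorder.record("ack_count", 1)        # kept: in the slice and enabled
recorder.record("metrics_counter", 7)  # ignored: unrelated to the symptom
```

In this sketch the overhead is limited in two ways: unrelated state such as `metrics_counter` is never recorded, and nothing is recorded at all until recording is switched on.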

