Lightweight Record-and-Replay for Intermittent Test Failures
This work addresses intermittent test failures for developers of concurrent, message-passing programs, but it is incremental in that it builds on existing record-and-replay solutions.
The paper tackles intermittent test failures in concurrent programs by introducing lightweight record-and-replay, which handles nondeterminism arising from thread communication. An evaluation on the Servo web browser shows that it greatly reduces failures for some tests but not others.
In this paper we present lightweight record-and-replay (RR). In contrast to traditional "fully deterministic" RR solutions, lightweight RR focuses on handling nondeterminism arising from thread communication in programs with concurrent, message-passing architectures. By decreasing nondeterminism in a program, lightweight RR decreases the number of intermittent failures in the program's test suite. We evaluated the effectiveness of lightweight RR on Servo, a highly concurrent web browser. Our evaluation shows that lightweight RR greatly reduces intermittent failures for some tests, but not for others. The performance overhead of lightweight RR remains a work in progress, but log sizes are quite small. We believe that, with further work, lightweight RR could prove useful for lowering nondeterminism in programs at negligible performance overhead.
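To make the idea of recording and replaying thread communication concrete, the following is a minimal sketch in Python. It is not the implementation evaluated on Servo. The class and function names (`RecordingReceiver`, `ReplayingReceiver`, `run`) are hypothetical, and the sketch assumes a single receiver reading from one shared queue fed by several sender threads. In record mode only the sender ID of each received message is logged; in replay mode messages are handed out in the logged sender order, and messages that arrive early are buffered.

```python
import collections
import queue
import threading


class RecordingReceiver:
    """Hypothetical wrapper: logs which sender each received message came from."""

    def __init__(self, q):
        self.q = q
        self.log = []  # sender IDs, in the order messages were received

    def recv(self):
        sender, payload = self.q.get()
        self.log.append(sender)
        return sender, payload


class ReplayingReceiver:
    """Hypothetical wrapper: delivers messages in a previously logged sender order."""

    def __init__(self, q, log):
        self.q = q
        self.log = list(log)
        self.pos = 0
        self.pending = {}  # sender ID -> deque of payloads that arrived early

    def recv(self):
        expected = self.log[self.pos]
        self.pos += 1
        buf = self.pending.setdefault(expected, collections.deque())
        # Keep draining the queue until the expected sender has a message,
        # buffering anything that arrives from other senders in the meantime.
        while not buf:
            sender, payload = self.q.get()
            self.pending.setdefault(sender, collections.deque()).append(payload)
        return expected, buf.popleft()


def run(receiver, n_senders=3, msgs_per_sender=4):
    """Start sender threads that race to enqueue messages; return the receive order."""

    def send(sender_id):
        for i in range(msgs_per_sender):
            receiver.q.put((sender_id, i))

    threads = [threading.Thread(target=send, args=(s,)) for s in range(n_senders)]
    for t in threads:
        t.start()
    received = [receiver.recv() for _ in range(n_senders * msgs_per_sender)]
    for t in threads:
        t.join()
    return received


# Record one nondeterministic interleaving, then replay it on a fresh run.
recorder = RecordingReceiver(queue.Queue())
recorded = run(recorder)

replayer = ReplayingReceiver(queue.Queue(), recorder.log)
replayed = run(replayer)
```

In this sketch the replayed run observes the same message order as the recorded run even though the sender threads are scheduled independently each time. The log holds one sender ID per message and no payloads, which is one plausible reason a log of this kind could stay small. Other sources of nondeterminism, such as timers or I/O, are deliberately left unhandled here.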