Charlie Curtsinger

PLAug 27, 2021

LaForge: Always-Correct and Fast Incremental Builds from Simple Specifications

Charlie Curtsinger, Daniel W. Barowy

Developers rely on build systems to generate software from code. At a minimum, a build system should produce build targets from a clean copy of the code. However, developers rarely work from clean checkouts. Instead, they rebuild software repeatedly, sometimes hundreds of times a day. To keep rebuilds fast, build systems run incrementally, executing commands only when built state cannot be reused. Existing tools like make present users with a tradeoff. Simple build specifications are easy to write, but limit incremental work. More complex build specifications produce faster incremental builds, but writing them is labor-intensive and error-prone. This work shows that no such tradeoff is necessary; build specifications can be both simple and fast. We introduce LaForge, a novel build tool that eliminates the need to specify dependencies or incremental build steps. LaForge builds are easy to specify; developers write a simple script that runs a full build. Even a single command like gcc src/*.c will suffice. LaForge traces the execution of the build and generates a transcript in the TraceIR language. On later builds, LaForge evaluates the TraceIR transcript to detect changes and perform an efficient incremental rebuild that automatically captures all build dependencies. We evaluate LaForge by building 14 software packages, including LLVM and memcached. Our results show that LaForge automatically generates efficient builds from simple build specifications. Full builds with LaForge have a median overhead of 16.1% compared to a project's default full build. LaForge's incremental builds consistently run fewer commands, and most take less than 3.08s longer than manually-specified incremental builds. Finally, LaForge is always correct.

SEJan 29, 2016

DoubleTake: Fast and Precise Error Detection via Evidence-Based Dynamic Analysis

Tongping Liu, Charlie Curtsinger, Emery D. Berger

This paper presents evidence-based dynamic analysis, an approach that enables lightweight analyses--under 5% overhead for these bugs--making it practical for the first time to perform these analyses in deployed settings. The key insight of evidence-based dynamic analysis is that for a class of errors, it is possible to ensure that evidence that they happened at some point in the past remains for later detection. Evidence-based dynamic analysis allows execution to proceed at nearly full speed until the end of an epoch (e.g., a heavyweight system call). It then examines program state to check for evidence that an error occurred at some time during that epoch. If so, it rolls back execution and re-executes the code with instrumentation activated to pinpoint the error. We present DoubleTake, a prototype evidence-based dynamic analysis framework. DoubleTake is practical and easy to deploy, requiring neither custom hardware, compiler, nor operating system support. We demonstrate DoubleTake's generality and efficiency by building dynamic analyses that find buffer overflows, memory use-after-free errors, and memory leaks. Our evaluation shows that DoubleTake is efficient, imposing just 4% overhead on average, making it the fastest such system to date. It is also precise: DoubleTake pinpoints the location of these errors to the exact line and memory addresses where they occur, providing valuable debugging information to programmers.

Charlie Curtsinger

2 Papers