Daniel W. Barowy

PLAug 27, 2021

LaForge: Always-Correct and Fast Incremental Builds from Simple Specifications

Charlie Curtsinger, Daniel W. Barowy

Developers rely on build systems to generate software from code. At a minimum, a build system should produce build targets from a clean copy of the code. However, developers rarely work from clean checkouts. Instead, they rebuild software repeatedly, sometimes hundreds of times a day. To keep rebuilds fast, build systems run incrementally, executing commands only when built state cannot be reused. Existing tools like make present users with a tradeoff. Simple build specifications are easy to write, but limit incremental work. More complex build specifications produce faster incremental builds, but writing them is labor-intensive and error-prone. This work shows that no such tradeoff is necessary; build specifications can be both simple and fast. We introduce LaForge, a novel build tool that eliminates the need to specify dependencies or incremental build steps. LaForge builds are easy to specify; developers write a simple script that runs a full build. Even a single command like gcc src/*.c will suffice. LaForge traces the execution of the build and generates a transcript in the TraceIR language. On later builds, LaForge evaluates the TraceIR transcript to detect changes and perform an efficient incremental rebuild that automatically captures all build dependencies. We evaluate LaForge by building 14 software packages, including LLVM and memcached. Our results show that LaForge automatically generates efficient builds from simple build specifications. Full builds with LaForge have a median overhead of 16.1% compared to a project's default full build. LaForge's incremental builds consistently run fewer commands, and most take less than 3.08s longer than manually-specified incremental builds. Finally, LaForge is always correct.

PLJan 30, 2019

ExceLint: Automatically Finding Spreadsheet Formula Errors

Daniel W. Barowy, Emery D. Berger, Benjamin Zorn

Spreadsheets are one of the most widely used programming environments, and are widely deployed in domains like finance where errors can have catastrophic consequences. We present a static analysis specifically designed to find spreadsheet formula errors. Our analysis directly leverages the rectangular character of spreadsheets. It uses an information-theoretic approach to identify formulas that are especially surprising disruptions to nearby rectangular regions. We present ExceLint, an implementation of our static analysis for Microsoft Excel. We demonstrate that ExceLint is fast and effective: across a corpus of 70 spreadsheets, ExceLint takes a median of 5 seconds per spreadsheet, and it significantly outperforms the state of the art analysis.

Daniel W. Barowy

2 Papers