25.7SEMar 27
Etna: An Evaluation Platform for Property-Based TestingAlperen Keles, Jessica Shi, Nikhil Kamath et al.
Property-based testing is a mainstay of functional programming, boasting a rich literature, an enthusiastic user community, and an abundance of tools~ -- so many, indeed, that new users may have difficulty choosing. Moreover, any given framework may support a variety of strategies for generating test inputs; even experienced users may wonder which are better in any given situation. Sadly, the PBT literature, though long on creativity, is short on rigorous comparisons to help answer such questions. We present ETNA, a platform for empirical evaluation and comparison of PBT techniques. ETNA incorporates a number of popular PBT frameworks and testing workloads from the literature, and its extensible architecture makes adding new ones easy, while handling the technical drudgery of performance measurement. To illustrate its benefits, we use ETNA to carry out several experiments with popular PBT approaches in Rocq, Haskell, OCaml, Racket, and Rust, allowing users to more clearly understand best practices and tradeoffs.
67.8PLApr 14
The Search for Constrained Random GeneratorsHarrison Goldstein, Hila Peleg, Cassia Torczon et al.
Among the biggest challenges in property-based testing (PBT) is the constrained random generation problem: given a predicate on program values, randomly sample from the set of all values satisfying that predicate, and only those values. Efficient solutions to this problem are critical, since the executable specifications used by PBT often have preconditions that input values must satisfy in order to be valid test cases, and satisfying values are often sparsely distributed. We propose a novel approach to this problem using ideas from deductive program synthesis. We present a set of synthesis rules, based on a denotational semantics of generators, that give rise to an automatic procedure for synthesizing correct generators. Our system handles recursive predicates by rewriting them as catamorphisms and then matching with appropriate anamorphisms; this is theoretically simpler than other approaches to synthesis for recursive functions, yet still extremely expressive. Our implementation, Palamedes, is an extensible library for the Lean theorem prover. The synthesis algorithm itself is built on standard proof-search tactics, reducing implementation burden and allowing the algorithm to benefit from further advances in Lean proof automation.
20.7DBMar 23
DIRT: Database-Integrated Random TestingAlperen Keles, Ethan Chou, Harrison Goldstein et al.
Database management systems (DBMSs) are notoriously complex, making them difficult to test effectively, especially during early development when many features are incomplete. Traditional testing tools like SQLancer and SQLSmith are highly effective for mature databases, but they struggle with high false positive rates and low actionability when applied to evolving systems. We present DIRT, a paradigm designed specifically for testing databases during development, which integrates a testing framework directly into the DBMS, enabling the random testing process to evolve in tandem with the system and reducing false positives by construction. We introduce generation actions, an abstraction for allowing database developers rather than testing experts to specify correctness properties. We evaluate DIRT on Turso, an actively developed SQLite-compatible OLTP engine, and show that it finds 23 unique, confirmed bugs--significantly outperforming off-the-shelf SQLancer variants in terms of true positive rate and usefulness of bug reports. Our results demonstrate that embedding testing infrastructure within the DBMS can dramatically improve its effectiveness and usability during development.
PLJan 31, 2022
A Formal Model of Checked CLiyi Li, Yiyun Liu, Deena L. Postol et al.
We present a formal model of Checked C, a dialect of C that aims to enforce spatial memory safety. Our model pays particular attention to the semantics of dynamically sized, potentially null-terminated arrays. We formalize this model in Coq, and prove that any spatial memory safety errors can be blamed on portions of the program labeled unchecked; this is a Checked C feature that supports incremental porting and backward compatibility. While our model's operational semantics uses annotated ("fat") pointers to enforce spatial safety, we show that such annotations can be safely erased: Using PLT Redex we formalize an executable version of our model and a compilation procedure from it to an untyped C-like language, and use randomized testing to validate that generated code faithfully simulates the original. Finally, we develop a custom random generator for well-typed and almost-well-typed terms in our Redex model, and use it to search for inconsistencies between our model and the Clang Checked C implementation. We find these steps to be a useful way to co-develop a language (Checked C is still in development) and a core model of it.
PLMay 2, 2021
Formalizing Stack Safety as a Security PropertySean Noble Anderson, Roberto Blanco, Leonidas Lampropoulos et al.
The term stack safety is used to describe a variety of compiler, run-time, and hardware mechanisms for protecting stack memory. Unlike "the heap," the ISA-level stack does not correspond to a single high-level language concept: different compilers use it in different ways to support procedural and functional abstraction mechanisms from a wide range of languages. This protean nature makes it difficult to nail down what it means to correctly enforce stack safety. We propose a new formal characterization of stack safety using concepts from language-based security. Rather than treating stack safety as a monolithic property, we decompose it into an integrity property and a confidentiality property for each of the caller and the callee, plus a control-flow property: five properties in all. This formulation is motivated by a particular class of enforcement mechanisms, the "lazy" stack safety micro-policies studied by Roessler and DeHon, which permit functions to write into one another's frames but taint the changed locations so that the frame's owner cannot access them. No existing characterization of stack safety captures this style of safety; we capture it here by stating our properties in terms of the observable behavior of the system. Our properties go further than previous formal definitions of stack safety, supporting caller- and callee-saved registers, arguments passed on the stack, and tail-call elimination. We validate the properties by using them to distinguish between correct and incorrect implementations of Roessler and DeHon's micro-policies using property-based random testing. Our test harness successfully identifies several broken variants, including Roessler and DeHon's lazy policy; a repaired version of their policy passes our tests.
LGMay 24, 2016
Measuring Neural Net Robustness with ConstraintsOsbert Bastani, Yani Ioannou, Leonidas Lampropoulos et al.
Despite having high accuracy, neural nets have been shown to be susceptible to adversarial examples, where a small perturbation to an input can cause it to become mislabeled. We propose metrics for measuring the robustness of a neural net and devise a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program. We show how our metrics can be used to evaluate the robustness of deep neural nets with experiments on the MNIST and CIFAR-10 datasets. Our algorithm generates more informative estimates of robustness metrics compared to estimates based on existing algorithms. Furthermore, we show how existing approaches to improving robustness "overfit" to adversarial examples generated using a specific algorithm. Finally, we show that our techniques can be used to additionally improve neural net robustness both according to the metrics that we propose, but also according to previously proposed metrics.
SEOct 23, 2012
Automatic WSDL-guided Test Case Generation for PropEr Testing of Web ServicesLeonidas Lampropoulos, Konstantinos Sagonas
With web services already being key ingredients of modern web systems, automatic and easy-to-use but at the same time powerful and expressive testing frameworks for web services are increasingly important. Our work aims at fully automatic testing of web services: ideally the user only specifies properties that the web service is expected to satisfy, in the form of input-output relations, and the system handles all the rest. In this paper we present in detail the component which lies at the heart of this system: how the WSDL specification of a web service is used to automatically create test case generators that can be fed to PropEr, a property-based testing tool, to create structurally valid random test cases for its operations and check its responses. Although the process is fully automatic, our tool optionally allows the user to easily modify its output to either add semantic information to the generators or write properties that test for more involved functionality of the web services.