Martin Rinard

LG
h-index3
28papers
578citations
Novelty55%
AI Score54

28 Papers

LGApr 21, 2023
Effective Neural Network $L_0$ Regularization With BinMask

Kai Jia, Martin Rinard

$L_0$ regularization of neural networks is a fundamental problem. In addition to regularizing models for better generalizability, $L_0$ regularization also applies to selecting input features and training sparse neural networks. There is a large body of research on related topics, some with quite complicated methods. In this paper, we show that a straightforward formulation, BinMask, which multiplies weights with deterministic binary masks and uses the identity straight-through estimator for backpropagation, is an effective $L_0$ regularizer. We evaluate BinMask on three tasks: feature selection, network sparsification, and model regularization. Despite its simplicity, BinMask achieves competitive performance on all the benchmarks without task-specific tuning compared to methods designed for each task. Our results suggest that decoupling weights from mask optimization, which has been widely adopted by previous work, is a key component for effective $L_0$ regularization.

AIJul 11, 2023
Depth-bounded Epistemic Logic

Farid Arthaud, Martin Rinard

Epistemic logics model how agents reason about their beliefs and the beliefs of other agents. Existing logics typically assume the ability of agents to reason perfectly about propositions of unbounded modal depth. We present DBEL, an extension of S5 that models agents that can reason about epistemic formulas only up to a specific modal depth. To support explicit reasoning about agent depths, DBEL includes depth atoms Ead (agent a has depth exactly d) and Pad (agent a has depth at least d). We provide a sound and complete axiomatization of DBEL. We extend DBEL to support public announcements for bounded depth agents and show how the resulting DPAL logic generalizes standard axioms from public announcement logic. We present two alternate extensions and identify two undesirable properties, amnesia and knowledge leakage, that these extensions have but DPAL does not. We provide axiomatizations of these logics as well as complexity results for satisfiability and model checking. Finally, we use these logics to illustrate how agents with bounded modal depth reason in the classical muddy children problem, including upper and lower bounds on the depth knowledge necessary for agents to successfully solve the problem.

GTMay 26
Nash Equilibria with Derangement Degree Probabilities

Edan Orzech, Martin Rinard

We prove for every $n\ge4$ the existence of an $n$-player game in normal form with integer payoffs that has a unique Nash equilibrium, which is fully mixed. In the equilibrium, each probability weight is an algebraic number of degree $\mathbin{!n}$ (the derangement number), and its minimal polynomial has Galois group $S_{\mathbin{!n}}$ and $\mathbin{!n}+1$ nonzero coefficients.

SEMar 23
Lemma Discovery in Agentic Program Verification

Huan Zhao, Haoxin Tu, Zhengyao Liu et al.

Deductive verification provides strong correctness guarantees for code by extracting verification conditions (VCs) and writing formal proofs for them. The expertise-intensive task of VC proving is the main bottleneck in this process, and has been partly automated owing to recent advances in Large Language Model (LLM) agents. However, existing proof agents are not able to discover helper lemmas - auxiliary lemmas that aid in proving - and thus fall short as programs grow in size and complexity. In this paper, we argue that VC proving for program verification is more than a purely mathematical task, and benefits considerably from program comprehension. Our key insight is that human-proof engineers often discover and apply helper lemmas based on their understanding of the program semantics, which are not directly reflected in the VCs produced by VC generators. Inspired by this insight, we propose an LLM agent, LemmaNet, that discovers helper lemmas in two ways. Specifically, the agent first synthesizes lemmas offline by directly analyzing the source code and specifications, and then relating this semantic understanding to the mechanical, verbose encoding produced by VC generators. As the proof unfolds, LemmaNet then adapts existing helper lemmas online to accommodate evolving proof states, enabling the agent to effectively discharge complex VCs on-the-fly. We evaluate LemmaNet on SV-COMP and established real-world subjects, including modules of the Linux kernel, Contiki OS, standard C++ library, and X.509 parser. Our experimental results demonstrate that LemmaNet significantly outperforms state-of-the-art approaches, highlighting the importance of program comprehension-aided lemma discovery in agentic program verification.

LGJun 8, 2023
Sound Explanation for Trustworthy Machine Learning

Kai Jia, Pasapol Saowakon, Limor Appelbaum et al.

We take a formal approach to the explainability problem of machine learning systems. We argue against the practice of interpreting black-box models via attributing scores to input components due to inherently conflicting goals of attribution-based interpretation. We prove that no attribution algorithm satisfies specificity, additivity, completeness, and baseline invariance. We then formalize the concept, sound explanation, that has been informally adopted in prior work. A sound explanation entails providing sufficient information to causally explain the predictions made by a system. Finally, we present the application of feature selection as a sound explanation for cancer prediction models to cultivate trust among clinicians.

CLJul 18, 2024
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data

Charles Jin, Martin Rinard

As language models (LMs) deliver increasing performance on a range of NLP tasks, probing classifiers have become an indispensable technique in the effort to better understand their inner workings. A typical setup involves (1) defining an auxiliary task consisting of a dataset of text annotated with labels, then (2) supervising small classifiers to predict the labels from the representations of a pretrained LM as it processed the dataset. A high probing accuracy is interpreted as evidence that the LM has learned to perform the auxiliary task as an unsupervised byproduct of its original pretraining objective. Despite the widespread usage of probes, however, the robust design and analysis of probing experiments remains a challenge. We develop a formal perspective on probing using structural causal models (SCM). Specifically, given an SCM which explains the distribution of tokens observed during training, we frame the central hypothesis as whether the LM has learned to represent the latent variables of the SCM. Empirically, we extend a recent study of LMs in the context of a synthetic grid-world navigation task, where having an exact model of the underlying causal structure allows us to draw strong inferences from the result of probing experiments. Our techniques provide robust empirical evidence for the ability of LMs to induce the latent concepts underlying text.

LGMay 18, 2023Code
Emergent Representations of Program Semantics in Language Models Trained on Programs

Charles Jin, Martin Rinard

We present evidence that language models (LMs) of code can learn to represent the formal semantics of programs, despite being trained only to perform next-token prediction. Specifically, we train a Transformer model on a synthetic corpus of programs written in a domain-specific language for navigating 2D grid world environments. Each program in the corpus is preceded by a (partial) specification in the form of several input-output grid world states. Despite providing no further inductive biases, we find that a probing classifier is able to extract increasingly accurate representations of the unobserved, intermediate grid world states from the LM hidden states over the course of training, suggesting the LM acquires an emergent ability to interpret programs in the formal sense. We also develop a novel interventional baseline that enables us to disambiguate what is represented by the LM as opposed to learned by the probe. We anticipate that this technique may be generally applicable to a broad range of semantic probing experiments. In summary, this paper does not propose any new techniques for training LMs of code, but develops an experimental framework for and provides insights into the acquisition and representation of formal semantics in statistical models of code. Our code is available at https://github.com/charlesjin/emergent-semantics.

PLMay 9
Quantitative Comparison of Credible Compilation and Verification In Coding Agent Compiler Development

Martin Rinard

Formal program verification is a longstanding goal in the field. We present the first quantitative comparison of the two primary compiler verification approaches, credible compilation/translation validation and full verification. Working with the first verified compiler developed by a coding agent (operating under human supervision), we present quantitative results from a coding agent implementing several optimizations using these two approaches. The results indicate that 1) verification requires roughly an order of magnitude more development effort than credible compilation, 2) to enhance provability, the coding agent chooses less efficient algorithms and data structures for verified optimizations, and 3) in an attempt to minimize proof effort the coding agent repeatedly implemented optimization scope reductions for verified optimizations, and 4) certificate checking time dominates optimization and certificate generation time for the considered optimizations. Because of the increased proof overhead, verified optimizations required substantially more supervision and coding sessions than credible compilation optimizations. Given the capabilities of a modern coding agent working in this context, implementation efforts for both credible compilation and verified versions remained feasible for the considered optimizations (unreachable code elimination, dead assignment elimination, and constant propagation/folding).

PLMay 3
Testing, Credible Compilation, and Verification in the Axon Verified Compiler in Lean and Claude Code

Martin Rinard

This paper presents the use of testing, credible compilation/translation validation, verification, and audits in the Axon compiler. Axon comes with fully machine checked proofs that guarantee the correctness of the generated code. All code and proofs were written in Lean by Claude Code, with the correctness proofs eliminating any need to audit or examine any verified code. I present a development process for using these validation techniques, evaluate the use of this process during the development of the compiler, and discuss implications for other development efforts.

SEApr 7, 2025
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning

Rem Yang, Julian Dai, Nikos Vasilakis et al.

We assess how the code reasoning abilities of large language models (LLMs) generalize to different kinds of programs. We present techniques for obtaining in- and out-of-distribution programs with different characteristics: code sampled from a domain-specific language, code automatically generated by an LLM, code collected from competitive programming contests, and mutated versions of these programs. We also present an experimental methodology for evaluating LLM generalization by comparing their performance on these programs. We perform an extensive evaluation across 10 state-of-the-art models from the past year, obtaining insights into their generalization capabilities over time and across different classes of programs. Our results highlight that while earlier models exhibit behavior consistent with pattern matching, the latest models exhibit strong generalization abilities on code reasoning.

LGAug 18, 2021
Verifying Low-dimensional Input Neural Networks via Input Quantization

Kai Jia, Martin Rinard

Deep neural networks are an attractive tool for compressing the control policy lookup tables in systems such as the Airborne Collision Avoidance System (ACAS). It is vital to ensure the safety of such neural controllers via verification techniques. The problem of analyzing ACAS Xu networks has motivated many successful neural network verifiers. These verifiers typically analyze the internal computation of neural networks to decide whether a property regarding the input/output holds. The intrinsic complexity of neural network computation renders such verifiers slow to run and vulnerable to floating-point error. This paper revisits the original problem of verifying ACAS Xu networks. The networks take low-dimensional sensory inputs with training data provided by a precomputed lookup table. We propose to prepend an input quantization layer to the network. Quantization allows efficient verification via input state enumeration, whose complexity is bounded by the size of the quantization space. Quantization is equivalent to nearest-neighbor interpolation at run time, which has been shown to provide acceptable accuracy for ACAS in simulation. Moreover, our technique can deliver exact verification results immune to floating-point error if we directly enumerate the network outputs on the target inference implementation or on an accurate simulation of the target implementation.

LGMay 8, 2021
Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks

Charles Jin, Melinda Sun, Martin Rinard

We propose a novel clustering mechanism based on an incompatibility property between subsets of data that emerges during model training. This mechanism partitions the dataset into subsets that generalize only to themselves, i.e., training on one subset does not improve performance on the other subsets. Leveraging the interaction between the dataset and the training process, our clustering mechanism partitions datasets into clusters that are defined by--and therefore meaningful to--the objective of the training process. We apply our clustering mechanism to defend against data poisoning attacks, in which the attacker injects malicious poisoned data into the training dataset to affect the trained model's output. Our evaluation focuses on backdoor attacks against deep neural networks trained to perform image classification using the GTSRB and CIFAR-10 datasets. Our results show that (1) these attacks produce poisoned datasets in which the poisoned and clean data are incompatible and (2) our technique successfully identifies (and removes) the poisoned data. In an end-to-end evaluation, our defense reduces the attack success rate to below 1% on 134 out of 165 scenarios, with only a 2% drop in clean accuracy on CIFAR-10 and a negligible drop in clean accuracy on GTSRB.

PLApr 27, 2021
Inductive Program Synthesis over Noisy Datasets using Abstraction Refinement Based Optimization

Shivam Handa, Martin Rinard

We present a new synthesis algorithm to solve program synthesis over noisy datasets, i.e., data that may contain incorrect/corrupted input-output examples. Our algorithm uses an abstraction refinement based optimization process to synthesize programs which optimize the tradeoff between the loss over the noisy dataset and the complexity of the synthesized program. The algorithm uses abstractions to divide the search space of programs into subspaces by computing an abstract value that represents outputs for all programs in a subspace. The abstract value allows our algorithm to compute, for each subspace, a sound approximate lower bound of the loss over all programs in the subspace. It iteratively refines these abstractions to further subdivide the space into smaller subspaces, prune subspaces that do not contain an optimal program, and eventually synthesize an optimal program. We implemented this algorithm in a tool called Rose. We compare Rose to a current state-of-the-art noisy program synthesis system using the SyGuS 2018 benchmark suite. Our evaluation demonstrates that Rose significantly outperforms this previous system: on two noisy benchmark program synthesis problems sets drawn from the SyGus 2018 benchmark suite, Rose delivers speedups of up to 1587 and 81.7, with median speedups of 20.5 and 81.7. Rose also terminates on 20 (out of 54) and 4 (out of 11) more benchmark problems than the previous system. Both Rose and the previous system synthesize programs that are optimal over the provided noisy data sets. For the majority of the problems in the benchmark sets ($272$ out of $286$), the synthesized programs also produce correct outputs for all inputs in the original (unseen) noise-free data set. These results highlight the benefits that Rose can deliver for effective noisy program synthesis.

PLMar 8, 2021
Program Synthesis Over Noisy Data with Guarantees

Shivam Handa, Martin Rinard

We explore and formalize the task of synthesizing programs over noisy data, i.e., data that may contain corrupted input-output examples. By formalizing the concept of a Noise Source, an Input Source, and a prior distribution over programs, we formalize the probabilistic process which constructs a noisy dataset. This formalism allows us to define the correctness of a synthesis algorithm, in terms of its ability to synthesize the hidden underlying program. The probability of a synthesis algorithm being correct depends upon the match between the Noise Source and the Loss Function used in the synthesis algorithm's optimization process. We formalize the concept of an optimal Loss Function given prior information about the Noise Source. We provide a technique to design optimal Loss Functions given perfect and imperfect information about the Noise Sources. We also formalize the concept and conditions required for convergence, i.e., conditions under which the probability that the synthesis algorithm produces a correct program increases as the size of the noisy data set increases. This paper presents the first formalization of the concept of optimal Loss Functions, the first closed form definition of optimal Loss Functions, and the first conditions that ensure that a noisy synthesis algorithm will have convergence guarantees.

AIFeb 22, 2021
Program Synthesis Guided Reinforcement Learning for Partially Observed Environments

Yichen David Yang, Jeevana Priya Inala, Osbert Bastani et al.

A key challenge for reinforcement learning is solving long-horizon planning problems. Recent work has leveraged programs to guide reinforcement learning in these settings. However, these approaches impose a high manual burden on the user since they must provide a guiding program for every new task. Partially observed environments further complicate the programming task because the program must implement a strategy that correctly, and ideally optimally, handles every possible configuration of the hidden regions of the environment. We propose a new approach, model predictive program synthesis (MPPS), that uses program synthesis to automatically generate the guiding programs. It trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to its uncertainty. In our experiments, we show that our approach significantly outperforms non-program-guided approaches on a set of challenging benchmarks, including a 2D Minecraft-inspired environment where the agent must complete a complex sequence of subtasks to achieve its goal, and achieves a similar performance as using handcrafted programs to guide the agent. Our results demonstrate that our approach can obtain the benefits of program-guided reinforcement learning without requiring the user to provide a new guiding program for every new task.

MAJan 5, 2021
Neurosymbolic Transformers for Multi-Agent Communication

Jeevana Priya Inala, Yichen Yang, James Paulos et al.

We study the problem of inferring communication structures that can solve cooperative multi-agent planning problems while minimizing the amount of communication. We quantify the amount of communication as the maximum degree of the communication graph; this metric captures settings where agents have limited bandwidth. Minimizing communication is challenging due to the combinatorial nature of both the decision space and the objective; for instance, we cannot solve this problem by training neural networks using gradient descent. We propose a novel algorithm that synthesizes a control policy that combines a programmatic communication policy used to generate the communication graph with a transformer policy network used to choose actions. Our algorithm first trains the transformer policy, which implicitly generates a "soft" communication graph; then, it synthesizes a programmatic communication policy that "hardens" this graph, forming a neurosymbolic transformer. Our experiments demonstrate how our approach can synthesize policies that generate low-degree communication graphs while maintaining near-optimal performance.

LGMay 29, 2020
Towards Context-Agnostic Learning Using Synthetic Data

Charles Jin, Martin Rinard

We propose a novel setting for learning, where the input domain is the image of a map defined on the product of two sets, one of which completely determines the labels. We derive a new risk bound for this setting that decomposes into a bias and an error term, and exhibits a surprisingly weak dependence on the true labels. Inspired by these results, we present an algorithm aimed at minimizing the bias term by exploiting the ability to sample from each set independently. We apply our setting to visual classification tasks, where our approach enables us to train classifiers on datasets that consist entirely of a single synthetic example of each class. On several standard benchmarks for real-world image classification, we achieve robust performance in the context-agnostic setting, with good generalization to real world domains, whereas training directly on real world data without our techniques yields classifiers that are brittle to perturbations of the background.

AIMay 7, 2020
Efficient Exact Verification of Binarized Neural Networks

Kai Jia, Martin Rinard

Concerned with the reliability of neural networks, researchers have developed verification techniques to prove their robustness. Most verifiers work with real-valued networks. Unfortunately, the exact (complete and sound) verifiers face scalability challenges and provide no correctness guarantees due to floating point errors. We argue that Binarized Neural Networks (BNNs) provide comparable robustness and allow exact and significantly more efficient verification. We present a new system, EEV, for efficient and exact verification of BNNs. EEV consists of two parts: (i) a novel SAT solver that speeds up BNN verification by natively handling the reified cardinality constraints arising in BNN encodings; and (ii) strategies to train solver-friendly robust BNNs by inducing balanced layer-wise sparsity and low cardinality bounds, and adaptively cancelling the gradients. We demonstrate the effectiveness of EEV by presenting the first exact verification results for L-inf-bounded adversarial robustness of nontrivial convolutional BNNs on the MNIST and CIFAR10 datasets. Compared to exact verification of real-valued networks of the same architectures on the same tasks, EEV verifies BNNs hundreds to thousands of times faster, while delivering comparable verifiable accuracy in most cases.

MLMar 9, 2020
Manifold Regularization for Locally Stable Deep Neural Networks

Charles Jin, Martin Rinard

We apply concepts from manifold regularization to develop new regularization techniques for training locally stable deep neural networks. Our regularizers are based on a sparsification of the graph Laplacian which holds with high probability when the data is sparse in high dimensions, as is common in deep learning. Empirically, our networks exhibit stability in a diverse set of perturbation models, including $\ell_2$, $\ell_\infty$, and Wasserstein-based perturbations; in particular, we achieve 40% adversarial accuracy on CIFAR-10 against an adaptive PGD attack using $\ell_\infty$ perturbations of size $ε= 8/255$, and state-of-the-art verified accuracy of 21% in the same perturbation model. Furthermore, our techniques are efficient, incurring overhead on par with two additional parallel forward passes through the network.

LGMar 6, 2020
Exploiting Verified Neural Networks via Floating Point Numerical Error

Kai Jia, Martin Rinard

Researchers have developed neural network verification algorithms motivated by the need to characterize the robustness of deep neural networks. The verifiers aspire to answer whether a neural network guarantees certain properties with respect to all inputs in a space. However, many verifiers inaccurately model floating point arithmetic but do not thoroughly discuss the consequences. We show that the negligence of floating point error leads to unsound verification that can be systematically exploited in practice. For a pretrained neural network, we present a method that efficiently searches inputs as witnesses for the incorrectness of robustness claims made by a complete verifier. We also present a method to construct neural network architectures and weights that induce wrong results of an incomplete verifier. Our results highlight that, to achieve practically reliable verification of neural networks, any verification system must accurately (or conservatively) model the effects of any floating point computations in the network inference or verification system.

SEJul 15, 2019
Characterizing Developer Use of Automatically Generated Patches

José Pablo Cambronero, Jiasi Shen, Jürgen Cito et al.

We present a study that characterizes the way developers use automatically generated patches when fixing software defects. Our study tasked two groups of developers with repairing defects in C programs. Both groups were provided with the defective line of code. One was also provided with five automatically generated and validated patches, all of which modified the defective line of code, and one of which was correct. Contrary to our initial expectations, the group with access to the generated patches did not produce more correct patches and did not produce patches in less time. We characterize the main behaviors observed in experimental subjects: a focus on understanding the defect and the relationship of the patches to the original source code. Based on this characterization, we highlight various potentially productive directions for future developer-centric automatic patch generation systems.

LGJun 3, 2019
Correctness Verification of Neural Networks

Yichen Yang, Martin Rinard

We present a novel framework for specifying and verifying correctness globally for neural networks on perception tasks. Most previous works on neural network verification for perception tasks focus on robustness verification. Unlike robustness verification, which aims to verify that the prediction of a network is stable in some local regions around labelled points, our framework provides a way to specify correctness globally in the whole target input space and verify that the network is correct for all target inputs (or find the regions where the network is not correct). We provide a specification through 1) a state space consisting of all relevant states of the world and 2) an observation process that produces neural network inputs from the states of the world. Tiling the state and input spaces with a finite number of tiles, obtaining ground truth bounds from the state tiles and network output bounds from the input tiles, then comparing the ground truth and network output bounds delivers an upper bound on the network output error for any inputs of interest. The presented framework also enables detecting illegal inputs -- inputs that are not contained in (or close to) the target input space as defined by the state space and observation process (the neural network is not designed to work on them), so that we can flag when we don't have guarantees. Results from two case studies highlight the ability of our technique to verify error bounds over the whole target input space and show how the error bounds vary over the state and input spaces.

APP-PHApr 6, 2018
A Hardware Platform for Efficient Multi-Modal Sensing with Adaptive Approximation

Phillip Stanley-Marbell, Martin Rinard

We present Warp, a hardware platform to support research in approximate computing, sensor energy optimization, and energy-scavenged systems. Warp incorporates 11 state-of-the-art sensor integrated circuits, computation, and an energy-scavenged power supply, all within a miniature system that is just 3.6 cm x 3.3 cm x 0.5 cm. Warp's sensor integrated circuits together contain a total of 21 sensors with a range of precisions and accuracies for measuring eight sensing modalities of acceleration, angular rate, magnetic flux density (compass heading), humidity, atmospheric pressure (elevation), infrared radiation, ambient temperature, and color. Warp uses a combination of analog circuits and digital control to facilitate further tradeoffs between sensor and communication accuracy, energy efficiency, and performance. This article presents the design of Warp and presents an evaluation of our hardware implementation. The results show how Warp's design enables performance and energy efficiency versus ac- curacy tradeoffs.

HCMar 22, 2018
Incremental Color Quantization for Color-Vision-Deficient Observers Using Mobile Gaming Data

Jose Cambronero, Phillip Stanley-Marbell, Martin Rinard

The sizes of compressed images depend on their spatial resolution (number of pixels) and on their color resolution (number of color quantization levels). We introduce DaltonQuant, a new color quantization technique for image compression that cloud services can apply to images destined for a specific user with known color vision deficiencies. DaltonQuant improves compression in a user-specific but reversible manner thereby improving a user's network bandwidth and data storage efficiency. DaltonQuant quantizes image data to account for user-specific color perception anomalies, using a new method for incremental color quantization based on a large corpus of color vision acuity data obtained from a popular mobile game. Servers that host images can revert DaltonQuant's image requantization and compression when those images must be transmitted to a different user, making the technique practical to deploy on a large scale. We evaluate DaltonQuant's compression performance on the Kodak PC reference image set and show that it improves compression by an additional 22%-29% over the state-of-the-art compressors TinyPNG and pngquant.

AIMar 20, 2018
The Three Pillars of Machine Programming

Justin Gottschlich, Armando Solar-Lezama, Nesime Tatbul et al.

In this position paper, we describe our vision of the future of machine programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and(iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.

LGJun 20, 2016
Unanimous Prediction for 100% Precision with Application to Learning Semantic Mappings

Fereshte Khani, Martin Rinard, Percy Liang

Can we train a system that, on any new input, either says "don't know" or makes a prediction that is guaranteed to be correct? We answer the question in the affirmative provided our model family is well-specified. Specifically, we introduce the unanimity principle: only predict when all models consistent with the training data predict the same output. We operationalize this principle for semantic parsing, the task of mapping utterances to logical forms. We develop a simple, efficient method that reasons over the infinite set of all consistent models by only checking two of the models. We prove that our method obtains 100% precision even with a modest amount of training data from a possibly adversarial distribution. Empirically, we demonstrate the effectiveness of our approach on the standard GeoQuery dataset.

SEFeb 18, 2016
An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems

Fan Long, Martin Rinard

We present the first systematic analysis of the characteristics of patch search spaces for automatic patch generation systems. We analyze the search spaces of two current state-of-the-art systems, SPR and Prophet, with 16 different search space configurations. Our results are derived from an analysis of 1104 different search spaces and 768 patch generation executions. Together these experiments consumed over 9000 hours of CPU time on Amazon EC2. The analysis shows that 1) correct patches are sparse in the search spaces (typically at most one correct patch per search space per defect), 2) incorrect patches that nevertheless pass all of the test cases in the validation test suite are typically orders of magnitude more abundant, and 3) leveraging information other than the test suite is therefore critical for enabling the system to successfully isolate correct patches. We also characterize a key tradeoff in the structure of the search spaces. Larger and richer search spaces that contain correct patches for more defects can actually cause systems to find fewer, not more, correct patches. We identify two reasons for this phenomenon: 1) increased validation times because of the presence of more candidate patches and 2) more incorrect patches that pass the test suite and block the discovery of correct patches. These fundamental properties, which are all characterized for the first time in this paper, help explain why past systems often fail to generate correct patches and help identify challenges, opportunities, and productive future directions for the field.

SEFeb 2, 2012
Cryptographic Path Hardening: Hiding Vulnerabilities in Software through Cryptography

Vijay Ganesh, Michael Carbin, Martin Rinard

We propose a novel approach to improving software security called Cryptographic Path Hardening, which is aimed at hiding security vulnerabilities in software from attackers through the use of provably secure and obfuscated cryptographic devices to harden paths in programs. By "harden" we mean that certain error-checking if-conditionals in a given program P are replaced by equivalent" we mean that adversaries cannot use semi-automatic program analysis techniques to reason about the hardened program paths and thus cannot discover as-yet-unknown errors along those paths, except perhaps through black-box dictionary attacks or random testing (which we can never prevent). Other than these unpreventable attack methods, we can make program analysis aimed at error-finding "provably hard" for a resource-bounded attacker, in the same sense that cryptographic schemes are hard to break. Unlike security-through-obscurity, in Cryptographic Path Hardening we use provably-secure crypto devices to hide errors and our mathematical arguments of security are the same as the standard ones used in cryptography. One application of Cryptographic Path Hardening is that software patches or filters often reveal enough information to an attacker that they can be used to construct error-revealing inputs to exploit an unpatched version of the program. By "hardening" the patch we make it difficult for the attacker to analyze the patched program to construct error-revealing inputs, and thus prevent him from potentially constructing exploits.