Alex Groce

SE
10 papers
1,368 citations
Novelty 31%
AI Score 25

10 Papers

SE · Nov 18, 2019 · Code
What are the Actual Flaws in Important Smart Contracts (and How Can We Find Them)?

Alex Groce, Josselin Feist, Gustavo Grieco et al.

An important problem in smart contract security is understanding the likelihood and criticality of discovered, or potential, weaknesses in contracts. In this paper we provide a summary of Ethereum smart contract audits performed for 23 professional stakeholders, avoiding the common problem of reporting issues mostly prevalent in low-quality contracts. These audits were performed at a leading company in blockchain security, using both open-source and proprietary tools, as well as human code analysis performed by professional security engineers. We categorize 246 individual defects, making it possible to compare the severity and frequency of different vulnerability types, compare smart contract and non-smart contract flaws, and to estimate the efficacy of automated vulnerability detection approaches.

SE · Jul 8, 2019 · Code
Manticore: A User-Friendly Symbolic Execution Framework for Binaries and Smart Contracts

Mark Mossberg, Felipe Manzano, Eric Hennenfent et al.

An effective way to maximize code coverage in software tests is through dynamic symbolic execution, a technique that uses constraint solving to systematically explore a program's state space. We introduce an open-source dynamic symbolic execution framework called Manticore for analyzing binaries and Ethereum smart contracts. Manticore's flexible architecture allows it to support both traditional and exotic execution environments, and its API allows users to customize their analysis. Here, we discuss Manticore's architecture and demonstrate the capabilities we have used to find bugs and verify the correctness of code for our commercial clients.

SE · Jan 27, 2022
Mutation Analysis: Answering the Fuzzing Challenge

Rahul Gopinath, Philipp Görz, Alex Groce

Fuzzing is one of the fastest-growing fields in software testing. The idea behind fuzzing is to check the behavior of software against a large number of randomly generated inputs, trying to cover all interesting parts of the input space, while observing the tested software for anomalous behavior. One of the biggest challenges facing fuzzer users is how to validate software behavior, and how to improve the quality of oracles used. While mutation analysis is the premier technique for evaluating the quality of software test oracles, mutation score is rarely used as a metric for evaluating fuzzer quality. Unless mutation analysis researchers can solve multiple problems that make applying mutation analysis to fuzzing challenging, mutation analysis may be permanently sidelined in one of the most important areas of testing and security research. This paper attempts to understand the main challenges in applying mutation analysis for evaluating fuzzers, so that researchers can focus on solving these challenges.
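
To make the notion of mutation score concrete, the following is a minimal sketch and is not taken from the paper. It assumes a toy function under test (`max_of`), four hand-written mutants, and two hypothetical oracles: a crash-only oracle of the kind fuzzers often rely on, and a stronger oracle that compares against a reference result.

```python
def max_of(a, b):
    """Reference implementation (not itself a mutant)."""
    return a if a > b else b


# Hand-written mutants of max_of; a real tool would generate these automatically.
MUTANTS = [
    lambda a, b: a if a < b else b,   # relational operator replaced
    lambda a, b: a if a >= b else b,  # equivalent mutant: same result for all inputs
    lambda a, b: b,                   # branch removed, always returns b
    lambda a, b: a,                   # branch removed, always returns a
]

# Stand-in for inputs a fuzzer might generate (assumed for illustration).
INPUTS = [(1, 2), (3, 2), (5, 5)]


def crash_oracle(func, a, b):
    """Weak oracle: passes unless the call raises."""
    try:
        func(a, b)
    except Exception:
        return False
    return True


def reference_oracle(func, a, b):
    """Stronger oracle: the output must match the reference implementation."""
    return func(a, b) == max_of(a, b)


def mutation_score(mutants, inputs, oracle):
    """Fraction of mutants for which at least one input fails the oracle."""
    killed = sum(
        1 for mutant in mutants
        if any(not oracle(mutant, a, b) for a, b in inputs)
    )
    return killed / len(mutants)
```

On these inputs the crash-only oracle kills no mutants (score 0.0), while the reference oracle kills three of four (score 0.75); the survivor is the equivalent mutant, which no oracle can kill.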

SE · Mar 11, 2021
Using Relative Lines of Code to Guide Automated Test Generation for Python

Josie Holmes, Iftekhar Ahmed, Caius Brindescu et al.

Raw lines of code (LOC) is a metric that does not, at first glance, seem extremely useful for automated test generation. It is both highly language-dependent and not extremely meaningful, semantically, within a language: one coder can produce the same effect with many fewer lines than another. However, relative LOC, between components of the same project, turns out to be a highly useful metric for automated testing. In this paper, we make use of a heuristic based on LOC counts for tested functions to dramatically improve the effectiveness of automated test generation. This approach is particularly valuable in languages where collecting code coverage data to guide testing has a very high overhead. We apply the heuristic to property-based Python testing using the TSTL (Template Scripting Testing Language) tool. In our experiments, the simple LOC heuristic can improve branch and statement coverage by large margins (often more than 20%, up to 40% or more), and improve fault detection by an even larger margin (usually more than 75%, and up to 400% or more). The LOC heuristic is also easy to combine with other approaches, and is comparable to, and possibly more effective than, two well-established approaches for guiding random testing.
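
One way to picture a relative-LOC heuristic is as biased random selection among the functions under test. The sketch below is an assumption for illustration only: the abstract does not give the exact weighting, so selection probability is simply taken to be proportional to each function's LOC, and the function names and counts are hypothetical. It does not use the TSTL API.

```python
import random


def loc_weights(loc_counts):
    """Turn per-function LOC counts into selection probabilities."""
    total = sum(loc_counts.values())
    if total <= 0:
        raise ValueError("LOC counts must sum to a positive number")
    return {name: count / total for name, count in loc_counts.items()}


def pick_function(loc_counts, rng):
    """Pick the next function to call, biased toward larger functions."""
    names = list(loc_counts)
    weights = [loc_counts[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical LOC counts for three functions of a container library.
LOC_COUNTS = {"insert": 30, "delete": 8, "size": 2}
```

With these counts, `loc_weights(LOC_COUNTS)` assigns `insert` a probability of 0.75, so a random tester driven by `pick_function` spends most of its budget on the largest function instead of sampling all three uniformly.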

SE · Aug 26, 2019
Slither: A Static Analysis Framework For Smart Contracts

Josselin Feist, Gustavo Grieco, Alex Groce

This paper describes Slither, a static analysis framework designed to provide rich information about Ethereum smart contracts. It works by converting Solidity smart contracts into an intermediate representation called SlithIR. SlithIR uses Static Single Assignment (SSA) form and a reduced instruction set to ease implementation of analyses while preserving semantic information that would be lost in transforming Solidity to bytecode. Slither allows for the application of commonly used program analysis techniques like dataflow and taint tracking. Our framework has four main use cases: (1) automated detection of vulnerabilities, (2) automated detection of code optimization opportunities, (3) improvement of the user's understanding of the contracts, and (4) assistance with code review. In this paper, we present an overview of Slither, detail the design of its intermediate representation, and evaluate its capabilities on real-world contracts. We show that Slither's bug detection is fast, accurate, and outperforms other static analysis tools at finding issues in Ethereum smart contracts in terms of speed, robustness, and balance of detection and false positives. We compared tools using a large dataset of smart contracts and manually reviewed results for 1000 of the most used contracts.

ML · Nov 5, 2017
Provenance and Pseudo-Provenance for Seeded Learning-Based Automated Test Generation

Alex Groce, Josie Holmes

Many methods for automated software test generation, including some that explicitly use machine learning (and some that use ML more broadly conceived), derive new tests from existing tests (often referred to as seeds). Often, the seed tests from which new tests are derived are manually constructed, or at least simpler than the tests that are produced as the final outputs of such test generators. We propose annotation of generated tests with a provenance (trail) showing how individual generated tests of interest (especially failing tests) derive from seed tests, and how the population of generated tests relates to the original seed tests. In some cases, post-processing of generated tests can invalidate provenance information, in which case we also propose a method for attempting to construct "pseudo-provenance" describing how the tests could have been (partly) generated from seeds.
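
A provenance trail of this kind could be represented as a record attached to each generated test. The following is a minimal sketch under assumptions: the class name, its fields, and the operation labels are hypothetical and do not reflect the paper's actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedTest:
    """A test plus a record of how it was derived (hypothetical format)."""
    steps: tuple                # the test itself, as a sequence of actions
    operation: str = "seed"     # how this test was produced
    parents: tuple = ()         # tests it was derived from; empty for seeds


def seeds_of(test):
    """Walk the provenance trail back to the seed tests it derives from."""
    if not test.parents:
        return {test.steps}
    found = set()
    for parent in test.parents:
        found |= seeds_of(parent)
    return found


# Two seeds, a spliced child, and a further reduced descendant.
seed_a = GeneratedTest(steps=("open", "write", "close"))
seed_b = GeneratedTest(steps=("open", "read"))
spliced = GeneratedTest(
    steps=seed_a.steps + seed_b.steps[1:],
    operation="splice",
    parents=(seed_a, seed_b),
)
reduced = GeneratedTest(
    steps=spliced.steps[1:],
    operation="delete-step",
    parents=(spliced,),
)
```

Calling `seeds_of(reduced)` returns the step tuples of both `seed_a` and `seed_b`, so a failing descendant can be traced back to the seeds that contributed to it.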

LO · Oct 8, 2017
Proceedings 2nd International Workshop on Causal Reasoning for Embedded and safety-critical Systems Technologies

Alex Groce, Stefan Leue

The second international CREST workshop continued the focus of the first CREST workshop: addressing approaches to causal reasoning in engineering complex embedded and safety-critical systems. Relevant approaches to causal reasoning have been (usually independently) proposed by a variety of communities: AI, concurrency, model-based diagnosis, software engineering, security engineering, and formal methods. The goal of CREST is to bring together researchers and practitioners from these communities to exchange ideas, especially between communities, in order to advance the science of determining root cause(s) for failures of critical systems. The growing complexity of failures such as power grid blackouts, airplane crashes, security and privacy violations, and malfunctioning medical devices or automotive systems makes the goals of CREST more relevant than ever before.

SE · Nov 4, 2016
Data Poisoning: Lightweight Soft Fault Injection for Python

Mohammad Amin Alipour, Alex Groce

This paper introduces and explores the idea of data poisoning, a light-weight peer-architecture technique to inject faults into Python programs. This method requires only very small modifications to the original program, which facilitates evaluating the sensitivity of systems that are prototyped or modeled in Python. We propose different fault scenarios that can be injected into programs using data poisoning. We use Dijkstra's Self-Stabilizing Ring Algorithm to illustrate the approach.
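
The abstract does not spell out the injection mechanism, so the following is only one possible reading, sketched under assumptions: a small wrapper (the hypothetical `Poisoner` class) through which selected values are passed, and which occasionally replaces a value with a random one to simulate a soft fault.

```python
import random


class Poisoner:
    """Occasionally corrupt integer values that pass through it."""

    def __init__(self, rate, seed=None):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        self.rate = rate
        self.rng = random.Random(seed)
        self.injected = 0  # number of faults injected so far

    def __call__(self, value, low, high):
        """Return value unchanged, or a random int in [low, high] on a fault."""
        if self.rng.random() < self.rate:
            self.injected += 1
            return self.rng.randint(low, high)
        return value
```

A program under study would then change a line such as `state = next_state` to `state = poison(next_state, 0, n - 1)`, where `poison = Poisoner(rate=0.01)`; the small size of that change is the point of the approach as the abstract describes it.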

SE · Sep 20, 2016
Finding Model-Checkable Needles in Large Source Code Haystacks: Modular Bug-Finding via Static Analysis and Dynamic Invariant Discovery

Mohammad Amin Alipour, Alex Groce, Chaoqiang Zhang et al.

In this paper, we present a novel marriage of static and dynamic analysis. Given a large code base with many functions and a mature test suite, we propose using static analysis to find functions 1) with assertions or other evident correctness properties (e.g., array bounds requirements or pointer access) and 2) with simple enough control flow and data use to be amenable to predicate-abstraction based or bounded model checking without human intervention. Because most such functions in realistic software systems in fact rely on many input preconditions not specified by the language's type system (or annotated in any way), we propose using dynamically discovered invariants based on a program's test suite to characterize likely preconditions, in order to reduce the problem of false positives. While providing little in the way of verification, this approach may provide an additional quick and highly scalable bug-finding method for programs that are usually considered "too large to model check." We present a simple example showing that the technique can be useful for a more typically "model-checkable" code base, even in the presence of a poorly designed test suite and bad invariants.

SE · Sep 20, 2016
Bounded Model Checking and Feature Omission Diversity

Mohammad Amin Alipour, Alex Groce

In this paper we introduce a novel way to speed up the discovery of counterexamples in bounded model checking, based on parallel runs over versions of a system in which features have been randomly disabled. As shown in previous work, adding constraints to a bounded model checking problem can reduce the size of the verification problem and dramatically decrease the time required to find a counterexample. Adapting a technique developed in software testing to this problem provides a simple way to produce useful partial verification problems, with a resulting decrease in average time until a counterexample is produced. If no counterexample is found, partial verification results can also be useful in practice.
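
The random feature-disabling step can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the feature names, the omission probability, and the function name are hypothetical, and the model checker that would consume each configuration is not shown.

```python
import random


def omission_configs(features, runs, omit_probability=0.5, seed=None):
    """Build one set of enabled features per parallel model-checking run."""
    rng = random.Random(seed)
    configs = []
    for _ in range(runs):
        # Each feature is independently disabled with the given probability.
        enabled = frozenset(
            feature for feature in features
            if rng.random() >= omit_probability
        )
        configs.append(enabled)
    return configs


# Hypothetical operations of a file system model.
FEATURES = ["read", "write", "rename", "unlink", "mkdir"]
```

Each configuration returned by `omission_configs(FEATURES, runs=8, seed=0)` would be turned into a constrained verification problem and checked in parallel; the first run to report a counterexample ends the search.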