SE · Oct 23, 2019

Empirical Review of Automated Analysis Tools on 47,587 Ethereum Smart Contracts

arXiv:1910.10601v2 · 475 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This study addresses the difficulty of comparing and reproducing research on smart contract analysis, a problem for both developers and researchers. The contribution is incremental: it evaluates existing tools on new datasets.

The authors empirically evaluated 9 automated analysis tools on 47,587 Ethereum smart contracts drawn from two datasets: 69 annotated vulnerable contracts and 47,518 contracts with Solidity source code available on Etherscan. Only 42% of the vulnerabilities in the annotated dataset were detected by all the tools, and the most accurate tool, Mythril, reached 27% accuracy. On the larger dataset, 97% of contracts were flagged as vulnerable, which suggests a considerable number of false positives.

Over the last few years, there has been substantial research on automated analysis, testing, and debugging of Ethereum smart contracts. However, it is not trivial to compare and reproduce that research. To address this, we present an empirical evaluation of 9 state-of-the-art automated analysis tools using two new datasets: i) a dataset of 69 annotated vulnerable smart contracts that can be used to evaluate the precision of analysis tools; and ii) a dataset with all the smart contracts in the Ethereum Blockchain that have Solidity source code available on Etherscan (a total of 47,518 contracts). The datasets are part of SmartBugs, a new extendable execution framework that we created to facilitate the integration and comparison between multiple analysis tools and the analysis of Ethereum smart contracts. We used SmartBugs to execute the 9 automated analysis tools on the two datasets. In total, we ran 428,337 analyses that took approximately 564 days and 3 hours, being the largest experimental setup to date both in the number of tools and in execution time. We found that only 42% of the vulnerabilities from our annotated dataset are detected by all the tools, with the tool Mythril having the highest accuracy (27%). When considering the largest dataset, we observed that 97% of contracts are tagged as vulnerable, thus suggesting a considerable number of false positives. Indeed, only a small number of vulnerabilities (and of only two categories) were detected simultaneously by four or more tools.
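
The two aggregate measures mentioned in the abstract (the share of annotated vulnerabilities that are detected, and the findings reported simultaneously by four or more tools) can be illustrated with a minimal sketch. This is not SmartBugs code or its API. The function names, the tool names, the `(contract, category)` finding format, and the sample data are all hypothetical. The sketch also assumes that "detected by all the tools" means detected by at least one tool in the combined set, which is one possible reading of the abstract.

```python
from collections import Counter

def consensus_findings(findings_by_tool, min_tools=4):
    """Return findings reported by at least `min_tools` distinct tools.

    `findings_by_tool` maps a tool name to an iterable of
    (contract_id, category) pairs (hypothetical format).
    """
    counts = Counter()
    for findings in findings_by_tool.values():
        counts.update(set(findings))  # set(): each tool votes once per finding
    return {finding: n for finding, n in counts.items() if n >= min_tools}

def detection_rate(annotated, findings_by_tool):
    """Fraction of annotated vulnerabilities reported by at least one tool.

    Assumption: a vulnerability counts as detected if any tool reports it.
    """
    annotated = set(annotated)
    if not annotated:
        return 0.0
    reported = set().union(*findings_by_tool.values())
    return len(reported & annotated) / len(annotated)

# Toy data with made-up tool and contract names.
findings = {
    "tool_a": {("c1", "reentrancy"), ("c2", "arithmetic")},
    "tool_b": {("c1", "reentrancy")},
    "tool_c": {("c1", "reentrancy"), ("c3", "access_control")},
    "tool_d": {("c1", "reentrancy"), ("c2", "arithmetic")},
}
annotated = [
    ("c1", "reentrancy"),
    ("c2", "arithmetic"),
    ("c4", "time_manipulation"),
    ("c5", "unchecked_low_level_calls"),
]

print(consensus_findings(findings, min_tools=4))
print(detection_rate(annotated, findings))
```

In the toy data only one finding reaches the four-tool threshold, and half of the annotated vulnerabilities are reported by some tool. Keying findings on `(contract, category)` pairs is a simplification; a finer-grained comparison could also key on source line.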
