SEAug 16, 2022
Machine Learning-Based Test Smell DetectionValeria Pontillo, Dario Amoroso d'Aragona, Fabiano Pecorelli et al.
Context: Test smells are symptoms of sub-optimal design choices adopted when developing test cases. Previous studies have proved their harmfulness for test code maintainability and effectiveness. Therefore, researchers have been proposing automated, heuristic-based techniques to detect them. However, the performance of such detectors is still limited and dependent on thresholds to be tuned. Objective: We propose the design and experimentation of a novel test smell detection approach based on machine learning to detect four test smells. Method: We plan to develop the largest dataset of manually-validated test smells. This dataset will be leveraged to train six machine learners and assess their capabilities in within- and cross-project scenarios. Finally, we plan to compare our approach with state-of-the-art heuristic-based techniques.
41.1SEMar 24Code
MuSe: a Mutation Testing Plugin for the Remix IDEGerardo Iuliano, Daniele Carangelo, Carmine Calabrese et al.
Mutation testing is a technique to assess the effectiveness of test suites by introducing artificial faults into programs. Although mutation testing plugins are available for many platforms and languages, none is currently available for Remix-IDE, the most widely used Integrated Development Environment for the entire contract development journey, used by users of all knowledge levels, and serves as a learning lab for teaching and experimenting with Ethereum. The quality and security of smart contracts are crucial in blockchain systems, as even minor issues can result in substantial financial losses. This paper proposes MuSe, a mutation testing plugin for the Remix-IDE. MuSe includes traditional, Solidity-specific, and security-oriented mutation operators. Its integration into the Remix-IDE eliminates the need for additional setup and lowers the entry barrier. As a result, developers and researchers can immediately leverage mutation testing to assess the effectiveness of their test suites and identify potential issues in smart contracts. We provide a demo video showing MuSe: https://www.youtube.com/watch?v=MIFk9exTDu0 and its repository: https://github.com/GerardoIuliano/MuSe-Remix-Plugin.
SEJan 14
Smart Contract Vulnerabilities, Tools, and Benchmarks: an Updated Systematic Literature ReviewGerardo Iuliano, Dario Di Nucci
Smart contracts are self-executing programs on blockchain platforms like Ethereum, which have revolutionized decentralized finance by enabling trustless transactions and the operation of decentralized applications. Despite their potential, the security of smart contracts remains a critical concern due to their immutability and transparency, which expose them to malicious actors. Numerous solutions for vulnerability detection have been proposed, but it is still unclear which one is the most effective. This paper presents a systematic literature review that explores vulnerabilities in Ethereum smart contracts, focusing on automated detection tools and benchmark evaluation. We reviewed 3,380 studies from five digital libraries and five major software engineering conferences, applying a structured selection process that resulted in 222 high-quality studies. The key results include a hierarchical taxonomy of 192 vulnerabilities grouped into 13 categories, a comprehensive list of 219 detection tools with corresponding functionalities, methods, and code transformation techniques, a mapping between our taxonomy and the list of tools, and a collection of 133 benchmarks used for tool evaluation. We conclude with a discussion about the insights into the current state of Ethereum smart contract security and directions for future research.
SENov 23, 2021Code
RepoMiner: a Language-agnostic Python Framework to Mine Software Repositories for Defect PredictionStefano Dalla Palma, Dario Di Nucci, Damian Tamburri
Data originating from open-source software projects provide valuable information to enhance software quality. In the scope of Software Defect Prediction, one of the most challenging parts is extracting valid data about failure-prone software components from these repositories, which can help develop more robust software. In particular, collecting data, calculating metrics, and synthesizing results from these repositories is a tedious and error-prone task, which often requires understanding the programming languages involved in the mined repositories, eventually leading to a proliferation of language-specific data-mining software. This paper presents RepoMiner, a language-agnostic tool developed to support software engineering researchers in creating datasets to support any study on defect prediction. RepoMiner automatically collects failure data from software components, labels them as failure-prone or neutral, and calculates metrics to be used as ground truth for defect prediction models. We present its implementation and provide examples of its application.
SEMay 7, 2021Code
Detecting Security Fixes in Open-Source Repositories using Static Code AnalyzersTherese Fehrer, Rocío Cabrera Lozoya, Antonino Sabetta et al.
The sources of reliable, code-level information about vulnerabilities that affect open-source software (OSS) are scarce, which hinders a broad adoption of advanced tools that provide code-level detection and assessment of vulnerable OSS dependencies. In this paper, we study the extent to which the output of off-the-shelf static code analyzers can be used as a source of features to represent commits in Machine Learning (ML) applications. In particular, we investigate how such features can be used to construct embeddings and train ML models to automatically identify source code commits that contain vulnerability fixes. We analyze such embeddings for security-relevant and non-security-relevant commits, and we show that, although in isolation they are not different in a statistically significant manner, it is possible to use them to construct a ML pipeline that achieves results comparable with the state of the art. We also found that the combination of our method with commit2vec represents a tangible improvement over the state of the art in the automatic identification of commits that fix vulnerabilities: the ML models we construct and commit2vec are complementary, the former being more generally applicable, albeit not as accurate.
SESep 22, 2020Code
DeepIaC: Deep Learning-Based Linguistic Anti-pattern Detection in IaCNemania Borovits, Indika Kumara, Parvathy Krishnan et al.
Linguistic anti-patterns are recurring poor practices concerning inconsistencies among the naming, documentation, and implementation of an entity. They impede readability, understandability, and maintainability of source code. This paper attempts to detect linguistic anti-patterns in infrastructure as code (IaC) scripts used to provision and manage computing environments. In particular, we consider inconsistencies between the logic/body of IaC code units and their names. To this end, we propose a novel automated approach that employs word embeddings and deep learning techniques. We build and use the abstract syntax tree of IaC code units to create their code embedments. Our experiments with a dataset systematically extracted from open source repositories show that our approach yields an accuracy between0.785and0.915in detecting inconsistencies
SEMar 24, 2021
Automated Mapping of Vulnerability Advisories onto their Fix Commits in Open Source RepositoriesDaan Hommersom, Antonino Sabetta, Bonaventura Coppola et al.
The lack of comprehensive sources of accurate vulnerability data represents a critical obstacle to studying and understanding software vulnerabilities (and their corrections). In this paper, we present an approach that combines heuristics stemming from practical experience and machine-learning (ML) - specifically, natural language processing (NLP) - to address this problem. Our method consists of three phases. First, an advisory record containing key information about a vulnerability is extracted from an advisory (expressed in natural language). Second, using heuristics, a subset of candidate fix commits is obtained from the source code repository of the affected project by filtering out commits that are known to be irrelevant for the task at hand. Finally, for each such candidate commit, our method builds a numerical feature vector reflecting the characteristics of the commit that are relevant to predicting its match with the advisory at hand. The feature vectors are then exploited for building a final ranked list of candidate fixing commits. The score attributed by the ML model to each feature is kept visible to the users, allowing them to interpret the predictions. We evaluated our approach using a prototype implementation named FixFinder on a manually curated data set that comprises 2,391 known fix commits corresponding to 1,248 public vulnerability advisories. When considering the top-10 commits in the ranked results, our implementation could successfully identify at least one fix commit for up to 84.03% of the vulnerabilities (with a fix commit on the first position for 65.06% of the vulnerabilities). In conclusion, our method reduces considerably the effort needed to search OSS repositories for the commits that fix known vulnerabilities.
SEFeb 17, 2021
Automated Test-Case Generation for Solidity Smart Contracts: the AGSolT Approach and its EvaluationStefan Driessen, Dario Di Nucci, Geert Monsieur et al.
Blockchain and smart contract technology are novel approaches to data and code management that facilitate trusted computing by allowing for development in a distributed and decentralized manner. Testing smart contracts comes with its own set of challenges which have not yet been fully identified and explored. Although existing tools can identify and discover known vulnerabilities and their interactions on the Ethereum blockchain through random search or symbolic execution, these tools generally do not produce test suites suitable for human oracles. In this paper, we present AGSOLT (Automated Generator of Solidity Test Suites). We demonstrate its efficiency by implementing two search algorithms to automatically generate test suites for stand-alone Solidity smart contracts, taking into account some of the blockchain-specific challenges. To test AGSOLT, we compared a random search algorithm and a genetic algorithm on a set of 36 real-world smart contracts. We found that AGSOLT is capable of achieving high branch coverage with both approaches and even discovered some errors in some of the most popular Solidity smart contracts on Github.
SEOct 30, 2020
Prioritising Server Side Reachability via Inter-process Concolic TestingMaarten Vandercammen, Laurent Christophe, Dario Di Nucci et al.
Context: Most approaches to automated white-box testing consider the client side and the server side of a web application in isolation from each other. Such testers lack a whole-program perspective on the web application under test. Inquiry: We hypothesise that an additional whole-program perspective would enable the tester to discover which server side errors can be triggered by an actual end user accessing the application through the client, and which ones can only be triggered in hypothetical scenarios. Approach: In this paper, we explore the idea of employing such a whole-program perspective in testing. To this end, we develop , a novel concolic tester which operates on full-stack JavaScript web applications, where both the client and the server side are JavaScript processes communicating via asynchronous messages -- as enabled by the WebSocket or Socket.IO-libraries. Knowledge: We find that the whole-program perspective enables discerning high-priority errors, which are reachable from a particular client, from low-priority errors, which are not accessible through the tested client. Another benefit of the perspective is that it allows the automated tester to construct practical, step-by-step scenarios for triggering server side errors from the end user's perspective. Grounding: We apply on a collection of web applications to evaluate how effective testing is in distinguishing between high- and low-priority errors. The results show that correctly classifies the majority of server errors. Importance: This paper demonstrates the feasibility of testing as a novel approach for automatically testing web applications. Classifying errors as being of high or low importance aids developers in prioritising bugs that might be encountered by users, and postponing the diagnosis of bugs that are less easily reached.
SEMay 27, 2020
Towards a Catalogue of Software Quality Metrics for Infrastructure CodeStefano Dalla Palma, Dario Di Nucci, Fabio Palomba et al.
Infrastructure-as-code (IaC) is a practice to implement continuous deployment by allowing management and provisioning of infrastructure through the definition of machine-readable files and automation around them, rather than physical hardware configuration or interactive configuration tools. On the one hand, although IaC represents an ever-increasing widely adopted practice nowadays, still little is known concerning how to best maintain, speedily evolve, and continuously improve the code behind the IaC practice in a measurable fashion. On the other hand, source code measurements are often computed and analyzed to evaluate the different quality aspects of the software developed. However, unlike general-purpose programming languages (GPLs), IaC scripts use domain-specific languages, and metrics used for GPLs may not be applicable for IaC scripts. This article proposes a catalogue consisting of 46 metrics to identify IaC properties focusing on Ansible, one of the most popular IaC language to date, and shows how they can be used to analyze IaC scripts.