SEApr 30
The Ultimate Configuration Management Tool? Lessons from a Mixed Methods Study of Ansible's ChallengesCarolina Carreira, Nuno Saavedra, Alexandra Mendes et al.
Infrastructure as Code (IaC) tools have transformed the way IT infrastructure is automated and managed, but their growing adoption has also exposed numerous challenges for practitioners. In this paper, we investigate these challenges through the lens of Ansible, a popular IaC tool. Using a mixed methods approach, we investigate challenges faced by practitioners. We analyze 59,157 posts from Stack Overflow, Reddit, and the Ansible Forum to identify common pain points, complemented by 20 semi-structured interviews with practitioners of varying expertise levels. Based on our findings, we highlight key directions for improving Ansible, with implications for other IaC technologies, including stronger failure locality to support debugging, clearer separation of language and templating boundaries, targeted documentation, and improved execution backends to address performance issues. By grounding these insights in the real-world struggles of Ansible users, this study provides actionable guidance for tool designers and for the broader IaC community, and contributes to a deeper understanding of the trade-offs inherent in IaC tools.
SEDec 18, 2024Code
Rango: Adaptive Retrieval-Augmented Proving for Automated Software VerificationKyle Thompson, Nuno Saavedra, Pedro Carrott et al.
Formal verification using proof assistants, such as Coq, enables the creation of high-quality software. However, the verification process requires significant expertise and manual effort to write proofs. Recent work has explored automating proof synthesis using machine learning and large language models (LLMs). This work has shown that identifying relevant premises, such as lemmas and definitions, can aid synthesis. We present Rango, a fully automated proof synthesis tool for Coq that automatically identifies relevant premises and also similar proofs from the current project and uses them during synthesis. Rango uses retrieval augmentation at every step of the proof to automatically determine which proofs and premises to include in the context of its fine-tuned LLM. In this way, Rango adapts to the project and to the evolving state of the proof. We create a new dataset, CoqStoq, of 2,226 open-source Coq projects and 196,929 theorems from GitHub, which includes both training data and a curated evaluation benchmark of well-maintained projects. On this benchmark, Rango synthesizes proofs for 32.0% of the theorems, which is 29% more theorems than the prior state-of-the-art tool Tactician. Our evaluation also shows that Rango adding relevant proofs to its context leads to a 47% increase in the number of theorems proven.
SEOct 23, 2019Code
Empirical Review of Automated Analysis Tools on 47,587 Ethereum Smart ContractsThomas Durieux, João F. Ferreira, Rui Abreu et al.
Over the last few years, there has been substantial research on automated analysis, testing, and debugging of Ethereum smart contracts. However, it is not trivial to compare and reproduce that research. To address this, we present an empirical evaluation of 9 state-of-the-art automated analysis tools using two new datasets: i) a dataset of 69 annotated vulnerable smart contracts that can be used to evaluate the precision of analysis tools; and ii) a dataset with all the smart contracts in the Ethereum Blockchain that have Solidity source code available on Etherscan (a total of 47,518 contracts). The datasets are part of SmartBugs, a new extendable execution framework that we created to facilitate the integration and comparison between multiple analysis tools and the analysis of Ethereum smart contracts. We used SmartBugs to execute the 9 automated analysis tools on the two datasets. In total, we ran 428,337 analyses that took approximately 564 days and 3 hours, being the largest experimental setup to date both in the number of tools and in execution time. We found that only 42% of the vulnerabilities from our annotated dataset are detected by all the tools, with the tool Mythril having the higher accuracy (27%). When considering the largest dataset, we observed that 97% of contracts are tagged as vulnerable, thus suggesting a considerable number of false positives. Indeed, only a small number of vulnerabilities (and of only two categories) were detected simultaneously by four or more tools.
HCNov 16, 2021
Exploring Usable Security to Improve the Impact of Formal Verification: A Research AgendaCarolina Carreira, João F. Ferreira, Alexandra Mendes et al.
As software becomes more complex and assumes an even greater role in our lives, formal verification is set to become the gold standard in securing software systems into the future, since it can guarantee the absence of errors and entire classes of attack. Recent advances in formal verification are being used to secure everything from unmanned drones to the internet. At the same time, the usable security research community has made huge progress in improving the usability of security products and end-users comprehension of security issues. However, there have been no human-centered studies focused on the impact of formal verification on the use and adoption of formally verified software products. We propose a research agenda to fill this gap and to contribute with the first collection of studies on people's mental models on formal verification and associated security and privacy guarantees and threats. The proposed research has the potential to increase the adoption of more secure products and it can be directly used by the security and formal methods communities to create more effective and secure software tools.
CRJun 7, 2021
Towards Formal Verification of Password Generation Algorithms used in Password ManagersMiguel Grilo, João F. Ferreira, José Bacelar Almeida
Password managers are important tools that enable us to use stronger passwords, freeing us from the cognitive burden of remembering them. Despite this, there are still many users who do not fully trust password managers. In this paper, we focus on a feature that most password managers offer that might impact the user's trust, which is the process of generating a random password. We survey which algorithms are most commonly used and we propose a solution for a formally verified reference implementation of a password generation algorithm. We use EasyCrypt as our framework to both specify the reference implementation and to prove its functional correctness and security.
SEJul 8, 2020
SmartBugs: A Framework to Analyze Solidity Smart ContractsJoão F. Ferreira, Pedro Cruz, Thomas Durieux et al.
Over the last few years, there has been substantial research on automated analysis, testing, and debugging of Ethereum smart contracts. However, it is not trivial to compare and reproduce that research. To address this, we present SmartBugs, an extensible and easy-to-use execution framework that simplifies the execution of analysis tools on smart contracts written in Solidity, the primary language used in Ethereum. SmartBugs is currently distributed with support for 10 tools and two datasets of Solidity contracts. The first dataset can be used to evaluate the precision of analysis tools, as it contains 143 annotated vulnerable contracts with 208 tagged vulnerabilities. The second dataset contains 47,518 unique contracts collected through Etherscan. We discuss how SmartBugs supported the largest experimental setup to date both in the number of tools and in execution time. Moreover, we show how it enables easy integration and comparison of analysis tools by presenting a new extension to the tool SmartCheck that improves substantially the detection of vulnerabilities related to the DASP10 categories Bad Randomness, Time Manipulation, and Access Control (identified vulnerabilities increased from 11% to 24%).
CRJul 7, 2020
Skeptic: Automatic, Justified and Privacy-Preserving Password Composition Policy SelectionSaul Johnson, João F. Ferreira, Alexandra Mendes et al.
The choice of password composition policy to enforce on a password-protected system represents a critical security decision, and has been shown to significantly affect the vulnerability of user-chosen passwords to guessing attacks. In practice, however, this choice is not usually rigorous or justifiable, with a tendency for system administrators to choose password composition policies based on intuition alone. In this work, we propose a novel methodology that draws on password probability distributions constructed from large sets of real-world password data which have been filtered according to various password composition policies. Password probabilities are then redistributed to simulate different user password reselection behaviours in order to automatically determine the password composition policy that will induce the distribution of user-chosen passwords with the greatest uniformity, a metric which we show to be a useful proxy to measure overall resistance to password guessing attacks. Further, we show that by fitting power-law equations to the password probability distributions we generate, we can justify our choice of password composition policy without any direct access to user password data. Finally, we present Skeptic -- a software toolkit that implements this methodology, including a DSL to enable system administrators with no background in password security to compare and rank password composition policies without resorting to expensive and time-consuming user studies. Drawing on 205,176,321 pass words across 3 datasets, we lend validity to our approach by demonstrating that the results we obtain align closely with findings from a previous empirical study into password composition policy effectiveness.