44.0SEApr 12
Improving Dynamic Specification Inference with LLM-Generated CounterexamplesAgustín Balestra, Agustín Nolasco, Facundo Molina et al.
Contract assertions, such as preconditions, postconditions, and invariants, play a crucial role in software development, enabling applications such as program verification, test generation, and debugging. Despite their benefits, the adoption of contract assertions is limited, due to the difficulty of manually producing such assertions. Dynamic analysis-based approaches, such as Daikon, can aid in this task by inferring expressive assertions from execution traces. However, a fundamental weakness of these methods is their reliance on the thoroughness of the test suites used for dynamic analysis. When these test suites do not contain sufficiently diverse tests, the inferred assertions are often not generalizable, leading to a high rate of invalid candidates (false positives) that must be manually filtered out. In this paper, we explore the use of large language models (LLMs) to automatically generate tests that attempt to invalidate generated assertions. Our results show that state-of-the-art LLMs can generate effective counterexamples that help to discard up to 11.68\% of invalid assertions inferred by SpecFuzzer. Moreover, when incorporating these LLM-generated counterexamples into the dynamic analysis process, we observe an improvement of up to 7\% in precision of the inferred specifications, with respect to the ground-truths gathered from the evaluation benchmarks, without affecting recall.
SEFeb 26, 2021Code
EvoSpex: An Evolutionary Algorithm for Learning PostconditionsFacundo Molina, Pablo Ponzio, Nazareno Aguirre et al.
Software reliability is a primary concern in the construction of software, and thus a fundamental component in the definition of software quality. Analyzing software reliability requires a specification of the intended behavior of the software under analysis, and at the source code level, such specifications typically take the form of assertions. Unfortunately, software many times lacks such specifications, or only provides them for scenario-specific behaviors, as assertions accompanying tests. This issue seriously diminishes the analyzability of software with respect to its reliability. In this paper, we tackle this problem by proposing a technique that, given a Java method, automatically produces a specification of the method's current behavior, in the form of postcondition assertions. This mechanism is based on generating executions of the method under analysis to obtain valid pre/post state pairs, mutating these pairs to obtain (allegedly) invalid ones, and then using a genetic algorithm to produce an assertion that is satisfied by the valid pre/post pairs, while leaving out the invalid ones. The technique, which targets in particular methods of reference-based class implementations, is assessed on a benchmark of open source Java projects, showing that our genetic algorithm is able to generate post-conditions that are stronger and more accurate, than those generated by related automated approaches, as evaluated by an automated oracle assessment tool. Moreover, our technique is also able to infer an important part of manually written rich postconditions in verified classes, and reproduce contracts for methods whose class implementations were automatically synthesized from specifications.
SEJan 26, 2022
Fuzzing Class SpecificationsFacundo Molina, Marcelo d'Amorim, Nazareno Aguirre
Expressing class specifications via executable constraints is important for various software engineering tasks such as test generation, bug finding and automated debugging, but developers rarely write them. Techniques that infer specifications from code exist to fill this gap, but they are designed to support specific kinds of assertions and are difficult to adapt to support different assertion languages, e.g., to add support for quantification, or additional comparison operators, such as membership or containment. To address the above issue, we present SpecFuzzer, a novel technique that combines grammar-based fuzzing, dynamic invariant detection, and mutation analysis, to automatically produce class specifications. SpecFuzzer uses: (i) a fuzzer as a generator of candidate assertions derived from a grammar that is automatically obtained from the class definition; (ii) a dynamic invariant detector -- Daikon -- to filter out assertions invalidated by a test suite; and (iii) a mutation-based mechanism to cluster and rank assertions, so that similar constraints are grouped and then the stronger prioritized. Grammar-based fuzzing enables SpecFuzzer to be straightforwardly adapted to support different specification languages, by manipulating the fuzzing grammar, e.g., to include additional operators. We evaluate our technique on a benchmark of 43 Java methods employed in the evaluation of the state-of-the-art techniques GAssert and EvoSpex. Our results show that SpecFuzzer can easily support a more expressive assertion language, over which is more effective than GAssert and EvoSpex in inferring specifications, according to standard performance metrics.
SEMay 26, 2021
Automated Repair of Unrealisable LTL Specifications Guided by Model CountingMatías Brizzio, Maxime Cordy, Mike Papadakis et al.
The reactive synthesis problem consists of automatically producing correct-by-construction operational models of systems from high-level formal specifications of their behaviours. However, specifications are often unrealisable, meaning that no system can be synthesised from the specification. To deal with this problem, we present AuRUS, a search-based approach to repair unrealisable Linear-Time Temporal Logic (LTL) specifications. AuRUS aims at generating solutions that are similar to the original specifications by using the notions of syntactic and semantic similarities. Intuitively, the syntactic similarity measures the text similarity between the specifications, while the semantic similarity measures the number of behaviours preserved/removed by the candidate repair. We propose a new heuristic based on model counting to approximate semantic similarity. We empirically assess AuRUS on many unrealisable specifications taken from different benchmarks and show that it can successfully repair all of them. Also, compared to related techniques, AuRUS can produce many unique solutions while showing more scalability.
SEFeb 27, 2021
Bounded Exhaustive Search of Alloy Specification RepairsSimón Gutiérrez Brida, Germán Regis, Guolong Zhengz et al.
The rising popularity of declarative languages and the hard to debug nature thereof have motivated the need for applicable, automated repair techniques for such languages. However, despite significant advances in the program repair of imperative languages, there is a dearth of repair techniques for declarative languages. This paper presents BeAFix, an automated repair technique for faulty models written in Alloy, a declarative language based on first-order relational logic. BeAFix is backed with a novel strategy for bounded exhaustive, yet scalable, exploration of the spaces of fix candidates and a formally rigorous, sound pruning of such spaces. Moreover, different from the state-of-the-art in Alloy automated repair, that relies on the availability of unit tests, BeAFix does not require tests and can work with assertions that are naturally used in formal declarative languages. Our experience with using BeAFix to repair thousands of realworld faulty models, collected by other researchers, corroborates its ability to effectively generate correct repairs and outperform the state-of-the-art.
SEFeb 19, 2021
FLACK: Counterexample-Guided Fault Localization for Alloy ModelsGuolong Zheng, ThanhVu Nguyen, Simón Gutiérrez Brida et al.
Fault localization is a practical research topic that helps developers identify code locations that might cause bugs in a program. Most existing fault localization techniques are designed for imperative programs (e.g., C and Java) and rely on analyzing correct and incorrect executions of the program to identify suspicious statements. In this work, we introduce a fault localization approach for models written in a declarative language, where the models are not "executed," but rather converted into a logical formula and solved using backend constraint solvers. We present FLACK, a tool that takes as input an Alloy model consisting of some violated assertion and returns a ranked list of suspicious expressions contributing to the assertion violation. The key idea is to analyze the differences between counterexamples, i.e., instances of the model that do not satisfy the assertion, and instances that do satisfy the assertion to find suspicious expressions in the input model. The experimental results show that FLACK is efficient (can handle complex, real-world Alloy models with thousand lines of code within 5 seconds), accurate (can consistently rank buggy expressions in the top 1.9\% of the suspicious list), and useful (can often narrow down the error to the exact location within the suspicious expressions).
SEOct 30, 2019
Stryker: Scaling Specification-Based Program Repair by Pruning Infeasible Mutants with SATLuciano Zemín, Simón Gutiérrez Brida, Santiago Bermúdez et al.
Many techniques for automated program repair involve syntactic program transformations. Applying combinations of such transformations on faulty code yields fix candidates whose correctness must be determined. Exploring these combinations leads to an explosion on the number of generated fix candidates that severely limits the applicability of such fault repair techniques. This explosion is most times tamed by not considering fix candidates exhaustively, and by disabling intra-statement modifications. In this article we present a technique for program repair that considers an ample set of intra-statement syntactic operations, and explores fix candidates exhaustively up to a provided bound. The suitability of the technique, implemented in our tool Stryker, is supported by a novel mechanism to detect and prune infeasible fix candidates. This allows Stryker to repair programs with several bugs, whose fixes require multiple modifications. We evaluate our technique on a benchmark of faulty Java container classes, which Stryker is able to repair, pruning significant parts of the space of generated candidates when more than one bug is present in the code.
SEJan 2, 2014
Proceedings First Latin American Workshop on Formal MethodsNazareno Aguirre, Leila Ribeiro
Formal approaches to software development are techniques that aim at developing quality software by employing notations, analysis processes, etc., based on mathematical grounds. Although traditionally they aim at increasing software correctness, formal techniques have been applied to various other aspects of software quality. Moreover, while originally formal methods employed complex "heavyweight" mechanisms for analysis (often manual or semi automated), there has been progress towards embracing "lightweight", many times fully automated, analysis techniques, that broaden the adoption of formal methods in various software engineering contexts. The Latin American Workshop on Formal Methods brings together researchers working in formal methods, and related areas such as automated analysis. In particular, the workshop provides a venue for Latin American researchers working in these areas, to promote their interaction and collaboration. The workshop was held in August as a satellite event of CONCUR 2013. It took place in Buenos Aires, Argentina's capital and largest city, and one of the most interesting cultural places in South America.