SEJun 17, 2022
C-Pack of IPAs: A C90 Program Benchmark of Introductory Programming AssignmentsPedro Orvalho, Mikoláš Janota, Vasco Manquinho
Due to the vast number of students enrolled in Massive Open Online Courses (MOOCs), there has been an increasing number of automated program repair techniques focused on introductory programming assignments (IPAs). Such techniques take advantage of previous correct student implementations in order to provide automated, comprehensive, and personalized feedback to students. This paper presents C-Pack-IPAs, a publicly available benchmark of students' programs submitted for 25 different IPAs. C-Pack-IPAs contains semantically correct, semantically incorrect, and syntactically incorrect programs plus a test suite for each IPA. Hence, C-Pack-IPAs can be used to help evaluate the development of novel semantic, as well as syntactic, automated program repair frameworks, focused on providing feedback to novice programmers.
LGAug 27, 2024
Understanding GNNs for Boolean Satisfiability through Approximation AlgorithmsJan Hůla, David Mojžíšek, Mikoláš Janota
The paper deals with the interpretability of Graph Neural Networks in the context of Boolean Satisfiability. The goal is to demystify the internal workings of these models and provide insightful perspectives into their decision-making processes. This is done by uncovering connections to two approximation algorithms studied in the domain of Boolean Satisfiability: Belief Propagation and Semidefinite Programming Relaxations. Revealing these connections has empowered us to introduce a suite of impactful enhancements. The first significant enhancement is a curriculum training procedure, which incrementally increases the problem complexity in the training set, together with increasing the number of message passing iterations of the Graph Neural Network. We show that the curriculum, together with several other optimizations, reduces the training time by more than an order of magnitude compared to the baseline without the curriculum. Furthermore, we apply decimation and sampling of initial embeddings, which significantly increase the percentage of solved problems.
SEJun 28, 2022
InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming AssignmentsPedro Orvalho, Mikoláš Janota, Vasco Manquinho
Due to the vast number of students enrolled in programming courses, there has been an increasing number of automated program repair techniques focused on introductory programming assignments (IPAs). Typically, such techniques use program clustering to take advantage of previous correct student implementations to repair a new incorrect submission. These repair techniques use clustering methods since analyzing all available correct submissions to repair a program is not feasible. However, conventional clustering methods rely on program representations based on features such as abstract syntax trees (ASTs), syntax, control flow, and data flow. This paper proposes InvAASTCluster, a novel approach for program clustering that uses dynamically generated program invariants to cluster semantically equivalent IPAs. InvAASTCluster's program representation uses a combination of the program's semantics, through its invariants, and its structure through its anonymized abstract syntax tree (AASTs). Invariants denote conditions that must remain true during program execution, while AASTs are ASTs devoid of variable and function names, retaining only their types. Our experiments show that the proposed program representation outperforms syntax-based representations when clustering a set of correct IPAs. Furthermore, we integrate InvAASTCluster into a state-of-the-art clustering-based program repair tool. Our results show that InvAASTCluster advances the current state-of-the-art when used by clustering-based repair tools by repairing around 13% more students' programs, in a shorter amount of time.
SEJul 12, 2024
CFaults: Model-Based Diagnosis for Fault Localization in C Programs with Multiple Test CasesPedro Orvalho, Mikoláš Janota, Vasco Manquinho
Debugging is one of the most time-consuming and expensive tasks in software development. Several formula-based fault localization (FBFL) methods have been proposed, but they fail to guarantee a set of diagnoses across all failing tests or may produce redundant diagnoses that are not subset-minimal, particularly for programs with multiple faults. This paper introduces a novel fault localization approach for C programs with multiple faults. CFaults leverages Model-Based Diagnosis (MBD) with multiple observations and aggregates all failing test cases into a unified MaxSAT formula. Consequently, our method guarantees consistency across observations and simplifies the fault localization procedure. Experimental results on two benchmark sets of C programs, TCAS and C-Pack-IPAs, show that CFaults is faster than other FBFL approaches like BugAssist and SNIPER. Moreover, CFaults only generates subset-minimal diagnoses of faulty statements, whereas the other approaches tend to enumerate redundant diagnoses.
AIAug 26, 2024
Machine Learning for Quantifier Selection in cvc5Jan Jakubův, Mikoláš Janota, Jelle Piepenbrock et al.
In this work we considerably improve the state-of-the-art SMT solving on first-order quantified problems by efficient machine learning guidance of quantifier selection. Quantifiers represent a significant challenge for SMT and are technically a source of undecidability. In our approach, we train an efficient machine learning model that informs the solver which quantifiers should be instantiated and which not. Each quantifier may be instantiated multiple times and the set of the active quantifiers changes as the solving progresses. Therefore, we invoke the ML predictor many times, during the whole run of the solver. To make this efficient, we use fast ML models based on gradient boosting decision trees. We integrate our approach into the state-of-the-art cvc5 SMT solver and show a considerable increase of the system's holdout-set performance after training it on a large set of first-order problems collected from the Mizar Mathematical Library.
SEDec 2, 2025
Model-Based Diagnosis with Multiple Observations: A Unified Approach for C Software and Boolean CircuitsPedro Orvalho, Marta Kwiatkowska, Mikoláš Janota et al.
Debugging is one of the most time-consuming and expensive tasks in software development and circuit design. Several formula-based fault localisation (FBFL) methods have been proposed, but they fail to guarantee a set of diagnoses across all failing tests or may produce redundant diagnoses that are not subset-minimal, particularly for programs/circuits with multiple faults. This paper introduces CFaults, a novel fault localisation tool for C software and Boolean circuits with multiple faults. CFaults leverages Model-Based Diagnosis (MBD) with multiple observations and aggregates all failing test cases into a unified Maximum Satisfiability (MaxSAT) formula. Consequently, our method guarantees consistency across observations and simplifies the fault localisation procedure. Experimental results on three benchmark sets, two of C programs, TCAS and C-Pack-IPAs, and one of Boolean circuits, ISCAS85, show that CFaults is faster at localising faults in C software than other FBFL approaches such as BugAssist, SNIPER, and HSD. On the ISCAS85 benchmark, CFaults is generally slower than HSD; however, it localises faults in only 6% fewer circuits, demonstrating that it remains competitive in this domain. Furthermore, CFaults produces only subset-minimal diagnoses of faulty statements, whereas the other approaches tend to enumerate redundant diagnoses (e.g., BugAssist and SNIPER).
SEJul 24, 2023
Graph Neural Networks For Mapping Variables Between Programs -- Extended VersionPedro Orvalho, Jelle Piepenbrock, Mikoláš Janota et al.
Automated program analysis is a pivotal research domain in many areas of Computer Science -- Formal Methods and Artificial Intelligence, in particular. Due to the undecidability of the problem of program equivalence, comparing two programs is highly challenging. Typically, in order to compare two programs, a relation between both programs' sets of variables is required. Thus, mapping variables between two programs is useful for a panoply of tasks such as program equivalence, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the set of variables between two programs based on both programs' abstract syntax trees (ASTs). To demonstrate the strength of variable mappings, we present three use-cases of these mappings on the task of program repair to fix well-studied and recurrent bugs among novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state-of-the-art on program repair, greatly dependent on the programs' structure, can only repair about 72% of the incorrect programs. In contrast, our approach, which is solely based on variable mappings, can repair around 88.5%.
SEApr 15
A Universal Textual Merge Strategy Based on Tokens for Version Control SystemsQiqi Jason Gu, Mikoláš Janota
Merging is a core operation in version control systems such as Git, but traditional line-based algorithms often yield spurious conflicts, particularly in the presence of refactorings or parallel edits. While syntax- and semantics-aware merging approaches can reduce conflicts, they introduce drawbacks such as loss of formatting, dependence on language-specific parsers, and limited flexibility across heterogeneous artifacts. To address this gap, we present Summer, a novel textual token-based merge algorithm independent of document formats. Dividing text into tokens, our approach formulates token-level changes in one branch into string-rewriting rules and move rules, and applies these rules to the text of the other branch to construct a merge. Despite being independent on programming languages, our move rules model extracting and inlining functions. We evaluated Summer on ConflictBench, a large benchmark of real-world merge scenarios, comparing it with five pioneering merge tools across Java and non-Java files. Experimental results show that Summer achieved the highest 36% accuracy in reproducing merges verbatim identical to developers', and ranked second in semantic accuracy.
SEDec 19, 2024
Counterexample Guided Program Repair Using Zero-Shot Learning and MaxSAT-based Fault LocalizationPedro Orvalho, Mikoláš Janota, Vasco Manquinho
Automated Program Repair (APR) for introductory programming assignments (IPAs) is motivated by the large number of student enrollments in programming courses each year. Since providing feedback on IPAs requires substantial time and effort from faculty, personalized feedback often involves suggesting fixes to students' programs. Formal Methods (FM)-based semantic repair approaches, check a program's execution against a test suite or reference solution, are effective but limited. These tools excel at identifying buggy parts but can only fix programs if the correct implementation and the faulty one share the same control flow graph. Conversely, Large Language Models (LLMs) are used for APR but often make extensive instead of minimal rewrites. This leads to more invasive fixes, making it harder for students to learn from their mistakes. In summary, LLMs excel at completing strings, while FM-based fault localization excel at identifying buggy parts of a program. In this paper, we propose a novel approach that combines the strengths of both FM-based fault localization and LLMs, via zero-shot learning, to enhance APR for IPAs. Our method uses MaxSAT-based fault localization to identify buggy parts of a program, then presents the LLM with a program sketch devoid of these buggy statements. This hybrid approach follows a CEGIS loop to iteratively refine the program. We ask the LLM to synthesize the missing parts, which are then checked against a test suite. If the suggested program is incorrect, a counterexample from the test suite is fed back to the LLM. Our experiments show that our counterexample guided approach, using MaxSAT-based bug-free program sketches, significantly improves the repair capabilities of all six evaluated LLMs. This method allows LLMs to repair more programs with smaller fixes, outperforming other configurations and state-of-the-art symbolic program repair tools.
LGApr 1, 2025
Neural Approaches to SAT Solving: Design Choices and InterpretabilityDavid Mojžíšek, Jan Hůla, Ziwei Li et al.
In this contribution, we provide a comprehensive evaluation of graph neural networks applied to Boolean satisfiability problems, accompanied by an intuitive explanation of the mechanisms enabling the model to generalize to different instances. We introduce several training improvements, particularly a novel closest assignment supervision method that dynamically adapts to the model's current state, significantly enhancing performance on problems with larger solution spaces. Our experiments demonstrate the suitability of variable-clause graph representations with recurrent neural network updates, which achieve good accuracy on SAT assignment prediction while reducing computational demands. We extend the base graph neural network into a diffusion model that facilitates incremental sampling and can be effectively combined with classical techniques like unit propagation. Through analysis of embedding space patterns and optimization trajectories, we show how these networks implicitly perform a process very similar to continuous relaxations of MaxSAT, offering an interpretable view of their reasoning process. This understanding guides our design choices and explains the ability of recurrent architectures to scale effectively at inference time beyond their training distribution, which we demonstrate with test-time scaling experiments.
AIAug 19, 2025
Quantifier Instantiations: To Mimic or To Revolt?Jan Jakubův, Mikoláš Janota
Quantified formulas pose a significant challenge for Satisfiability Modulo Theories (SMT) solvers due to their inherent undecidability. Existing instantiation techniques, such as e-matching, syntax-guided, model-based, conflict-based, and enumerative methods, often complement each other. This paper introduces a novel instantiation approach that dynamically learns from these techniques during solving. By treating observed instantiations as samples from a latent language, we use probabilistic context-free grammars to generate new, similar terms. Our method not only mimics successful past instantiations but also explores diversity by optionally inverting learned term probabilities, aiming to balance exploitation and exploration in quantifier reasoning.
LGAug 19, 2025
Minimizing the Weighted Number of Tardy Jobs: Data-Driven Heuristic for Single-Machine SchedulingNikolai Antonov, Prěmysl Šůcha, Mikoláš Janota et al.
Existing research on single-machine scheduling is largely focused on exact algorithms, which perform well on typical instances but can significantly deteriorate on certain regions of the problem space. In contrast, data-driven approaches provide strong and scalable performance when tailored to the structure of specific datasets. Leveraging this idea, we focus on a single-machine scheduling problem where each job is defined by its weight, duration, due date, and deadline, aiming to minimize the total weight of tardy jobs. We introduce a novel data-driven scheduling heuristic that combines machine learning with problem-specific characteristics, ensuring feasible solutions, which is a common challenge for ML-based algorithms. Experimental results demonstrate that our approach significantly outperforms the state-of-the-art in terms of optimality gap, number of optimal solutions, and adaptability across varied data scenarios, highlighting its flexibility for practical applications. In addition, we conduct a systematic exploration of ML models, addressing a common gap in similar studies by offering a detailed model selection process and providing insights into why the chosen model is the best fit.
LGApr 2, 2025
Geometric Reasoning in the Embedding SpaceJan Hůla, David Mojžíšek, Jiří Janeček et al.
In this contribution, we demonstrate that Graph Neural Networks and Transformers can learn to reason about geometric constraints. We train them to predict spatial position of points in a discrete 2D grid from a set of constraints that uniquely describe hidden figures containing these points. Both models are able to predict the position of points and interestingly, they form the hidden figures described by the input constraints in the embedding space during the reasoning process. Our analysis shows that both models recover the grid structure during training so that the embeddings corresponding to the points within the grid organize themselves in a 2D subspace and reflect the neighborhood structure of the grid. We also show that the Graph Neural Network we design for the task performs significantly better than the Transformer and is also easier to scale.
AIJun 25, 2024
Solving Hard Mizar Problems with Instantiation and Strategy InventionJan Jakubův, Mikoláš Janota, Josef Urban
In this work, we prove over 3000 previously ATP-unproved Mizar/MPTP problems by using several ATP and AI methods, raising the number of ATP-solved Mizar problems from 75\% to above 80\%. First, we start to experiment with the cvc5 SMT solver which uses several instantiation-based heuristics that differ from the superposition-based systems, that were previously applied to Mizar,and add many new solutions. Then we use automated strategy invention to develop cvc5 strategies that largely improve cvc5's performance on the hard problems. In particular, the best invented strategy solves over 14\% more problems than the best previously available cvc5 strategy. We also show that different clausification methods have a high impact on such instantiation-based methods, again producing many new solutions. In total, the methods solve 3021 (21.3\%) of the 14163 previously unsolved hard Mizar problems. This is a new milestone over the Mizar large-theory benchmark and a large strengthening of the hammer methods for Mizar.
AIMay 28, 2021
Fair and Adventurous Enumeration of Quantifier InstantiationsMikoláš Janota, Haniel Barbosa, Pascal Fontaine et al.
SMT solvers generally tackle quantifiers by instantiating their variables with tuples of terms from the ground part of the formula. Recent enumerative approaches for quantifier instantiation consider tuples of terms in some heuristic order. This paper studies different strategies to order such tuples and their impact on performance. We decouple the ordering problem into two parts. First is the order of the sequence of terms to consider for each quantified variable, and second is the order of the instantiation tuples themselves. While the most and least preferred tuples, i.e. those with all variables assigned to the most or least preferred terms, are clear, the combinations in between allow flexibility in an implementation. We look at principled strategies of complete enumeration, where some strategies are more fair, meaning they treat all the variables the same but some strategies may be more adventurous, meaning that they may venture further down the preference list. We further describe new techniques for discarding irrelevant instantiations which are crucial for the performance of these strategies in practice. These strategies are implemented in the SMT solver cvc5, where they contribute to the diversification of the solver's configuration space, as shown by our experimental results.
LGFeb 10, 2021
Learning Equational Theorem ProvingJelle Piepenbrock, Tom Heskes, Mikoláš Janota et al.
We develop Stratified Shortest Solution Imitation Learning (3SIL) to learn equational theorem proving in a deep reinforcement learning (RL) setting. The self-trained models achieve state-of-the-art performance in proving problems generated by one of the top open conjectures in quasigroup theory, the Abelian Inner Mapping (AIM) Conjecture. To develop the methods, we first use two simpler arithmetic rewriting tasks that share tree-structured proof states and sparse rewards with the AIM problems. On these tasks, 3SIL is shown to significantly outperform several established RL and imitation learning methods. The final system is then evaluated in a standalone and cooperative mode on the AIM problems. The standalone 3SIL-trained system proves in 60 seconds more theorems (70.2%) than the complex, hand-engineered Waldmeister system (65.5%). In the cooperative mode, the final system is combined with the Prover9 system, proving in 2 seconds what standalone Prover9 proves in 60 seconds.
AIMay 10, 2015
Exploiting Resolution-based Representations for MaxSAT SolvingMiguel Neves, Ruben Martins, Mikoláš Janota et al.
Most recent MaxSAT algorithms rely on a succession of calls to a SAT solver in order to find an optimal solution. In particular, several algorithms take advantage of the ability of SAT solvers to identify unsatisfiable subformulas. Usually, these MaxSAT algorithms perform better when small unsatisfiable subformulas are found early. However, this is not the case in many problem instances, since the whole formula is given to the SAT solver in each call. In this paper, we propose to partition the MaxSAT formula using a resolution-based graph representation. Partitions are then iteratively joined by using a proximity measure extracted from the graph representation of the formula. The algorithm ends when only one partition remains and the optimal solution is found. Experimental results show that this new approach further enhances a state of the art MaxSAT solver to optimally solve a larger set of industrial problem instances.