Inês Lynce

AI
9papers
108citations
Novelty61%
AI Score45

9 Papers

SEAug 28, 2023Code
MELT: Mining Effective Lightweight Transformations from Pull Requests

Daniel Ramos, Hailie Mitchell, Inês Lynce et al. · cmu

Software developers often struggle to update APIs, leading to manual, time-consuming, and error-prone processes. We introduce MELT, a new approach that generates lightweight API migration rules directly from pull requests in popular library repositories. Our key insight is that pull requests merged into open-source libraries are a rich source of information sufficient to mine API migration rules. By leveraging code examples mined from the library source and automatically generated code examples based on the pull requests, we infer transformation rules in \comby, a language for structural code search and replace. Since inferred rules from single code examples may be too specific, we propose a generalization procedure to make the rules more applicable to client projects. MELT rules are syntax-driven, interpretable, and easily adaptable. Moreover, unlike previous work, our approach enables rule inference to seamlessly integrate into the library workflow, removing the need to wait for client code migrations. We evaluated MELT on pull requests from four popular libraries, successfully mining 461 migration rules from code examples in pull requests and 114 rules from auto-generated code examples. Our generalization procedure increases the number of matches for mined rules by 9x. We applied these rules to client projects and ran their tests, which led to an overall decrease in the number of warnings and fixing some test cases demonstrating MELT's effectiveness in real-world scenarios.

AIApr 14, 2022
Exact and approximate determination of the Pareto set using minimal correction subsets

Andreia P. Guerreiro, João Cortes, Daniel Vanderpooten et al.

Recently, it has been shown that the enumeration of Minimal Correction Subsets (MCS) of Boolean formulas allows solving Multi-Objective Boolean Optimization (MOBO) formulations. However, a major drawback of this approach is that most MCSs do not correspond to Pareto-optimal solutions. In fact, one can only know that a given MCS corresponds to a Pareto-optimal solution when all MCSs are enumerated. Moreover, if it is not possible to enumerate all MCSs, then there is no guarantee of the quality of the approximation of the Pareto frontier. This paper extends the state of the art for solving MOBO using MCSs. First, we show that it is possible to use MCS enumeration to solve MOBO problems such that each MCS necessarily corresponds to a Pareto-optimal solution. Additionally, we also propose two new algorithms that can find a (1 + {\varepsilon})-approximation of the Pareto frontier using MCS enumeration. Experimental results in several benchmark sets show that the newly proposed algorithms allow finding better approximations of the Pareto frontier than state-of-the-art algorithms, and with guaranteed approximation ratios.

AIApr 22, 2022
New Core-Guided and Hitting Set Algorithms for Multi-Objective Combinatorial Optimization

João Cortes, Inês Lynce, Vasco Manquinho

In the last decade, a plethora of algorithms for single-objective Boolean optimization has been proposed that rely on the iterative usage of a highly effective Propositional Satisfiability (SAT) solver. But the use of SAT solvers in Multi-Objective Combinatorial Optimization (MOCO) algorithms is still scarce. Due to this shortage of efficient tools for MOCO, many real-world applications formulated as multi-objective are simplified to single-objective, using either a linear combination or a lexicographic ordering of the objective functions to optimize. In this paper, we extend the state of the art of MOCO solvers with two novel unsatisfiability-based algorithms. The first is a core-guided MOCO solver. The second is a hitting set-based MOCO solver. Experimental results obtained in a wide range of benchmark instances show that our new unsatisfiability-based algorithms can outperform state-of-the-art SAT-based algorithms for MOCO.

SEFeb 12, 2021Code
SOAR: A Synthesis Approach for Data Science API Refactoring

Ansong Ni, Daniel Ramos, Aidan Yang et al.

With the growth of the open-source data science community, both the number of data science libraries and the number of versions for the same library are increasing rapidly. To match the evolving APIs from those libraries, open-source organizations often have to exert manual effort to refactor the APIs used in the code base. Moreover, due to the abundance of similar open-source libraries, data scientists working on a certain application may have an abundance of libraries to choose, maintain and migrate between. The manual refactoring between APIs is a tedious and error-prone task. Although recent research efforts were made on performing automatic API refactoring between different languages, previous work relies on statistical learning with collected pairwise training data for the API matching and migration. Using large statistical data for refactoring is not ideal because such training data will not be available for a new library or a new version of the same library. We introduce Synthesis for Open-Source API Refactoring (SOAR), a novel technique that requires no training data to achieve API migration and refactoring. SOAR relies only on the documentation that is readily available at the release of the library to learn API representations and mapping between libraries. Using program synthesis, SOAR automatically computes the correct configuration of arguments to the APIs and any glue code required to invoke those APIs. SOAR also uses the interpreter's error messages when running refactored code to generate logical constraints that can be used to prune the search space. Our empirical evaluation shows that SOAR can successfully refactor 80% of our benchmarks corresponding to deep learning models with up to 44 layers with an average run time of 97.23 seconds, and 90% of the data wrangling benchmarks with an average run time of 17.31 seconds.

LOMay 10, 2015Code
DistMS: A Non-Portfolio Distributed Solver for Maximum Satisfiability

Miguel Neves, Inês Lynce, Vasco Manquinho

The most successful parallel SAT and MaxSAT solvers follow a portfolio approach, where each thread applies a different algorithm (or the same algorithm configured differently) to solve a given problem instance. The main goal of building a portfolio is to diversify the search process being carried out by each thread. As soon as one thread finishes, the instance can be deemed solved. In this paper we present a new open source distributed solver for MaxSAT solving that addresses two issues commonly found in multicore parallel solvers, namely memory contention and scalability. Preliminary results show that our non-portfolio distributed MaxSAT solver outperforms its sequential version and is able to solve more instances as the number of processes increases.

SEFeb 1
SPELL: Synthesis of Programmatic Edits using LLMs

Daniel Ramos, Catarina Gamboa, Inês Lynce et al.

Library migration is a common but error-prone task in software development. Developers may need to replace one library with another due to reasons like changing requirements or licensing changes. Migration typically entails updating and rewriting source code manually. While automated migration tools exist, most rely on mining examples from real-world projects that have already undergone similar migrations. However, these data are scarce, and collecting them for arbitrary pairs of libraries is difficult. Moreover, these migration tools often miss out on leveraging modern code transformation infrastructure. In this paper, we present a new approach to automated API migration that sidesteps the limitations described above. Instead of relying on existing migration data or using LLMs directly for transformation, we use LLMs to extract migration examples. Next, we use an Agent to generalize those examples to reusable transformation scripts in PolyglotPiranha, a modern code transformation tool. Our method distills latent migration knowledge from LLMs into structured, testable, and repeatable migration logic, without requiring preexisting corpora or manual engineering effort. Experimental results across Python libraries show that our system can generate diverse migration examples and synthesize transformation scripts that generalize to real-world codebases.

FLDec 28, 2020
FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions

Margarida Ferreira, Miguel Terra-Neves, Miguel Ventura et al.

Form validators based on regular expressions are often used on digital forms to prevent users from inserting data in the wrong format. However, writing these validators can pose a challenge to some users. We present FOREST, a regular expression synthesizer for digital form validations. FOREST produces a regular expression that matches the desired pattern for the input values and a set of conditions over capturing groups that ensure the validity of integer values in the input. Our synthesis procedure is based on enumerative search and uses a Satisfiability Modulo Theories (SMT) solver to explore and prune the search space. We propose a novel representation for regular expressions synthesis, multi-tree, which induces patterns in the examples and uses them to split the problem through a divide-and-conquer approach. We also present a new SMT encoding to synthesize capture conditions for a given regular expression. To increase confidence in the synthesized regular expression, we implement user interaction based on distinguishing inputs. We evaluated FOREST on real-world form-validation instances using regular expressions. Experimental results show that FOREST successfully returns the desired regular expression in 72% of the instances and outperforms REGEL, a state-of-the-art regular expression synthesizer.

AIMay 10, 2015
Exploiting Resolution-based Representations for MaxSAT Solving

Miguel Neves, Ruben Martins, Mikoláš Janota et al.

Most recent MaxSAT algorithms rely on a succession of calls to a SAT solver in order to find an optimal solution. In particular, several algorithms take advantage of the ability of SAT solvers to identify unsatisfiable subformulas. Usually, these MaxSAT algorithms perform better when small unsatisfiable subformulas are found early. However, this is not the case in many problem instances, since the whole formula is given to the SAT solver in each call. In this paper, we propose to partition the MaxSAT formula using a resolution-based graph representation. Partitions are then iteratively joined by using a proximity measure extracted from the graph representation of the formula. The algorithm ends when only one partition remains and the optimal solution is found. Experimental results show that this new approach further enhances a state of the art MaxSAT solver to optimally solve a larger set of industrial problem instances.

AIJul 26, 2012
On When and How to use SAT to Mine Frequent Itemsets

Rui Henriques, Inês Lynce, Vasco Manquinho

A new stream of research was born in the last decade with the goal of mining itemsets of interest using Constraint Programming (CP). This has promoted a natural way to combine complex constraints in a highly flexible manner. Although CP state-of-the-art solutions formulate the task using Boolean variables, the few attempts to adopt propositional Satisfiability (SAT) provided an unsatisfactory performance. This work deepens the study on when and how to use SAT for the frequent itemset mining (FIM) problem by defining different encodings with multiple task-driven enumeration options and search strategies. Although for the majority of the scenarios SAT-based solutions appear to be non-competitive with CP peers, results show a variety of interesting cases where SAT encodings are the best option.