SENov 26, 2015Code
SWIM: Synthesizing What I MeanMukund Raghothaman, Yi Wei, Youssef Hamadi
Modern programming frameworks come with large libraries, with diverse applications such as for matching regular expressions, parsing XML files and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool which suggests code snippets given API-related natural language queries such as "generate md5 hash code". We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce \emph{structured call sequences} to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis. We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.
77.6SEMay 10
Generating Complex Code Analyzers from Natural Language QuestionsAmirmohammad Nazari, Sadra Sabouri, Wang Bill Zhu et al.
Many software development tasks, such as implementing features and fixing bugs, begin with developers posing questions about a codebase. However, answering questions about codebases that span millions of lines of code across thousands of files is non-trivial. Standard tools like grep cannot answer questions requiring semantic or inter-procedural reasoning, and large language models (LLMs) struggle with large codebases due to resource and context constraints. In this paper, we present Merlin, a new system for answering free-form questions that require analytical reasoning about code. Merlin integrates an LLM with CodeQL, a program analysis framework that supports expressive queries over large codebases. We face two principal challenges in the design of such systems: First, program analysis queries are diverse and semantically complex; as a result, even syntactically well-formed queries frequently produce degenerate/empty results. Furthermore, relatively few CodeQL queries are available online, limiting the out-of-the-box effectiveness of LLMs as CodeQL query generators. We address these challenges by developing a RAG-based iterative query-generation approach and a novel self-test technique. Our query debugging technique builds on the idea of assistive queries, which generate concrete witnesses that expose and explain semantic flaws in candidate queries. We evaluate Merlin through both experimental and user studies. Over a set of natural language questions derived from common bug-finding tasks, Merlin discovered not only the majority of software issues reported by other approaches, but also issues that would have otherwise remained undetected. Through a within-subject user study, we found that access to Merlin increased task accuracy by an average of 3.8* and simultaneously reduced the time for programmers to complete all tasks by 31%.
SEFeb 12, 2021
Data-Driven Synthesis of Provably Sound Side Channel AnalysesJingbo Wang, Chungha Sung, Mukund Raghothaman et al.
We propose a data-driven method for synthesizing a static analyzer to detect side-channel information leaks in cryptographic software. Compared to the conventional way of manually crafting such a static analyzer, which can be labor intensive, error prone and suboptimal, our learning-based technique is not only automated but also provably sound. Our analyzer consists of a set of type-inference rules learned from the training data, i.e., example code snippets annotated with ground truth. Internally, we use syntax-guided synthesis (SyGuS) to generate new features and decision tree learning (DTL) to generate type-inference rules based on these features. We guarantee soundness by formally proving each learned rule via a technique called Datalog query containment checking. We have implemented our technique in the LLVM compiler and used it to detect power side channels in C programs. Our results show that, in addition to being automated and provably sound during synthesis, the learned analyzer also has the same empirical accuracy as two state-of-the-art, manually crafted analyzers while being 300X and 900X faster, respectively.
AIJun 1, 2019
Synthesizing Datalog Programs Using Numerical RelaxationXujie Si, Mukund Raghothaman, Kihong Heo et al.
The problem of learning logical rules from examples arises in diverse fields, including program synthesis, logic programming, and machine learning. Existing approaches either involve solving computationally difficult combinatorial problems, or performing parameter estimation in complex statistical models. In this paper, we present Difflog, a technique to extend the logic programming language Datalog to the continuous setting. By attaching real-valued weights to individual rules of a Datalog program, we naturally associate numerical values with individual conclusions of the program. Analogous to the strategy of numerical relaxation in optimization problems, we can now first determine the rule weights which cause the best agreement between the training labels and the induced values of output tuples, and subsequently recover the classical discrete-valued target program from the continuous optimum. We evaluate Difflog on a suite of 34 benchmark problems from recent literature in knowledge discovery, formal verification, and database query-by-example, and demonstrate significant improvements in learning complex programs with recursive rules, invented predicates, and relations of arbitrary arity.