Vu Le

h-index19

7papers

168citations

Novelty56%

AI Score32

Ranked #123,639 of 194,257 authors (top 64%)#1,360 in SE (top 45%)

7 Papers

5.9DBAug 21, 2023

DataVinci: Learning Syntactic and Semantic String Repairs

Mukul Singh, José Cambronero, Sumit Gulwani et al. · microsoft-research

String data is common in real-world datasets: 67.6% of values in a sample of 1.8 million real Excel spreadsheets from the web were represented as text. Systems that successfully clean such string data can have a significant impact on real users. While prior work has explored errors in string data, proposed approaches have often been limited to error detection or require that the user provide annotations, examples, or constraints to fix the errors. Furthermore, these systems have focused independently on syntactic errors or semantic errors in strings, but ignore that strings often contain both syntactic and semantic substrings. We introduce DataVinci, a fully unsupervised string data error detection and repair system. DataVinci learns regular-expression-based patterns that cover a majority of values in a column and reports values that do not satisfy such patterns as data errors. DataVinci can automatically derive edits to the data error based on the majority patterns and constraints learned over other columns without the need for further user interaction. To handle strings with both syntactic and semantic substrings, DataVinci uses an LLM to abstract (and re-concretize) portions of strings that are semantic prior to learning majority patterns and deriving edits. Because not all data can result in majority patterns, DataVinci leverages execution information from an existing program (which reads the target data) to identify and correct data repairs that would not otherwise be identified. DataVinci outperforms 7 baselines on both error detection and repair when evaluated on 4 existing and new benchmarks.

1.7SEAug 14, 2023

Demonstration of CORNET: A System For Learning Spreadsheet Formatting Rules By Example

Mukul Singh, Jose Cambronero, Sumit Gulwani et al. · microsoft-research

Data management and analysis tasks are often carried out using spreadsheet software. A popular feature in most spreadsheet platforms is the ability to define data-dependent formatting rules. These rules can express actions such as "color red all entries in a column that are negative" or "bold all rows not containing error or failure." Unfortunately, users who want to exercise this functionality need to manually write these conditional formatting (CF) rules. We introduce CORNET, a system that automatically learns such conditional formatting rules from user examples. CORNET takes inspiration from inductive program synthesis and combines symbolic rule enumeration, based on semi-supervised clustering and iterative decision tree learning, with a neural ranker to produce accurate conditional formatting rules. In this demonstration, we show CORNET in action as a simple add-in to Microsoft Excel. After the user provides one or two formatted cells as examples, CORNET generates formatting rule suggestions for the user to apply to the spreadsheet.

24.2AIAug 16, 2024

Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al. · microsoft-research

LLMs-as-a-judge is a recently popularized method which replaces human judgements in task evaluation (Zheng et al. 2024) with automatic evaluation using LLMs. Due to widespread use of RLHF (Reinforcement Learning from Human Feedback), state-of-the-art LLMs like GPT4 and Llama3 are expected to have strong alignment with human preferences when prompted for a quality judgement, such as the coherence of a text. While this seems beneficial, it is not clear whether the assessments by an LLM-as-a-judge constitute only an evaluation based on the instructions in the prompts, or reflect its preference for high-quality data similar to its fine-tune data. To investigate how much influence prompting the LLMs-as-a-judge has on the alignment of AI judgements to human judgements, we analyze prompts with increasing levels of instructions about the target quality of an evaluation, for several LLMs-as-a-judge. Further, we compare to a prompt-free method using model perplexity as a quality measure instead. We aggregate a taxonomy of quality criteria commonly used across state-of-the-art evaluations with LLMs and provide this as a rigorous benchmark of models as judges. Overall, we show that the LLMs-as-a-judge benefit only little from highly detailed instructions in prompts and that perplexity can sometimes align better with human judgements than prompting, especially on textual quality.

12.2CLOct 16, 2023

Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs

Ananya Singha, José Cambronero, Sumit Gulwani et al.

Large language models (LLMs) are increasingly applied for tabular tasks using in-context learning. The prompt representation for a table may play a role in the LLMs ability to process the table. Inspired by prior work, we generate a collection of self-supervised structural tasks (e.g. navigate to a cell and row; transpose the table) and evaluate the performance differences when using 8 formats. In contrast to past work, we introduce 8 noise operations inspired by real-world messy data and adversarial inputs, and show that such operations can impact LLM performance across formats for different structural understanding tasks.

15.7SEMar 2, 2021Code

Can Program Synthesis be Used to Learn Merge Conflict Resolutions? An Empirical Analysis

Rangeet Pan, Vu Le, Nachiappan Nagappan et al.

Forking structure is widespread in the open-source repositories and that causes a significant number of merge conflicts. In this paper, we study the problem of textual merge conflicts from the perspective of Microsoft Edge, a large, highly collaborative fork off the main Chromium branch with significant merge conflicts. Broadly, this study is divided into two sections. First, we empirically evaluate textual merge conflicts in Microsoft Edge and classify them based on the type of files, location of conflicts in a file, and the size of conflicts. We found that ~28% of the merge conflicts are 1-2 line changes, and many resolutions have frequent patterns. Second, driven by these findings, we explore Program Synthesis (for the first time) to learn patterns and resolve structural merge conflicts. We propose a novel domain-specific language (DSL) that captures many of the repetitive merge conflict resolution patterns and learn resolution strategies as programs in this DSL from example resolutions. We found that the learned strategies can resolve 11.4% of the conflicts (~41% of 1-2 line changes) that arise in the C++ files with 93.2% accuracy.

7.0SEMar 21, 2024

Semantically Aligned Question and Code Generation for Automated Insight Generation

Ananya Singha, Bhavya Chopra, Anirudh Khatry et al. · microsoft-research

Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, automated insights produced by large-language models can generate code that does not correctly correspond (or align) to the insight. In this paper, we leverage the semantic knowledge of large language models to generate targeted and insightful questions about data and the corresponding code to answer those questions. Then through an empirical study on data from Open-WikiTable, we show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code. Additionally, we found that generating questions and code together yields more diverse questions.

12.6SEFeb 13, 2025

TableTalk: Scaffolding Spreadsheet Development with a Language Agent

Jenny T. Liang, Aayush Kumar, Yasharth Bajpai et al. · microsoft-research

Spreadsheet programming is challenging. Programmers use spreadsheet programming knowledge (e.g., formulas) and problem-solving skills to combine actions into complex tasks. Advancements in large language models have introduced language agents that observe, plan, and perform tasks, showing promise for spreadsheet creation. We present TableTalk, a spreadsheet programming agent embodying three design principles -- scaffolding, flexibility, and incrementality -- derived from studies with seven spreadsheet programmers and 85 Excel templates. TableTalk guides programmers through structured plans based on professional workflows, generating three potential next steps to adapt plans to programmer needs. It uses pre-defined tools to generate spreadsheet components and incrementally build spreadsheets. In a study with 20 programmers, TableTalk produced higher-quality spreadsheets 2.3 times more likely to be preferred than the baseline. It reduced cognitive load and thinking time by 12.6%. From this, we derive design guidelines for agentic spreadsheet programming tools and discuss implications on spreadsheet programming, end-user programming, AI-assisted programming, and human-agent collaboration.