Umang Mathur

h-index: 21

Papers: 8

Citations: 187

Novelty: 68%

AI Score: 50

Ranked #19,819 of 194,257 authors (top 10%); #9 in PL (top 6%)

8 Papers

5.5 | PL | Apr 13

Fixed Parameter Tractable Linearizability Monitoring

Lee Zheng Han, Umang Mathur

We study the linearizability monitoring problem, which asks whether a given concurrent history of a data structure is equivalent to some sequential execution of the same data structure. In general, this problem is $\textsf{NP}$-hard, even for simple objects such as registers. Recent work has identified tractable cases for restricted classes of histories, notably unambiguous and differentiated histories. We revisit the tractability boundary from a fine-grained, parameterized perspective. We show that for a broad class of data structures -- including stacks, queues, priority queues, and maps -- linearizability monitoring is fixed-parameter tractable when parameterized by the number of processes. Concretely, we give an algorithm running in time $O(c^{k} \cdot \textsf{poly}(n))$, where $n$ is the history size, $k$ is the number of processes, and $c$ is a constant, yielding efficient performance when $k$ is small. Our approach reduces linearizability monitoring to a language reachability problem on graphs, which asks whether a labeled graph admits a path whose label sequence belongs to a fixed language $L$. We identify classes of languages that capture the sequential specifications of the above data structures and show that language reachability is efficiently solvable on the graph structures induced by concurrent histories. Our results complement prior hardness results and existing tractable subclasses, and provide a unified algorithmic framework. We implement our approach and demonstrate significant runtime improvements over existing algorithms, which exhibit exponential worst-case behavior.
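To make the problem statement concrete, here is a minimal brute-force sketch of linearizability checking for a FIFO queue. It is not the fixed-parameter algorithm from the abstract: it tries every ordering consistent with real time, so its worst case is exponential. The history encoding (tuples of invocation time, response time, operation kind, value) and the function name are assumptions made for illustration.

```python
def is_linearizable_queue(history):
    """Brute-force check. history: list of (invoke, response, kind, value),
    where kind is 'enq' (value = enqueued item) or 'deq' (value = returned item)."""
    n = len(history)

    def search(done, queue):
        if len(done) == n:
            return True
        pending = [i for i in range(n) if i not in done]
        for i in pending:
            invoke, _, kind, value = history[i]
            # i may be linearized next only if no other pending operation
            # had already responded before i was invoked (real-time order).
            if any(history[j][1] < invoke for j in pending if j != i):
                continue
            if kind == "enq":
                if search(done | {i}, queue + (value,)):
                    return True
            elif queue and queue[0] == value:
                # A dequeue is legal only if it returns the current head.
                if search(done | {i}, queue[1:]):
                    return True
        return False

    return search(frozenset(), ())


# Two overlapping enqueues followed by a dequeue that returns 2.
overlapping = [(0, 3, "enq", 1), (1, 2, "enq", 2), (4, 5, "deq", 2)]
# The same operations, but the enqueues do not overlap.
sequential = [(0, 1, "enq", 1), (2, 3, "enq", 2), (4, 5, "deq", 2)]
print(is_linearizable_queue(overlapping), is_linearizable_queue(sequential))
```

In the first history the two enqueues overlap, so either may take effect first; in the second, real-time order forces 1 to the head of the queue, and a dequeue returning 2 cannot be explained.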

5.1 | PL | Apr 10, 2025 | Code

Program Skeletons for Automated Program Translation

Bo Wang, Tianyu Li, Ruishi Li et al.

Translating software between programming languages is a challenging task, for which automated techniques have been elusive and hard to scale up to larger programs. A key difficulty in cross-language translation is that one has to re-express the intended behavior of the source program into idiomatic constructs of a different target language. This task requires abstracting away from the source language-specific details, while keeping the overall functionality the same. In this work, we propose a novel and systematic approach for making such translation amenable to automation based on a framework we call program skeletons. A program skeleton retains the high-level structure of the source program by abstracting away and effectively summarizing lower-level concrete code fragments, which can be mechanically translated to the target programming language. A skeleton, by design, permits many different ways of filling in the concrete implementation for fragments, which can work in conjunction with existing data-driven code synthesizers. Most importantly, skeletons can conceptually enable sound decomposition, i.e., if each individual fragment is correctly translated, taken together with the mechanically translated skeleton, the final translated program is deemed to be correct as a whole. We present a prototype system called Skel embodying the idea of skeleton-based translation from Python to JavaScript. Our results show promising scalability compared to prior work. For 9 real-world Python programs, some exceeding roughly 1k lines of code, 95% of their code fragments can be automatically translated, while about 5% require manual effort. All the final translations are correct with respect to whole-program test suites.

11.3 | SE | Oct 4, 2025

Adversarial Agent Collaboration for C to Rust Translation

Tianyu Li, Ruishi Li, Bo Wang et al.

Translating C to memory-safe languages, like Rust, prevents critical memory safety vulnerabilities that are prevalent in legacy C software. Existing approaches for C to safe Rust translation, including LLM-assisted ones, do not generalize on larger (> 500 LoC) C codebases because they depend on complex program analyses that frequently break. In this work, we present ACToR (Adversarial C To Rust translator), a simple LLM agent-based approach. Inspired by GANs, ACToR pits a generator agent against a discriminator agent, which collaborate to iteratively generate a Rust translation. On each iteration, the translator agent synthesizes and refines a Rust translation to pass an existing suite of tests, and then the discriminator agent finds new failing tests. We demonstrate that ACToR translates all of the 63 real-world command line utilities considered in our benchmarks, which have an average size of 485 lines of code, and it achieves over 90% test pass rate with zero human intervention. To our knowledge, it is the first such system that reliably translates C programs of this scale. Furthermore, ACToR improves translation correctness by up to 18.9% compared to baseline, non-adversarial approaches.

2.3 | LO | Jan 17, 2022

A Tree Clock Data Structure for Causal Orderings in Concurrent Executions

Umang Mathur, Andreas Pavlogiannis, Hünkar Can Tunç et al.

Dynamic techniques are a scalable and effective way to analyze concurrent programs. Instead of analyzing all behaviors of a program, these techniques detect errors by focusing on a single program execution. Often a crucial step in these techniques is to define a causal ordering between events in the execution, which is then computed using vector clocks, a simple data structure that stores logical times of threads. The two basic operations of vector clocks, namely join and copy, require $\Theta(k)$ time, where $k$ is the number of threads. Thus they are a computational bottleneck when $k$ is large. In this work, we introduce tree clocks, a new data structure that replaces vector clocks for computing causal orderings in program executions. Joining and copying tree clocks takes time that is roughly proportional to the number of entries being modified, and hence the two operations do not suffer the a-priori $\Theta(k)$ cost per application. We show that when used to compute the classic happens-before (HB) partial order, tree clocks are optimal, in the sense that no other data structure can lead to smaller asymptotic running time. Moreover, we demonstrate that tree clocks can be used to compute other partial orders, such as schedulable-happens-before (SHB) and the standard Mazurkiewicz (MAZ) partial order, and thus are a versatile data structure. Our experiments show that just by replacing vector clocks with tree clocks, the computation becomes $2.02 \times$ (MAZ), $2.66 \times$ (SHB), and $2.97 \times$ (HB) faster on average per benchmark. These results illustrate that tree clocks have the potential to become a standard data structure with wide applications in concurrent analyses.
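For context, the vector clock baseline that the abstract refers to can be sketched as below. This is the plain list-based representation, not the tree clock data structure itself; the function names are hypothetical. Both join and copy touch all $k$ entries regardless of how many actually change, which is the cost tree clocks are designed to avoid.

```python
def vc_join(a, b):
    """Pointwise maximum of two vector clocks; visits all k entries."""
    return [max(x, y) for x, y in zip(a, b)]


def vc_copy(a):
    """Copy a vector clock; also visits all k entries."""
    return list(a)


def vc_leq(a, b):
    """True if clock a is pointwise less than or equal to clock b."""
    return all(x <= y for x, y in zip(a, b))


# Three threads: each clock holds one logical time per thread.
c1 = [1, 0, 2]
c2 = [0, 3, 1]
joined = vc_join(c1, c2)
print(joined, vc_leq(c1, joined), vc_leq(c1, c2))
```

Here only one entry of `c1` is lower than its counterpart in `c2`, yet the join still scans every position.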

3.0 | SE | Oct 20, 2020

Scalable Statistical Root Cause Analysis on App Telemetry

Vijayaraghavan Murali, Edward Yao, Umang Mathur et al.

Despite engineering workflows that aim to prevent buggy code from being deployed, bugs still make their way into the Facebook app. When symptoms of these bugs, such as user submitted reports and automatically captured crashes, are reported, finding their root causes is an important step in resolving them. However, at Facebook's scale of billions of users, a single bug can manifest as several different symptoms according to the various user and execution environments in which the software is deployed. Root cause analysis (RCA) therefore requires tedious manual investigation and domain expertise to extract out common patterns that are observed in groups of reports and use them for debugging. We propose Minesweeper, a technique for RCA that moves towards automatically identifying the root cause of bugs from their symptoms. The method is based on two key aspects: (i) a scalable algorithm to efficiently mine patterns from telemetric information that is collected along with the reports, and (ii) statistical notions of precision and recall of patterns that help point towards root causes. We evaluate Minesweeper's scalability and effectiveness in finding root causes from symptoms on real world bug and crash reports from Facebook's apps. Our evaluation demonstrates that Minesweeper can perform RCA for tens of thousands of reports in less than 3 minutes, and is more than 85% accurate in identifying the root cause of regressions.
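The abstract does not define its statistical notions of precision and recall, so the sketch below uses one plausible reading, stated here as an assumption: a pattern is a set of telemetry features, precision is the share of matching reports that belong to the symptomatic group, and recall is the share of symptomatic reports that match. The feature strings and function name are invented for illustration and are not taken from Minesweeper.

```python
def precision_recall(pattern, symptomatic, baseline):
    """pattern: set of features; symptomatic and baseline: lists of feature sets.
    A report matches the pattern if it contains every feature in the pattern."""
    hits_symptomatic = sum(pattern <= report for report in symptomatic)
    hits_baseline = sum(pattern <= report for report in baseline)
    matched = hits_symptomatic + hits_baseline
    precision = hits_symptomatic / matched if matched else 0.0
    recall = hits_symptomatic / len(symptomatic) if symptomatic else 0.0
    return precision, recall


# Hypothetical telemetry: each report is a set of "key=value" features.
symptomatic = [{"os=14", "tab=feed"}, {"os=14", "tab=chat"}, {"os=13", "tab=feed"}]
baseline = [{"os=14", "tab=feed"}, {"os=13", "tab=chat"}]
print(precision_recall({"os=14"}, symptomatic, baseline))
```

Under this reading, a pattern with both high precision and high recall is concentrated in the symptomatic reports and covers most of them, which is what makes it a candidate pointer to a root cause.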

3.3 | PL | Aug 1, 2018

What Happens-After the First Race? Enhancing the Predictive Power of Happens-Before Based Dynamic Race Detection

Umang Mathur, Dileep Kini, Mahesh Viswanathan

Dynamic race detection is the problem of determining if an observed program execution reveals the presence of a data race in a program. The classical approach to solving this problem is to detect if there is a pair of conflicting memory accesses that are unordered by Lamport's happens-before (HB) relation. HB based race detection is known to not report false positives, i.e., it is sound. However, the soundness guarantee of HB only promises that the first pair of unordered, conflicting events is a schedulable data race. That is, there can be pairs of HB-unordered conflicting data accesses that are not schedulable races because there is no reordering of the events of the execution, where the events in race can be executed immediately after each other. We introduce a new partial order, called schedulable happens-before (SHB) that exactly characterizes the pairs of schedulable data races --- every pair of conflicting data accesses that are identified by SHB can be scheduled, and every HB-race that can be scheduled is identified by SHB. Thus, the SHB partial order is truly sound. We present a linear time, vector clock algorithm to detect schedulable races using SHB. Our experiments demonstrate the value of our algorithm for dynamic race detection --- SHB incurs only little performance overhead and can scale to executions from real-world software applications without compromising soundness.
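The classical HB check that the abstract starts from can be sketched as follows. This is a simplified illustration of HB race detection with vector clocks, not the SHB algorithm and not linear time: it compares each access against all earlier accesses to the same variable. The trace encoding (thread, operation, target) and the function name are assumptions.

```python
def hb_races(trace):
    """trace: list of (thread, op, target), op in {'r', 'w', 'acq', 'rel'}.
    Returns index pairs of conflicting accesses unordered by happens-before."""
    threads = sorted({t for t, _, _ in trace})
    index = {t: i for i, t in enumerate(threads)}
    clock = {t: [0] * len(threads) for t in threads}
    for t in threads:
        clock[t][index[t]] = 1
    lock_clock = {}   # lock -> clock stored at its last release
    accesses = {}     # variable -> list of (event index, thread, op, timestamp)
    races = []
    for i, (t, op, x) in enumerate(trace):
        c = clock[t]
        if op == "acq":
            if x in lock_clock:
                # Join with the clock of the last release of this lock.
                c[:] = [max(a, b) for a, b in zip(c, lock_clock[x])]
        elif op == "rel":
            lock_clock[x] = list(c)
            c[index[t]] += 1
        else:
            stamp = list(c)
            for j, u, op2, s in accesses.get(x, []):
                ordered = all(a <= b for a, b in zip(s, stamp))
                if u != t and "w" in (op, op2) and not ordered:
                    races.append((j, i))
            accesses.setdefault(x, []).append((i, t, op, stamp))
    return races


unprotected = [("t1", "w", "x"), ("t2", "w", "x")]
protected = [("t1", "acq", "l"), ("t1", "w", "x"), ("t1", "rel", "l"),
             ("t2", "acq", "l"), ("t2", "w", "x"), ("t2", "rel", "l")]
print(hb_races(unprotected), hb_races(protected))
```

As the abstract notes, only the first pair reported by a check like this is guaranteed to be a schedulable race; later pairs may not be.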

5.9 | PL | Jul 23, 2018 | Code

Data Race Detection on Compressed Traces

Dileep Kini, Umang Mathur, Mahesh Viswanathan

We consider the problem of detecting data races in program traces that have been compressed using straight line programs (SLP), which are special context-free grammars that generate exactly one string, namely the trace that they represent. We consider two classical approaches to race detection --- using the happens-before relation and the lockset discipline. We present algorithms for both these methods that run in time that is linear in the size of the compressed, SLP representation. Typical program executions almost always exhibit patterns that lead to significant compression. Thus, our algorithms are expected to result in large speedups when compared with analyzing the uncompressed trace. Our experimental evaluation of these new algorithms on standard benchmarks confirms this observation.
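A straight line program can be illustrated with a small sketch. The grammar below is a made-up example in which each nonterminal has exactly one rule, so it derives a single trace; the event names and function names are hypothetical. Computing the length of the trace with memoization takes time proportional to the grammar size, which hints at why working on the compressed form can pay off. The race detection algorithms themselves are not shown.

```python
def slp_length(rules, start):
    """Length of the unique string derived from start, without expanding it."""
    memo = {}

    def length(symbol):
        if symbol not in rules:      # terminal: a single trace event
            return 1
        if symbol not in memo:
            memo[symbol] = sum(length(s) for s in rules[symbol])
        return memo[symbol]

    return length(start)


def slp_expand(rules, start):
    """Fully expand the SLP into the trace it represents."""
    if start not in rules:
        return [start]
    out = []
    for symbol in rules[start]:
        out.extend(slp_expand(rules, symbol))
    return out


# Hypothetical compressed trace: a lock-protected write repeated four times.
rules = {
    "S": ["A", "A"],
    "A": ["B", "B"],
    "B": ["acq(l)", "w(x)", "rel(l)"],
}
print(slp_length(rules, "S"), slp_expand(rules, "S")[:3])
```

The grammar has three rules, while the trace it represents has twelve events; repeated patterns of this kind are what the abstract says typical executions exhibit.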

8.0 | PL | Apr 8, 2017

Dynamic Race Prediction in Linear Time

Dileep Kini, Umang Mathur, Mahesh Viswanathan

Writing reliable concurrent software remains a huge challenge for today's programmers. Programmers rarely reason about their code by explicitly considering different possible interleavings of its execution. We consider the problem of detecting data races from individual executions in a sound manner. The classical approach to solving this problem has been to use Lamport's happens-before (HB) relation. Until now, HB has remained the only approach that runs in linear time. Previous efforts to improve over HB, such as causally-precedes (CP) and maximal causal models, fall short because they cannot be implemented efficiently and hence have to compromise on their race-detecting ability by limiting their techniques to bounded-size fragments of the execution. We present a new relation, weak-causally-precedes (WCP), that is provably better than CP in terms of being able to detect more races, while still remaining sound. Moreover, it admits a linear time algorithm which works on the entire execution without having to fragment it.