Mahesh Viswanathan

PL
h-index1
16papers
384citations
Novelty61%
AI Score48

16 Papers

SYFeb 22, 2017
DRYVR:Data-driven verification and compositional reasoning for automotive systems

Chuchu Fan, Bolun Qi, Sayan Mitra et al.

We present the DRYVR framework for verifying hybrid control systems that are described by a combination of a black-box simulator for trajectories and a white-box transition graph specifying mode switches. The framework includes (a) a probabilistic algorithm for learning sensitivity of the continuous trajectories from simulation data, (b) a bounded reachability analysis algorithm that uses the learned sensitivity, and (c) reasoning techniques based on simulation relations and sequential composition, that enable verification of complex systems under long switching sequences, from the reachability analysis of a simpler system under shorter sequences. We demonstrate the utility of the framework by verifying a suite of automotive benchmarks that include powertrain control, automatic transmission, and several autonomous and ADAS features like automatic emergency braking, lane-merge, and auto-passing controllers.

48.5CLMay 12
Correcting Selection Bias in Sparse User Feedback for Large Language Model Quality Estimation: A Multi-Agent Hierarchical Bayesian Approach

Andrea Morandi, Mahesh Viswanathan

[Abridged] Production LLM deployments receive feedback from a non-random fraction of users: thumbs sit mostly in the tails of the satisfaction distribution, and a naive average over them can land 40-50 percentage points away from true system quality. We treat this as a topic- and sentiment- stratified selection-bias problem and propose a three-agent hierarchical Bayesian pipeline that does not require ground-truth labels on individual interactions. A Topic Clustering Agent partitions the stream via UMAP + HDBSCAN over text embeddings; a Bias Modeling Agent fits a two-stage hierarchical Beta-Binomial under NUTS, inferring per-topic selection rates $s_c$ and quality $q_c$ with partial pooling; a Synthesis Agent reweights $q_c$ by true topic prevalence $\hatπ_c = n_c/N$ to report a bias-corrected aggregate posterior $\bar Q = \sum_c \hatπ_c q_c$ with credible interval, plus drift signals for online recalibration. Validation uses UltraFeedback (N=10,232 retained interactions, $C=18$ clusters, $Q^\star=0.6249$) with simulated topic- and sentiment-dependent selection biases. We compare five Bayesian variants against Naive and IPW baselines. A mild prior on the feedback channel (typical positive-feedback rate and negative-to-positive ratio, both readable from any production dashboard without labels) keeps Hierarchical-Informed within 4-13 pp of $Q^\star$ as the bias ratio sweeps from 1:1 to 30:1, with 95% credible intervals covering $Q^\star$ in 50/50 random-seed replicates at $κ_{\max}=10$. Without channel-side priors, every weak-prior variant misses $Q^\star$ by 22-33 pp: the per-cluster sufficient statistics admit a one-parameter family of equally good fits, and the prior on the bias channel (not on latent quality) is what breaks the degeneracy.

SYOct 6, 2023
Searching for Optimal Runtime Assurance via Reachability and Reinforcement Learning

Kristina Miller, Christopher K. Zeitler, William Shen et al.

A runtime assurance system (RTA) for a given plant enables the exercise of an untrusted or experimental controller while assuring safety with a backup (or safety) controller. The relevant computational design problem is to create a logic that assures safety by switching to the safety controller as needed, while maximizing some performance criteria, such as the utilization of the untrusted controller. Existing RTA design strategies are well-known to be overly conservative and, in principle, can lead to safety violations. In this paper, we formulate the optimal RTA design problem and present a new approach for solving it. Our approach relies on reward shaping and reinforcement learning. It can guarantee safety and leverage machine learning technologies for scalability. We have implemented this algorithm and present experimental results comparing our approach with state-of-the-art reachability and simulation-based RTA approaches in a number of scenarios using aircraft models in 3D space with complex safety requirements. Our approach can guarantee safety while increasing utilization of the experimental controller over existing approaches.

73.5AIMay 5
From Intent to Execution: Composing Agentic Workflows with Agent Recommendation

Kishan Athrey, Ramin Pishehvar, Brian Riordan et al.

Multi-Agent Systems (MAS) built using AI agents fulfill a variety of user intents that may be used to design and build a family of related applications. However, the creation of such MAS currently involves manual composition of the plan, manual selection of appropriate agents, and manual creation of execution graphs. This paper introduces a framework for the automated creation of multi-agent systems which replaces multiple manual steps with an automated framework. The proposed framework consists of software modules and a workflow to orchestrate the requisite task- specific application. The modules include: an LLM-derived planner, a set of tasks described in natural language, a dynamic call graph, an orchestrator for map agents to tasks, and an agent recommender that finds the most suitable agent(s) from local and global agent registries. The agent recommender uses a two-stage information retrieval (IR) system comprising a fast retriever and an LLM-based re-ranker. We implemented a series of experiments exploring the choice of embedders, re- rankers, agent description enrichment, and supervising critique agent. We benchmarked this system end-to-end, evaluating the combination of planning, agent selection, and task completion, with our proposed approach. Our experimental results show that our approach outperforms the state-of-the- art in terms of the recall rate and is more robust and scalable compared to previous approaches. The critique agent holistically reevaluates both agent and tool recommendations against the overall plan. We show that the inclusion of the critique agent further enhances the recall score, proving that the comprehensive review and revision of task-based agent selection is an essential step in building end-to-end multi-agent systems.

NIOct 11, 2024
DAWN: Designing Distributed Agents in a Worldwide Network

Zahra Aminiranjbar, Jianan Tang, Qiudan Wang et al.

The rapid evolution of Large Language Models (LLMs) has transformed them from basic conversational tools into sophisticated entities capable of complex reasoning and decision-making. These advancements have led to the development of specialized LLM-based agents designed for diverse tasks such as coding and web browsing. As these agents become more capable, the need for a robust framework that facilitates global communication and collaboration among them towards advanced objectives has become increasingly critical. Distributed Agents in a Worldwide Network (DAWN) addresses this need by offering a versatile framework that integrates LLM-based agents with traditional software systems, enabling the creation of agentic applications suited for a wide range of use cases. DAWN enables distributed agents worldwide to register and be easily discovered through Gateway Agents. Collaborations among these agents are coordinated by a Principal Agent equipped with reasoning strategies. DAWN offers three operational modes: No-LLM Mode for deterministic tasks, Copilot for augmented decision-making, and LLM Agent for autonomous operations. Additionally, DAWN ensures the safety and security of agent collaborations globally through a dedicated safety, security, and compliance layer, protecting the network against attackers and adhering to stringent security and compliance standards. These features make DAWN a robust network for deploying agent-based applications across various industries.

LOJan 17, 2022
A Tree Clock Data Structure for Causal Orderings in Concurrent Executions

Umang Mathur, Andreas Pavlogiannis, Hünkar Can Tunç et al.

Dynamic techniques are a scalable and effective way to analyze concurrent programs. Instead of analyzing all behaviors of a program, these techniques detect errors by focusing on a single program execution. Often a crucial step in these techniques is to define a causal ordering between events in the execution, which is then computed using vector clocks, a simple data structure that stores logical times of threads. The two basic operations of vector clocks, namely join and copy, require $Θ(k)$ time, where $k$ is the number of threads. Thus they are a computational bottleneck when $k$ is large. In this work, we introduce tree clocks, a new data structure that replaces vector clocks for computing causal orderings in program executions. Joining and copying tree clocks takes time that is roughly proportional to the number of entries being modified, and hence the two operations do not suffer the a-priori $Θ(k)$ cost per application. We show that when used to compute the classic happens-before (HB) partial order, tree clocks are optimal, in the sense that no other data structure can lead to smaller asymptotic running time. Moreover, we demonstrate that tree clocks can be used to compute other partial orders, such as schedulable-happens-before (SHB) and the standard Mazurkiewicz (MAZ) partial order, and thus are a versatile data structure. Our experiments show that just by replacing vector clocks with tree clocks, the computation becomes from $2.02 \times$ faster (MAZ) to $2.66 \times$ (SHB) and $2.97 \times$ (HB) on average per benchmark. These results illustrate that tree clocks have the potential to become a standard data structure with wide applications in concurrent analyses.

CYJun 7, 2021
Proof Blocks: Autogradable Scaffolding Activities for Learning to Write Proofs

Seth Poulsen, Mahesh Viswanathan, Geoffrey L. Herman et al.

Proof Blocks is a software tool which enables students to write proofs by dragging and dropping prewritten proof lines into the correct order. These proofs can be graded completely automatically, enabling students to receive rapid feedback on how they are doing with their proofs. When constructing a problem, the instructor specifies the dependency graph of the lines of the proof, so that any correct arrangement of the lines can receive full credit. This innovation can improve assessment tools by increasing the types of questions we can ask students about proofs, and can give greater access to proof knowledge by increasing the amount that students can learn on their own with the help of a computer.

CRApr 29, 2021
On Linear Time Decidability of Differential Privacy for Programs with Unbounded Inputs

Rohit Chadha, A. Prasad Sistla, Mahesh Viswanathan

We introduce an automata model for describing interesting classes of differential privacy mechanisms/algorithms that include known mechanisms from the literature. These automata can model algorithms whose inputs can be an unbounded sequence of real-valued query answers. We consider the problem of checking whether there exists a constant $d$ such that the algorithm described by these automata are $dε$-differentially private for all positive values of the privacy budget parameter $ε$. We show that this problem can be decided in time linear in the automaton's size by identifying a necessary and sufficient condition on the underlying graph of the automaton. This paper's results are the first decidability results known for algorithms with an unbounded number of query answers taking values from the set of reals.

CRNov 12, 2020
Deciding Accuracy of Differential Privacy Schemes

Gilles Barthe, Rohit Chadha, Paul Krogmeier et al.

Differential privacy is a mathematical framework for developing statistical computations with provable guarantees of privacy and accuracy. In contrast to the privacy component of differential privacy, which has a clear mathematical and intuitive meaning, the accuracy component of differential privacy does not have a generally accepted definition; accuracy claims of differential privacy algorithms vary from algorithm to algorithm and are not instantiations of a general definition. We identify program discontinuity as a common theme in existing \emph{ad hoc} definitions and introduce an alternative notion of accuracy parametrized by, what we call, {\distance} -- the {\distance} of an input $x$ w.r.t., a deterministic computation $f$ and a distance $d$, is the minimal distance $d(x,y)$ over all $y$ such that $f(y)\neq f(x)$. We show that our notion of accuracy subsumes the definition used in theoretical computer science, and captures known accuracy claims for differential privacy algorithms. In fact, our general notion of accuracy helps us prove better claims in some cases. Next, we study the decidability of accuracy. We first show that accuracy is in general undecidable. Then, we define a non-trivial class of probabilistic computations for which accuracy is decidable (unconditionally, or assuming Schanuel's conjecture). We implement our decision procedure and experimentally evaluate the effectiveness of our approach for generating proofs or counterexamples of accuracy for common algorithms from the literature.

LGApr 1, 2020
Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

Yu Wang, Nima Roohi, Matthew West et al.

Probabilistic Computation Tree Logic (PCTL) is frequently used to formally specify control objectives such as probabilistic reachability and safety. In this work, we focus on model checking PCTL specifications statistically on Markov Decision Processes (MDPs) by sampling, e.g., checking whether there exists a feasible policy such that the probability of reaching certain goal states is greater than a threshold. We use reinforcement learning to search for such a feasible policy for PCTL specifications, and then develop a statistical model checking (SMC) method with provable guarantees on its error. Specifically, we first use upper-confidence-bound (UCB) based Q-learning to design an SMC algorithm for bounded-time PCTL specifications, and then extend this algorithm to unbounded-time specifications by identifying a proper truncation time by checking the PCTL specification and its negation at the same time. Finally, we evaluate the proposed method on case studies.

PLJan 14, 2020
Atomicity Checking in Linear Time using Vector Clocks

Umang Mathur, Mahesh Viswanathan

Multi-threaded programs are challenging to write. Developers often need to reason about a prohibitively large number of thread interleavings to reason about the behavior of software. A non-interference property like atomicity can reduce this interleaving space by ensuring that any execution is equivalent to an execution where all atomic blocks are executed serially. We consider the well studied notion of conflict serializability for dynamically checking atomicity. Existing algorithms detect violations of conflict serializability by detecting cycles in a graph of transactions observed in a given execution. The number of edges in such a graph can grow quadratically with the length of the trace making the analysis not scalable. In this paper, we present AeroDrome, a novel single pass linear time algorithm that uses vector clocks to detect violations of conflict serializability in an online setting. Experiments show that AeroDrome scales to traces with a large number of events with significant speedup.

CROct 9, 2019
Deciding Differential Privacy for Programs with Finite Inputs and Outputs

Gilles Barthe, Rohit Chadha, Vishal Jagannath et al.

Differential privacy is a de facto standard for statistical computations over databases that contain private data. The strength of differential privacy lies in a rigorous mathematical definition that guarantees individual privacy and yet allows for accurate statistical results. Thanks to its mathematical definition, differential privacy is also a natural target for formal analysis. A broad line of work uses logical methods for proving privacy. However, these methods are not complete, and only partially automated. A recent and complementary line of work uses statistical methods for finding privacy violations. However, the methods only provide statistical guarantees (but no proofs). We propose the first decision procedure for checking the differential privacy of a non-trivial class of probabilistic computations. Our procedure takes as input a program P parametrized by a privacy budget $ε$, and either proves differential privacy for all possible values of $ε$ or generates a counterexample. In addition, our procedure applies both to $ε$-differential privacy and $(ε,δ)$-differential privacy. Technically, the decision procedure is based on a novel and judicious encoding of the semantics of programs in our class into a decidable fragment of the first-order theory of the reals with exponentiation. We implement our procedure and use it for (dis)proving privacy bounds for many well-known examples, including randomized response, histogram, report noisy max and sparse vector.

PLJun 29, 2019
Deciding Memory Safety for Single-Pass Heap-Manipulating Programs

Umang Mathur, Adithya Murali, Paul Krogmeier et al.

We investigate the decidability of automatic program verification for programs that manipulate heaps, and in particular, decision procedures for proving memory safety for them. We extend recent work that identified a decidable subclass of uninterpreted programs to a class of alias-aware programs that can update maps. We apply this theory to develop verification algorithms for memory safety--- determining if a heap-manipulating program that allocates and frees memory locations and manipulates heap pointers does not dereference an unallocated memory location. We show that this problem is decidable when the initial allocated heap forms a forest data-structure and when programs are streaming-coherent, which intuitively restricts programs to make a single pass over a data-structure. Our experimental evaluation on a set of library routines that manipulate forest data-structures shows that common single-pass algorithms on data-structures often fall in the decidable class, and that our decision procedure is efficient in verifying them.

PLAug 1, 2018
What Happens - After the First Race? Enhancing the Predictive Power of Happens - Before Based Dynamic Race Detection

Umang Mathur, Dileep Kini, Mahesh Viswanathan

Dynamic race detection is the problem of determining if an observed program execution reveals the presence of a data race in a program. The classical approach to solving this problem is to detect if there is a pair of conflicting memory accesses that are unordered by Lamport's happens-before (HB) relation. HB based race detection is known to not report false positives, i.e., it is sound. However, the soundness guarantee of HB only promises that the first pair of unordered, conflicting events is a schedulable data race. That is, there can be pairs of HB-unordered conflicting data accesses that are not schedulable races because there is no reordering of the events of the execution, where the events in race can be executed immediately after each other. We introduce a new partial order, called schedulable happens-before (SHB) that exactly characterizes the pairs of schedulable data races --- every pair of conflicting data accesses that are identified by SHB can be scheduled, and every HB-race that can be scheduled is identified by SHB. Thus, the SHB partial order is truly sound. We present a linear time, vector clock algorithm to detect schedulable races using SHB. Our experiments demonstrate the value of our algorithm for dynamic race detection --- SHB incurs only little performance overhead and can scale to executions from real-world software applications without compromising soundness.

PLJul 23, 2018
Data Race Detection on Compressed Traces

Dileep Kini, Umang Mathur, Mahesh Viswanathan

We consider the problem of detecting data races in program traces that have been compressed using straight line programs (SLP), which are special context-free grammars that generate exactly one string, namely the trace that they represent. We consider two classical approaches to race detection --- using the happens-before relation and the lockset discipline. We present algorithms for both these methods that run in time that is linear in the size of the compressed, SLP representation. Typical program executions almost always exhibit patterns that lead to significant compression. Thus, our algorithms are expected to result in large speedups when compared with analyzing the uncompressed trace. Our experimental evaluation of these new algorithms on standard benchmarks confirms this observation.

PLApr 8, 2017
Dynamic Race Prediction in Linear Time

Dileep Kini, Umang Mathur, Mahesh Viswanathan

Writing reliable concurrent software remains a huge challenge for today's programmers. Programmers rarely reason about their code by explicitly considering different possible inter-leavings of its execution. We consider the problem of detecting data races from individual executions in a sound manner. The classical approach to solving this problem has been to use Lamport's happens-before (HB) relation. Until now HB remains the only approach that runs in linear time. Previous efforts in improving over HB such as causally-precedes (CP) and maximal causal models fall short due to the fact that they are not implementable efficiently and hence have to compromise on their race detecting ability by limiting their techniques to bounded sized fragments of the execution. We present a new relation weak-causally-precedes (WCP) that is provably better than CP in terms of being able to detect more races, while still remaining sound. Moreover it admits a linear time algorithm which works on the entire execution without having to fragment it.