LGOct 11, 2022
Learning Control Policies for Stochastic Systems with Reach-avoid GuaranteesĐorđe Žikelić, Mathias Lechner, Thomas A. Henzinger et al.
We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold $p\in[0,1]$ over the infinite time horizon. Our method leverages advances in machine learning literature and it represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems -- it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on $3$ stochastic non-linear reinforcement learning tasks.
SYFeb 23, 2015
Temporal Logic Control for Stochastic Linear Systems using Abstraction Refinement of Probabilistic GamesMaria Svorenova, Jan Kretinsky, Martin Chmelik et al.
We consider the problem of computing the set of initial states of a dynamical system such that there exists a control strategy to ensure that the trajectories satisfy a temporal logic specification with probability 1 (almost-surely). We focus on discrete-time, stochastic linear dynamics and specifications given as formulas of the Generalized Reactivity(1) fragment of Linear Temporal Logic over linear predicates in the states of the system. We propose a solution based on iterative abstraction-refinement, and turn-based 2-player probabilistic games. While the theoretical guarantee of our algorithm after any finite number of iterations is only a partial solution, we show that if our algorithm terminates, then the result is the set of satisfying initial states. Moreover, for any (partial) solution our algorithm synthesizes witness control strategies to ensure almost-sure satisfaction of the temporal logic specification. We demonstrate our approach on an illustrative case study.
LGOct 11, 2022
Learning Provably Stabilizing Neural Controllers for Discrete-Time Stochastic SystemsMatin Ansaripour, Krishnendu Chatterjee, Thomas A. Henzinger et al.
We consider the problem of learning control policies in discrete-time stochastic systems which guarantee that the system stabilizes within some specified stabilization region with probability~$1$. Our approach is based on the novel notion of stabilizing ranking supermartingales (sRSMs) that we introduce in this work. Our sRSMs overcome the limitation of methods proposed in previous works whose applicability is restricted to systems in which the stabilizing region cannot be left once entered under any control policy. We present a learning procedure that learns a control policy together with an sRSM that formally certifies probability~$1$ stability, both learned as neural networks. We show that this procedure can also be adapted to formally verifying that, under a given Lipschitz continuous control policy, the stochastic system stabilizes within some stabilizing region with probability~$1$. Our experimental evaluation shows that our learning procedure can successfully learn provably stabilizing policies in practice.
LGNov 29, 2022
Quantization-aware Interval Bound Propagation for Training Certifiably Robust Quantized Neural NetworksMathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee et al.
We study the problem of training and certifying adversarially robust quantized neural networks (QNNs). Quantization is a technique for making neural networks more efficient by running them using low-bit integer arithmetic and is therefore commonly adopted in industry. Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization, and certification of the quantized representation is necessary to guarantee robustness. In this work, we present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs. Inspired by advances in robust learning of non-quantized networks, our training algorithm computes the gradient of an abstract representation of the actual network. Unlike existing approaches, our method can handle the discrete semantics of QNNs. Based on QA-IBP, we also develop a complete verification procedure for verifying the adversarial robustness of QNNs, which is guaranteed to terminate and produce a correct answer. Compared to existing approaches, the key advantage of our verification procedure is that it runs entirely on GPU or other accelerator devices. We demonstrate experimentally that our approach significantly outperforms existing methods and establish the new state-of-the-art for training and certifying the robustness of QNNs.
LGMay 24, 2022
Learning Stabilizing Policies in Stochastic Control SystemsĐorđe Žikelić, Mathias Lechner, Krishnendu Chatterjee et al.
In this work, we address the problem of learning provably stable neural network policies for stochastic control systems. While recent work has demonstrated the feasibility of certifying given policies using martingale theory, the problem of how to learn such policies is little explored. Here, we study the effectiveness of jointly learning a policy together with a martingale certificate that proves its stability using a single learning algorithm. We observe that the joint optimization problem becomes easily stuck in local minima when starting from a randomly initialized policy. Our results suggest that some form of pre-training of the policy is required for the joint optimization to repair and verify the policy successfully.
FLAug 13, 2012
Finite Automata with Time-Delay Blocks (Extended Version)Krishnendu Chatterjee, Thomas A. Henzinger, Vinayak S. Prabhu
The notion of delays arises naturally in many computational models, such as, in the design of circuits, control systems, and dataflow languages. In this work, we introduce \emph{automata with delay blocks} (ADBs), extending finite state automata with variable time delay blocks, for deferring individual transition output symbols, in a discrete-time setting. We show that the ADB languages strictly subsume the regular languages, and are incomparable in expressive power to the context-free languages. We show that ADBs are closed under union, concatenation and Kleene star, and under intersection with regular languages, but not closed under complementation and intersection with other ADB languages. We show that the emptiness and the membership problems are decidable in polynomial time for ADBs, whereas the universality problem is undecidable. Finally we consider the linear-time model checking problem, i.e., whether the language of an ADB is contained in a regular language, and show that the model checking problem is PSPACE-complete.
DSSep 14, 2014
A Framework for Automated Competitive Analysis of On-line Scheduling of Firm-Deadline TasksKrishnendu Chatterjee, Andreas Pavlogiannis, Alexander Kößler et al.
We present a flexible framework for the automated competitive analysis of on-line scheduling algorithms for firm-deadline real-time tasks based on multi-objective graphs: Given a taskset and an on-line scheduling algorithm specified as a labeled transition system, along with some optional safety, liveness, and/or limit-average constraints for the adversary, we automatically compute the competitive ratio of the algorithm w.r.t. a clairvoyant scheduler. We demonstrate the flexibility and power of our approach by comparing the competitive ratio of several on-line algorithms, including $D^{over}$, that have been proposed in the past, for various tasksets. Our experimental results reveal that none of these algorithms is universally optimal, in the sense that there are tasksets where other schedulers provide better performance. Our framework is hence a very useful design tool for selecting optimal algorithms for a given application.
GTMay 31, 2012
Efficient Controller Synthesis for Consumption Games with Multiple Resource TypesTomáš Brázdil, Krishnendu Chatterjee, Antonín Kučera et al.
We introduce consumption games, a model for discrete interactive system with multiple resources that are consumed or reloaded independently. More precisely, a consumption game is a finite-state graph where each transition is labeled by a vector of resource updates, where every update is a non-positive number or omega. The omega updates model the reloading of a given resource. Each vertex belongs either to player \Box or player \Diamond, where the aim of player \Box is to play so that the resources are never exhausted. We consider several natural algorithmic problems about consumption games, and show that although these problems are computationally hard in general, they are solvable in polynomial time for every fixed number of resource types (i.e., the dimension of the update vectors).
SYMar 18, 2015
Quantitative Timed Simulation Functions and Refinement Metrics for Timed Systems (Full Version)Krishnendu Chatterjee, Vinayak S. Prabhu
We introduce quantatitive timed refinement and timed simulation (directed) metrics, incorporating zenoness check s, for timed systems. These metrics assign positive real numbers between zero and infinity which quantify the \emph{timing mismatches} between two timed systems, amongst non-zeno runs. We quantify timing mismatches in three ways: (1) the maximal timing mismatch that can arise, (2) the "steady-state" maximal timing mismatches, where initial transient timing mismatches are ignored; and (3) the (long-run) average timing mismatches amongst two systems. These three kinds of mismatches constitute three important types of timing differences. Our event times are the \emph{global times}, measured from the start of the system execution, not just the time durations of individual steps. We present algorithms over timed automata for computing the three quantitative simulation distances to within any desired degree of accuracy. In order to compute the values of the quantitative simulation distances, we use a game theoretic formulation. We introduce two new kinds of objectives for two player games on finite-state game graphs: (1) eventual debit-sum level objectives, and (2) average debit-sum level objectives. We present algorithms for computing the optimal values for these objectives in graph games, and then use these algorithms to compute the values of the timed simulation distances over timed automata.
3.2LGMay 23
Reinforcement Learning for Reachability: Guaranteeing Asymptotic OptimalityAmogh Palasamudram, Jakub Svoboda, Suguman Bansal et al.
Reinforcement learning (RL) for reachability specifications is fundamental in sequential decision-making, yet theoretical guarantees remain less explored. A recent work achieves asymptotic convergence to optimal policies. However, this approach provides limited insight into convergence dynamics. In this work, we present an alternative approach that provides deeper theoretical insights into convergence. Our approach builds on PAC learning with assumptions. PAC learning guarantees near-optimal policies with high confidence in finite time but requires knowing internal MDP parameters like minimum transition probability. We argue that while these parameters are unknown in RL, they can be iteratively refined and estimated with increasing accuracy. By iteratively satisfying PAC conditions, we show that exact optimality can be achieved in the limit. Empirical evaluations on standard benchmarks validate our theoretical insights into convergence dynamics.
28.6PLMar 27
SuperDP: Differential Privacy Refutation via SupermartingalesKrishnendu Chatterjee, Ehsan Kafshdar Goharshady, ÄorÄe ŽikeliÄ
Differential privacy (DP) has established itself as one of the standards for ensuring privacy of individual data. However, reasoning about DP is a challenging and error-prone task, hence methods for formal verification and refutation of DP properties have received significant interest in recent years. In this work, we present a novel method for automated formal refutation of $ε$-DP. Our method refutes $ε$-DP by searching for a pair of inputs together with a non-negative function over outputs whose expected value on these two inputs differs by a significant amount. The two inputs and the non-negative function over outputs are computed simultaneously, by utilizing upper expectation supermartingales and lower expectation submartingales from probabilistic program analysis, which we leverage to introduce a sound and complete proof rule for $ε$-DP refutation. To the best of our knowledge, our method is the first method for $ε$-DP refutation to offer the following four desirable features: (1)~it is fully automated, (2)~it is applicable to stochastic mechanisms with sampling instructions from both discrete and continuous distributions, (3)~it provides soundness guarantees, and (4)~it provides semi-completeness guarantees. Our experiments show that our prototype tool SuperDP achieves superior performance compared to the state of the art and manages to refute $ε$-DP for a number of challenging examples collected from the literature, including ones that were out of the reach of prior methods.
AIJan 30
Strongly Polynomial Time Complexity of Policy Iteration for $L_\infty$ Robust MDPsAli Asadi, Krishnendu Chatterjee, Ehsan Goharshady et al.
Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in transition probabilities and optimizing against the worst-case realization of that uncertainty. In particular, $(s, a)$-rectangular RMDPs with $L_\infty$ uncertainty sets form a fundamental and expressive model: they subsume classical MDPs and turn-based stochastic games. We consider this model with discounted payoffs. The existence of polynomial and strongly-polynomial time algorithms is a fundamental problem for these optimization models. For MDPs, linear programming yields polynomial-time algorithms for any arbitrary discount factor, and the seminal work of Ye established strongly--polynomial time for a fixed discount factor. The generalization of such results to RMDPs has remained an important open problem. In this work, we show that a robust policy iteration algorithm runs in strongly-polynomial time for $(s, a)$-rectangular $L_\infty$ RMDPs with a constant (fixed) discount factor, resolving an important algorithmic question.
16.2AIMay 11
Automated Approach for Solving Infinite-state Polynomial Reachability GamesKrishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi et al.
Reachability games are two-player games played on a graph, where the objective of $\texttt{REACH}$ player is to reach the target set whereas the objective of $\texttt{SAFE}$ player is to stay away from the target set. Reachability games have important applications in artificial intelligence and reactive synthesis, and many of these applications give rise to infinite-state reachability games. In this paper, we study turn-based reachability games on infinite-state graphs defined over valuations of a finite set of real variables. We consider the problem of determining the existence of and computing a winning strategy for $\texttt{REACH}$ player. Our contributions are twofold. First, we propose ranking certificates for reachability games, a sound and complete proof rule for proving that $\texttt{REACH}$ player has a winning strategy from the specified initial state. Second, we consider polynomial reachability games, where transitions and objectives are described by polynomial constraints over real variables, and propose a fully automated algorithm for computing a winning strategy for $\texttt{REACH}$ player together with a formal correctness witness in the form of a ranking certificate. The algorithm is sound, semi-complete, and runs in sub-exponential time. Our experiments demonstrate the ability of our method to solve challenging examples from the literature that were out of the reach of existing methods. Specifically, for the classical Cinderella-Stepmother game, we are able to compute an optimal winning strategy for an arbitrary precision parameter for the first time.
67.4CCMay 8
On the Complexity of Discounted Robust MDPs with $L_p$ Uncertainty SetsAli Asadi, Krishnendu Chatterjee, Alipasha Montaseri et al.
A basic model in sequential decision making is the Markov decision process (MDP), which is extended to Robust MDPs (RMDPs) by allowing uncertainty in transition probabilities and optimizing against the worst-case transition probabilities from the uncertainty sets. The class of $(s, a)$-rectangular RMDPs with $L_p$ uncertainty sets provides a flexible and expressive model for such problems. We study this class of RMDPs with a discounted-sum cost criterion and a constant discount factor. The existence of an efficient algorithm for this class is a fundamental theoretical question in optimization and sequential decision making. Previous results only establish a strongly polynomial-time algorithm for $L_\infty$ uncertainty sets. In this work, our main results are as follows: (a)~we show that for any compact uncertainty set, the policy iteration algorithm for RMDPs is strongly polynomial with oracle access to solutions of Robust Markov chains (RMCs); (b)~we present strongly polynomial-time bounds on the policy iteration algorithm for RMCs with $L_1$ and $L_\infty$ uncertainty sets; and (c)~we establish hardness results for RMCs with $L_p$ uncertainty sets for integer $p$ satisfying $1<p<\infty$. Finally, motivated by our theoretical bounds, we present experimental results showing how fast policy iteration converges for RMDPs with $L_1$ and $L_\infty$ uncertainty sets.
13.5AIMay 8
Multi-Environment POMDPs with Finite-Horizon ObjectivesLéonard Brice, Filip Cano, Krishnendu Chatterjee et al.
Partially Observable Markov Decision Processes (POMDPs) are systems in which one agent interacts with a stochastic environment, and receives only partial information about the current state. In a multi-environment POMDP (MEPOMDP), the initial state is unknown, and assumed to be adversarially chosen. In this work we focus on computing the optimal value and policy in MEPOMDPs with finite-horizon objectives. That problem is known to be PSPACE-complete in POMDPs. Our main results are as follows: (1) we establish that it is also PSPACE-complete in the more general setting of MEPOMDPs; (2) we present a practical algorithm and evaluate it on classical benchmarks, significantly outperforming the only previously known algorithm.
AIDec 21, 2023
Solving Long-run Average Reward Robust MDPs via Stochastic GamesKrishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi et al.
Markov decision processes (MDPs) provide a standard framework for sequential decision making under uncertainty. However, MDPs do not take uncertainty in transition probabilities into account. Robust Markov decision processes (RMDPs) address this shortcoming of MDPs by assigning to each transition an uncertainty set rather than a single probability value. In this work, we consider polytopic RMDPs in which all uncertainty sets are polytopes and study the problem of solving long-run average reward polytopic RMDPs. We present a novel perspective on this problem and show that it can be reduced to solving long-run average reward turn-based stochastic games with finite state and action spaces. This reduction allows us to derive several important consequences that were hitherto not known to hold for polytopic RMDPs. First, we derive new computational complexity bounds for solving long-run average reward polytopic RMDPs, showing for the first time that the threshold decision problem for them is in $NP \cap coNP$ and that they admit a randomized algorithm with sub-exponential expected runtime. Second, we present Robust Polytopic Policy Iteration (RPPI), a novel policy iteration algorithm for solving long-run average reward polytopic RMDPs. Our experimental evaluation shows that RPPI is much more efficient in solving long-run average reward polytopic RMDPs compared to state-of-the-art methods based on value iteration.
AIMay 7, 2024
Certified Policy Verification and Synthesis for MDPs under Distributional Reach-avoidance PropertiesS. Akshay, Krishnendu Chatterjee, Tobias Meggendorfer et al.
Markov Decision Processes (MDPs) are a classical model for decision making in the presence of uncertainty. Often they are viewed as state transformers with planning objectives defined with respect to paths over MDP states. An increasingly popular alternative is to view them as distribution transformers, giving rise to a sequence of probability distributions over MDP states. For instance, reachability and safety properties in modeling robot swarms or chemical reaction networks are naturally defined in terms of probability distributions over states. Verifying such distributional properties is known to be hard and often beyond the reach of classical state-based verification techniques. In this work, we consider the problems of certified policy (i.e. controller) verification and synthesis in MDPs under distributional reach-avoidance specifications. By certified we mean that, along with a policy, we also aim to synthesize a (checkable) certificate ensuring that the MDP indeed satisfies the property. Thus, given the target set of distributions and an unsafe set of distributions over MDP states, our goal is to either synthesize a certificate for a given policy or synthesize a policy along with a certificate, proving that the target distribution can be reached while avoiding unsafe distributions. To solve this problem, we introduce the novel notion of distributional reach-avoid certificates and present automated procedures for (1) synthesizing a certificate for a given policy, and (2) synthesizing a policy together with the certificate, both providing formal guarantees on certificate correctness. Our experimental evaluation demonstrates the ability of our method to solve several non-trivial examples, including a multi-agent robot-swarm model, to synthesize certified policies and to certify existing policies.
LODec 18, 2024
Quantified Linear and Polynomial Arithmetic Satisfiability via Template-based SkolemizationKrishnendu Chatterjee, Ehsan Kafshdar Goharshady, Mehrdad Karrabi et al. · eth-zurich
The problem of checking satisfiability of linear real arithmetic (LRA) and non-linear real arithmetic (NRA) formulas has broad applications, in particular, they are at the heart of logic-related applications such as logic for artificial intelligence, program analysis, etc. While there has been much work on checking satisfiability of unquantified LRA and NRA formulas, the problem of checking satisfiability of quantified LRA and NRA formulas remains a significant challenge. The main bottleneck in the existing methods is a computationally expensive quantifier elimination step. In this work, we propose a novel method for efficient quantifier elimination in quantified LRA and NRA formulas. We propose a template-based Skolemization approach, where we automatically synthesize linear/polynomial Skolem functions in order to eliminate quantifiers in the formula. The key technical ingredients in our approach are Positivstellensätze theorems from algebraic geometry, which allow for an efficient manipulation of polynomial inequalities. Our method offers a range of appealing theoretical properties combined with a strong practical performance. On the theory side, our method is sound, semi-complete, and runs in subexponential time and polynomial space, as opposed to existing sound and complete quantifier elimination methods that run in doubly-exponential time and at least exponential space. On the practical side, our experiments show superior performance compared to state-of-the-art SMT solvers in terms of the number of solved instances and runtime, both on LRA and on NRA benchmarks.
AIJun 13, 2025
Lower Bound on Howard Policy Iteration for Deterministic Markov Decision ProcessesAli Asadi, Krishnendu Chatterjee, Jakob de Raaij
Deterministic Markov Decision Processes (DMDPs) are a mathematical framework for decision-making where the outcomes and future possible actions are deterministically determined by the current action taken. DMDPs can be viewed as a finite directed weighted graph, where in each step, the controller chooses an outgoing edge. An objective is a measurable function on runs (or infinite trajectories) of the DMDP, and the value for an objective is the maximal cumulative reward (or weight) that the controller can guarantee. We consider the classical mean-payoff (aka limit-average) objective, which is a basic and fundamental objective. Howard's policy iteration algorithm is a popular method for solving DMDPs with mean-payoff objectives. Although Howard's algorithm performs well in practice, as experimental studies suggested, the best known upper bound is exponential and the current known lower bound is as follows: For the input size $I$, the algorithm requires $\tildeΩ(\sqrt{I})$ iterations, where $\tildeΩ$ hides the poly-logarithmic factors, i.e., the current lower bound on iterations is sub-linear with respect to the input size. Our main result is an improved lower bound for this fundamental algorithm where we show that for the input size $I$, the algorithm requires $\tildeΩ(I)$ iterations.
AIMay 10, 2025
Value Iteration with Guessing for Markov Chains and Markov Decision ProcessesKrishnendu Chatterjee, Mahdi JafariRaviz, Raimundo Saona et al.
Two standard models for probabilistic systems are Markov chains (MCs) and Markov decision processes (MDPs). Classic objectives for such probabilistic models for control and planning problems are reachability and stochastic shortest path. The widely studied algorithmic approach for these problems is the Value Iteration (VI) algorithm which iteratively applies local updates called Bellman updates. There are many practical approaches for VI in the literature but they all require exponentially many Bellman updates for MCs in the worst case. A preprocessing step is an algorithm that is discrete, graph-theoretical, and requires linear space. An important open question is whether, after a polynomial-time preprocessing, VI can be achieved with sub-exponentially many Bellman updates. In this work, we present a new approach for VI based on guessing values. Our theoretical contributions are twofold. First, for MCs, we present an almost-linear-time preprocessing algorithm after which, along with guessing values, VI requires only subexponentially many Bellman updates. Second, we present an improved analysis of the speed of convergence of VI for MDPs. Finally, we present a practical algorithm for MDPs based on our new approach. Experimental results show that our approach provides a considerable improvement over existing VI-based approaches on several benchmark examples from the literature.
AIMay 7, 2025
Qualitative Analysis of $ω$-Regular Objectives on Robust MDPsAli Asadi, Krishnendu Chatterjee, Ehsan Kafshdar Goharshady et al.
Robust Markov Decision Processes (RMDPs) generalize classical MDPs that consider uncertainties in transition probabilities by defining a set of possible transition functions. An objective is a set of runs (or infinite trajectories) of the RMDP, and the value for an objective is the maximal probability that the agent can guarantee against the adversarial environment. We consider (a) reachability objectives, where given a target set of states, the goal is to eventually arrive at one of them; and (b) parity objectives, which are a canonical representation for $ω$-regular objectives. The qualitative analysis problem asks whether the objective can be ensured with probability 1. In this work, we study the qualitative problem for reachability and parity objectives on RMDPs without making any assumption over the structures of the RMDPs, e.g., unichain or aperiodic. Our contributions are twofold. We first present efficient algorithms with oracle access to uncertainty sets that solve qualitative problems of reachability and parity objectives. We then report experimental results demonstrating the effectiveness of our oracle-based approach on classical RMDP examples from the literature scaling up to thousands of states.
CCDec 16, 2024
Linear Equations with Min and Max Operators: Computational ComplexityKrishnendu Chatterjee, Ruichen Luo, Raimundo Saona et al.
We consider a class of optimization problems defined by a system of linear equations with min and max operators. This class of optimization problems has been studied under restrictive conditions, such as, (C1) the halting or stability condition; (C2) the non-negative coefficients condition; (C3) the sum up to 1 condition; and (C4) the only min or only max oerator condition. Several seminal results in the literature focus on special cases. For example, turn-based stochastic games correspond to conditions C2 and C3; and Markov decision process to conditions C2, C3, and C4. However, the systematic computational complexity study of all the cases has not been explored, which we address in this work. Some highlights of our results are: with conditions C2 and C4, and with conditions C3 and C4, the problem is NP-complete, whereas with condition C1 only, the problem is in UP intersects coUP. Finally, we establish the computational complexity of the decision problem of checking the respective conditions.
SYMar 14, 2024
Learning Algorithms for Verification of Markov Decision ProcessesTomáš Brázdil, Krishnendu Chatterjee, Martin Chmelik et al.
We present a general framework for applying learning algorithms and heuristical guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Br{á}zdil et al., significantly extending it as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor assume any particular structural properties of the MDP.
LGDec 17, 2021
Stability Verification in Stochastic Control Systems via Neural Network SupermartingalesMathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee et al.
We consider the problem of formally verifying almost-sure (a.s.) asymptotic stability in discrete-time nonlinear stochastic control systems. While verifying stability in deterministic control systems is extensively studied in the literature, verifying stability in stochastic control systems is an open problem. The few existing works on this topic either consider only specialized forms of stochasticity or make restrictive assumptions on the system, rendering them inapplicable to learning algorithms with neural network policies. In this work, we present an approach for general nonlinear stochastic control problems with two novel aspects: (a) instead of classical stochastic extensions of Lyapunov functions, we use ranking supermartingales (RSMs) to certify a.s.~asymptotic stability, and (b) we present a method for learning neural network RSMs. We prove that our approach guarantees a.s.~asymptotic stability of the system and provides the first method to obtain bounds on the stabilization time, which stochastic Lyapunov functions do not. Finally, we validate our approach experimentally on a set of nonlinear stochastic reinforcement learning environments with neural network policies.
LGNov 4, 2021
Infinite Time Horizon Safety of Bayesian Neural NetworksMathias Lechner, Đorđe Žikelić, Krishnendu Chatterjee et al.
Bayesian neural networks (BNNs) place distributions over the weights of a neural network to model uncertainty in the data and the network's prediction. We consider the problem of verifying safety when running a Bayesian neural network policy in a feedback loop with infinite time horizon systems. Compared to the existing sampling-based approaches, which are inapplicable to the infinite time horizon setting, we train a separate deterministic neural network that serves as an infinite time horizon safety certificate. In particular, we show that the certificate network guarantees the safety of the system over a subset of the BNN weight posterior's support. Our method first computes a safe weight set and then alters the BNN's weight posterior to reject samples outside this set. Moreover, we show how to extend our approach to a safe-exploration reinforcement learning setting, in order to avoid unsafe trajectories during the training of the policy. We evaluate our approach on a series of reinforcement learning benchmarks, including non-Lyapunovian safety specifications.
FLJan 6, 2021
On Satisficing in Quantitative GamesSuguman Bansal, Krishnendu Chatterjee, Moshe Y. Vardi
Several problems in planning and reactive synthesis can be reduced to the analysis of two-player quantitative graph games. {\em Optimization} is one form of analysis. We argue that in many cases it may be better to replace the optimization problem with the {\em satisficing problem}, where instead of searching for optimal solutions, the goal is to search for solutions that adhere to a given threshold bound. This work defines and investigates the satisficing problem on a two-player graph game with the discounted-sum cost model. We show that while the satisficing problem can be solved using numerical methods just like the optimization problem, this approach does not render compelling benefits over optimization. When the discount factor is, however, an integer, we present another approach to satisficing, which is purely based on automata methods. We show that this approach is algorithmically more performant -- both theoretically and empirically -- and demonstrates the broader applicability of satisficing overoptimization.
PLJul 28, 2020
Inductive Reachability WitnessesAli Asadi, Krishnendu Chatterjee, Hongfei Fu et al.
In this work, we consider the fundamental problem of reachability analysis over imperative programs with real variables. The reachability property requires that a program can reach certain target states during its execution. Previous works that tackle reachability analysis are either unable to handle programs consisting of general loops (e.g. symbolic execution), or lack completeness guarantees (e.g. abstract interpretation), or are not automated (e.g. incorrectness logic/reverse Hoare logic). In contrast, we propose a novel approach for reachability analysis that can handle general programs, is (semi-)complete, and can be entirely automated for a wide family of programs. Our approach extends techniques from both invariant generation and ranking-function synthesis to reachability analysis through the notion of (Universal) Inductive Reachability Witnesses (IRWs/UIRWs). While traditional invariant generation uses over-approximations of reachable states, we consider the natural dual problem of under-approximating the set of program states that can reach a target state. We then apply an argument similar to ranking functions to ensure that all states in our under-approximation can indeed reach the target set in finitely many steps.
AIFeb 27, 2020
Reinforcement Learning of Risk-Constrained Policies in Markov Decision ProcessesTomas Brazdil, Krishnendu Chatterjee, Petr Novotny et al.
Markov decision processes (MDPs) are the defacto frame-work for sequential decision making in the presence ofstochastic uncertainty. A classical optimization criterion forMDPs is to maximize the expected discounted-sum pay-off, which ignores low probability catastrophic events withhighly negative impact on the system. On the other hand,risk-averse policies require the probability of undesirableevents to be below a given threshold, but they do not accountfor optimization of the expected payoff. We consider MDPswith discounted-sum payoff with failure states which repre-sent catastrophic outcomes. The objective of risk-constrainedplanning is to maximize the expected discounted-sum payoffamong risk-averse policies that ensure the probability to en-counter a failure state is below a desired threshold. Our maincontribution is an efficient risk-constrained planning algo-rithm that combines UCT-like search with a predictor learnedthrough interaction with the MDP (in the style of AlphaZero)and with a risk-constrained action selection via linear pro-gramming. We demonstrate the effectiveness of our approachwith experiments on classical MDPs from the literature, in-cluding benchmarks with an order of 10^6 states.
DSJan 29, 2020
Optimal and Perfectly Parallel Algorithms for On-demand Data-flow AnalysisKrishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen et al.
Interprocedural data-flow analyses form an expressive and useful paradigm of numerous static analysis applications, such as live variables analysis, alias analysis and null pointers analysis. The most widely-used framework for interprocedural data-flow analysis is IFDS, which encompasses distributive data-flow functions over a finite domain. On-demand data-flow analyses restrict the focus of the analysis on specific program locations and data facts. This setting provides a natural split between (i) an offline (or preprocessing) phase, where the program is partially analyzed and analysis summaries are created, and (ii) an online (or query) phase, where analysis queries arrive on demand and the summaries are used to speed up answering queries. In this work, we consider on-demand IFDS analyses where the queries concern program locations of the same procedure (aka same-context queries). We exploit the fact that flow graphs of programs have low treewidth to develop faster algorithms that are space and time optimal for many common data-flow analyses, in both the preprocessing and the query phase. We also use treewidth to develop query solutions that are embarrassingly parallelizable, i.e. the total work for answering each query is split to a number of threads such that each thread performs only a constant amount of work. Finally, we implement a static analyzer based on our algorithms, and perform a series of on-demand analysis experiments on standard benchmarks. Our experimental results show a drastic speed-up of the queries after only a lightweight preprocessing phase, which significantly outperforms existing techniques.
GTFeb 21, 2019
Probabilistic Smart Contracts: Secure Randomness on the BlockchainKrishnendu Chatterjee, Amir Kafshdar Goharshady, Arash Pourdamghani
In today's programmable blockchains, smart contracts are limited to being deterministic and non-probabilistic. This lack of randomness is a consequential limitation, given that a wide variety of real-world financial contracts, such as casino games and lotteries, depend entirely on randomness. As a result, several ad-hoc random number generation approaches have been developed to be used in smart contracts. These include ideas such as using an oracle or relying on the block hash. However, these approaches are manipulatable, i.e. their output can be tampered with by parties who might not be neutral, such as the owner of the oracle or the miners. We propose a novel game-theoretic approach for generating provably unmanipulatable pseudorandom numbers on the blockchain. Our approach allows smart contracts to access a trustworthy source of randomness that does not rely on potentially compromised miners or oracles, hence enabling the creation of a new generation of smart contracts that are not limited to being non-probabilistic and can be drawn from the much more general class of probabilistic programs.
CRJun 8, 2018
Ergodic Mean-Payoff Games for the Analysis of Attacks in Crypto-CurrenciesKrishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen et al.
Crypto-currencies are digital assets designed to work as a medium of exchange, e.g., Bitcoin, but they are susceptible to attacks (dishonest behavior of participants). A framework for the analysis of attacks in crypto-currencies requires (a) modeling of game-theoretic aspects to analyze incentives for deviation from honest behavior; (b) concurrent interactions between participants; and (c) analysis of long-term monetary gains. Traditional game-theoretic approaches for the analysis of security protocols consider either qualitative temporal properties such as safety and termination, or the very special class of one-shot (stateless) games. However, to analyze general attacks on protocols for crypto-currencies, both stateful analysis and quantitative objectives are necessary. In this work our main contributions are as follows: (a) we show how a class of concurrent mean-payoff games, namely ergodic games, can model various attacks that arise naturally in crypto-currencies; (b) we present the first practical implementation of algorithms for ergodic games that scales to model realistic problems for crypto-currencies; and (c) we present experimental results showing that our framework can handle games with thousands of states and millions of transitions.
CRMay 23, 2018
Secure Credit Reporting on the BlockchainAmir Kafshdar Goharshady, Ali Behrouz, Krishnendu Chatterjee
We present a secure approach for maintaining and reporting credit history records on the Blockchain. Our approach removes third-parties such as credit reporting agencies from the lending process and replaces them with smart contracts. This allows customers to interact directly with the lenders or banks while ensuring the integrity, unmalleability and privacy of their credit data. Most importantly, each customer is given full control over complete or selective disclosure of her credit records, eliminating the risk of privacy violations or data breaches such as the one that happened to Equifax in 2017. Moreover, our approach provides strong guarantees for the lenders as well. A lender can check both correctness and completeness of the credit data disclosed to her. This is the first approach that is able to perform all real-world credit reporting tasks without a central authority or changing the financial mechanisms.
AIApr 27, 2018
Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-sum ObjectivesKrishnendu Chatterjee, Adrián Elgyütt, Petr Novotný et al.
Partially-observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework to model a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low probability events with extreme payoff can significantly affect the expectation, and thus the obtained policies are not necessarily risk-averse. An alternate approach is to optimize the probability that the payoff is above a certain threshold, which allows obtaining risk-averse policies, but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation ensuring that the payoff is above a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.
PLApr 24, 2018
Computational Approaches for Stochastic Shortest Path on Succinct MDPsKrishnendu Chatterjee, Hongfei Fu, Amir Kafshdar Goharshady et al.
We consider the stochastic shortest path (SSP) problem for succinct Markov decision processes (MDPs), where the MDP consists of a set of variables, and a set of nondeterministic rules that update the variables. First, we show that several examples from the AI literature can be modeled as succinct MDPs. Then we present computational approaches for upper and lower bounds for the SSP problem: (a)~for computing upper bounds, our method is polynomial-time in the implicit description of the MDP; (b)~for lower bounds, we present a polynomial-time (in the size of the implicit description) reduction to quadratic programming. Our approach is applicable even to infinite-state MDPs. Finally, we present experimental results to demonstrate the effectiveness of our approach on several classical examples from the AI literature.
DSApr 19, 2018
Algorithms and Conditional Lower Bounds for Planning ProblemsKrishnendu Chatterjee, Wolfgang Dvořák, Monika Henzinger et al.
We consider planning problems for graphs, Markov decision processes (MDPs), and games on graphs. While graphs represent the most basic planning model, MDPs represent interaction with nature and games on graphs represent interaction with an adversarial environment. We consider two planning problems where there are k different target sets, and the problems are as follows: (a) the coverage problem asks whether there is a plan for each individual target set, and (b) the sequential target reachability problem asks whether the targets can be reached in sequence. For the coverage problem, we present a linear-time algorithm for graphs and quadratic conditional lower bound for MDPs and games on graphs. For the sequential target problem, we present a linear-time algorithm for graphs, a sub-quadratic algorithm for MDPs, and a quadratic conditional lower bound for games on graphs. Our results with conditional lower bounds establish (i) model-separation results showing that for the coverage problem MDPs and games on graphs are harder than graphs and for the sequential reachability problem games on graphs are harder than MDPs and graphs; (ii) objective-separation results showing that for MDPs the coverage problem is harder than the sequential target problem.
AIFeb 10, 2018
Graph Planning with Expected Finite HorizonKrishnendu Chatterjee, Laurent Doyen
Graph planning gives rise to fundamental algorithmic questions such as shortest path, traveling salesman problem, etc. A classical problem in discrete planning is to consider a weighted graph and construct a path that maximizes the sum of weights for a given time horizon $T$. However, in many scenarios, the time horizon is not fixed, but the stopping time is chosen according to some distribution such that the expected stopping time is $T$. If the stopping time distribution is not known, then to ensure robustness, the distribution is chosen by an adversary, to represent the worst-case scenario. A stationary plan for every vertex always chooses the same outgoing edge. For fixed horizon or fixed stopping-time distribution, stationary plans are not sufficient for optimality. Quite surprisingly we show that when an adversary chooses the stopping-time distribution with expected stopping time $T$, then stationary plans are sufficient. While computing optimal stationary plans for fixed horizon is NP-complete, we show that computing optimal stationary plans under adversarial stopping-time distribution can be achieved in polynomial time. Consequently, our polynomial-time algorithm for adversarial stopping time also computes an optimal plan among all possible plans.
AISep 29, 2017
Sensor Synthesis for POMDPs with Reachability ObjectivesKrishnendu Chatterjee, Martin Chmelik, Ufuk Topcu
Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize "weakest" additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability~1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.
NEJun 21, 2017
Faster Monte-Carlo Algorithms for Fixation Probability of the Moran Process on Undirected GraphsKrishnendu Chatterjee, Rasmus Ibsen-Jensen, Martin A. Nowak
Evolutionary graph theory studies the evolutionary dynamics in a population structure given as a connected graph. Each node of the graph represents an individual of the population, and edges determine how offspring are placed. We consider the classical birth-death Moran process where there are two types of individuals, namely, the residents with fitness 1 and mutants with fitness r. The fitness indicates the reproductive strength. The evolutionary dynamics happens as follows: in the initial step, in a population of all resident individuals a mutant is introduced, and then at each step, an individual is chosen proportional to the fitness of its type to reproduce, and the offspring replaces a neighbor uniformly at random. The process stops when all individuals are either residents or mutants. The probability that all individuals in the end are mutants is called the fixation probability. We present faster polynomial-time Monte-Carlo algorithms for finidng the fixation probability on undirected graphs. Our algorithms are always at least a factor O(n^2/log n) faster as compared to the previous algorithms, where n is the number of nodes, and is polynomial even if r is given in binary. We also present lower bounds showing that the upper bound on the expected number of effective steps we present is asymptotically tight for undirected graphs.
SYAug 31, 2017
Value Iteration for Long-run Average Reward in Markov Decision ProcessesPranav Ashok, Krishnendu Chatterjee, Przemyslaw Daca et al.
Markov decision processes (MDPs) are standard models for probabilistic systems with non-deterministic behaviours. Long-run average rewards provide a mathematically elegant formalism for expressing long term performance. Value iteration (VI) is one of the simplest and most efficient algorithmic approaches to MDPs with other properties, such as reachability objectives. Unfortunately, a naive extension of VI does not work for MDPs with long-run average rewards, as there is no known stopping criterion. In this work our contributions are threefold. (1) We refute a conjecture related to stopping criteria for MDPs with long-run average rewards. (2) We present two practical algorithms for MDPs with long-run average rewards based on VI. First, we show that a combination of applying VI locally for each maximal end-component (MEC) and VI for reachability objectives can provide approximation guarantees. Second, extending the above approach with a simulation-guided on-demand variant of VI, we present an anytime algorithm that is able to deal with very large models. (3) Finally, we present experimental results showing that our methods significantly outperform the standard approaches on several benchmarks.
AINov 26, 2016
Optimizing Expectation with Guarantees in POMDPs (Technical Report)Krishnendu Chatterjee, Petr Novotný, Guillermo A. Pérez et al.
A standard objective in partially-observable Markov decision processes (POMDPs) is to find a policy that maximizes the expected discounted-sum payoff. However, such policies may still permit unlikely but highly undesirable outcomes, which is problematic especially in safety-critical applications. Recently, there has been a surge of interest in POMDPs where the goal is to maximize the probability to ensure that the payoff is at least a given threshold, but these approaches do not consider any optimization beyond satisfying this threshold constraint. In this work we go beyond both the "expectation" and "threshold" approaches and consider a "guaranteed payoff optimization (GPO)" problem for POMDPs, where we are given a threshold $t$ and the objective is to find a policy $σ$ such that a) each possible outcome of $σ$ yields a discounted-sum payoff of at least $t$, and b) the expected discounted-sum payoff of $σ$ is optimal (or near-optimal) among all policies satisfying a). We present a practical approach to tackle the GPO problem and evaluate it on standard POMDP benchmarks.
AIFeb 24, 2016
Stochastic Shortest Path with Energy Constraints in POMDPsTomáš Brázdil, Krishnendu Chatterjee, Martin Chmelík et al.
We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps till the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, developing on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that automatically extracts important decisions of the policy allowing us to compute succinct human readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.
AINov 26, 2015
A Symbolic SAT-based Algorithm for Almost-sure Reachability with Small Strategies in POMDPsKrishnendu Chatterjee, Martin Chmelik, Jessica Davies
POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIME-complete, in many practical cases policies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit, which first requires to construct explicitly an exponential reduction to a belief-support MDP. In this work, we first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm by an efficient encoding to SAT and using a SAT solver for the problem. We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach.
AIJan 13, 2015
MultiGain: A controller synthesis tool for MDPs with multiple mean-payoff objectivesTomáš Brázdil, Krishnendu Chatterjee, Vojtěch Forejt et al.
We present MultiGain, a tool to synthesize strategies for Markov decision processes (MDPs) with multiple mean-payoff objectives. Our models are described in PRISM, and our tool uses the existing interface and simulator of PRISM. Our tool extends PRISM by adding novel algorithms for multiple mean-payoff objectives, and also provides features such as (i)~generating strategies and exploring them for simulation, and checking them with respect to other properties; and (ii)~generating an approximate Pareto curve for two mean-payoff objectives. In addition, we present a new practical algorithm for the analysis of MDPs with multiple mean-payoff objectives under memoryless strategies.
AINov 14, 2014
Automatic Generation of Alternative Starting Positions for Simple Traditional Board GamesUmair Z. Ahmed, Krishnendu Chatterjee, Sumit Gulwani
Simple board games, like Tic-Tac-Toe and CONNECT-4, play an important role not only in the development of mathematical and logical skills, but also in the emotional and social development. In this paper, we address the problem of generating targeted starting positions for such games. This can facilitate new approaches for bringing novice players to mastery, and also leads to discovery of interesting game variants. We present an approach that generates starting states of varying hardness levels for player~$1$ in a two-player board game, given rules of the board game, the desired number of steps required for player~$1$ to win, and the expertise levels of the two players. Our approach leverages symbolic methods and iterative simulation to efficiently search the extremely large state space. We present experimental results that include discovery of states of varying hardness levels for several simple grid-based board games. The presence of such states for standard game variants like $4 \times 4$ Tic-Tac-Toe opens up new games to be played that have never been played as the default start state is heavily biased.
AINov 14, 2014
Optimal Cost Almost-sure Reachability in POMDPsKrishnendu Chatterjee, Martin Chmelík, Raghav Gupta et al.
We consider partially observable Markov decision processes (POMDPs) with a set of target states and every transition is associated with an integer cost. The optimization objective we study asks to minimize the expected total cost till the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that for integer costs approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost and the bound is double exponential; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms developing on the existing algorithms for POMDPs with finite-horizon objectives. While the worst-case running time of our algorithm is double exponential, we also present efficient stopping criteria for the algorithm and show experimentally that it performs well in many examples of interest.
LOSep 11, 2014
Qualitative Analysis of POMDPs with Temporal Logic Specifications for Robotics ApplicationsKrishnendu Chatterjee, Martin Chmelík, Raghav Gupta et al.
We consider partially observable Markov decision processes (POMDPs), that are a standard framework for robotics applications to model uncertainties present in the real world, with temporal logic specifications. All temporal logic specifications in linear-time temporal logic (LTL) can be expressed as parity objectives. We study the qualitative analysis problem for POMDPs with parity objectives that asks whether there is a controller (policy) to ensure that the objective holds with probability 1 (almost-surely). While the qualitative analysis of POMDPs with parity objectives is undecidable, recent results show that when restricted to finite-memory policies the problem is EXPTIME-complete. While the problem is intractable in theory, we present a practical approach to solve the qualitative analysis problem. We designed several heuristics to deal with the exponential complexity, and have used our implementation on a number of well-known POMDP examples for robotics applications. Our results provide the first practical approach to solve the qualitative analysis of robot motion planning with LTL properties in the presence of uncertainty.
AIAug 9, 2014
POMDPs under Probabilistic SemanticsKrishnendu Chatterjee, Martin Chmelik
We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated to every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) quantitative constraint defines the set of paths where the payoff is at least a given threshold lambda_1 in (0,1]; and (ii) qualitative constraint which is a special case of quantitative constraint with lambda_1=1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for qualitative path constraint are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative path constraint we show that the problem of deciding the existence of a finite-memory controller is undecidable.
LOJul 18, 2014
Proceedings 3rd Workshop on SynthesisKrishnendu Chatterjee, Rüdiger Ehlers, Susmit Jha
The idea of synthesis, i.e., the process of automatically computing implementations from their specifications, has recently gained a lot of momentum in the contexts of software engineering and reactive system design. While it is widely believed that, due to complexity/undecidability issues, synthesis cannot completely replace manual engineering, it can assist the process of designing the intricate pieces of code that most programmers find challenging, or help with orchestrating tasks in reactive environments. The SYNT workshop aims to bring together researchers interested in synthesis to discuss and present ongoing and mature work on all aspects of automated synthesis and its applications. The third iteration of the workshop took place in Vienna, Austria, and was co-located with the 26th International Conference on Computer Aided Verification, held in the context of the Vienna Summer of Logic in July 2014. The workshop included eight contributed talks and four invited talks. In addition, it featured a special session about the Syntax-Guided Synthesis Competition (SyGuS) and the SyntComp Synthesis competition.
CRJul 16, 2014
Probabilistic Opacity for Markov Decision ProcessesBéatrice Bérard, Krishnendu Chatterjee, Nathalie Sznajder
Opacity is a generic security property, that has been defined on (non probabilistic) transition systems and later on Markov chains with labels. For a secret predicate, given as a subset of runs, and a function describing the view of an external observer, the value of interest for opacity is a measure of the set of runs disclosing the secret. We extend this definition to the richer framework of Markov decision processes, where non deterministic choice is combined with probabilistic transitions, and we study related decidability problems with partial or complete observation hypotheses for the schedulers. We prove that all questions are decidable with complete observation and $ω$-regular secrets. With partial observation, we prove that all quantitative questions are undecidable but the question whether a system is almost surely non opaque becomes decidable for a restricted class of $ω$-regular secrets, as well as for all $ω$-regular secrets under finite-memory schedulers.
LGMay 5, 2014
Generalized Risk-Aversion in Stochastic Multi-Armed BanditsAlexander Zimin, Rasmus Ibsen-Jensen, Krishnendu Chatterjee
We consider the problem of minimizing the regret in stochastic multi-armed bandit, when the measure of goodness of an arm is not the mean return, but some general function of the mean and the variance.We characterize the conditions under which learning is possible and present examples for which no natural algorithm can achieve sublinear regret.