SYJun 3, 2022
PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDPChaitanya Agarwal, Shibashis Guha, Jan Křetínský et al.
Markov decision processes (MDP) and continuous-time MDP (CTMDP) are the fundamental models for non-deterministic systems with probabilistic uncertainty. Mean payoff (a.k.a. long-run average reward) is one of the most classic objectives considered in their context. We provide the first algorithm to compute mean payoff probably approximately correctly in unknown MDP; further, we extend it to unknown CTMDP. We do not require any knowledge of the state space, only a lower bound on the minimum transition probability, which has been advocated in literature. In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.
LGMar 16, 2023
Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDPAmin Falah, Shibashis Guha, Ashutosh Trivedi
Continuous-time Markov decision processes (CTMDPs) are canonical models to express sequential decision-making under dense-time and stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the algorithm-of-choice to compute optimal decision sequence. RL, on the other hand, requires the learning objective to be encoded as scalar reward signals. Since doing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logic or automata formalism) to scalar rewards for discrete-time Markov decision processes (MDPs). Unfortunately, no automatic translation exists for CTMDPs. We consider CTMDP environments against the learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in popular linear-time logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two different semantics of omega-regular objectives: 1) satisfaction semantics where the goal of the learner is to maximize the probability of spending positive time in the good states, and 2) expectation semantics where the goal of the learner is to optimize the long-run expected average time spent in the ``good states" of the automaton. We present an approach enabling correct translation to scalar reward signals that can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating it on some popular CTMDP benchmarks with omega-regular objectives.
GTMay 12
Sure-almost-sure and Sure-limit-sure Window Mean Payoff in Markov Decision ProcessesPranshu Gaba, Shibashis Guha
Given rationals $α$ and $β$, the sure-almost-sure problem for a quantitative objective $φ$ in a Markov decision process (MDP) asks if one can simultaneously ensure that all outcomes of the MDP have $φ$-value at least $α$ (i.e. sure $α$ satisfaction) and with probability $1$ the outcome has $φ$-value at least $β$ (i.e. almost-sure $β$ satisfaction). The sure-limit-sure problem asks if for all $\varepsilon > 0$ one can simultaneously ensure that all outcomes have $φ$-value at least $α$ and with probability at least $1 - \varepsilon$ the outcome has $φ$-value at least $β$. Moreover, if simultaneous satisfaction of objectives is possible, then one would also like to construct a strategy (for sure-almost-sure) or a family of strategies (for sure-limit-sure) that achieves this. In this paper, we solve the sure-almost-sure and sure-limit-sure problems for window mean-payoff objectives. The window mean-payoff objective strengthens the standard mean-payoff objective by requiring that the average payoff of a finite window that slides over an infinite run be greater than a given threshold. We study two variants of window mean payoff: in the fixed variant, the window length $\ell$ is given, while in the bounded variant, the length is not given but is required to be bounded throughout the run. We show that the sure-almost-sure problem and the sure-limit-sure problem are both in P for the fixed variant (if $\ell$ is given in unary) and are both in NP $\cap$ coNP for the bounded variant, matching the computational complexity of sure satisfaction and almost-sure satisfaction when considered separately for these objectives. We also give bounds for the memory requirement of winning strategies for all considered problems.
LOMay 9
Set Automata and Limits of Decidability of Two-Variable Logic on Data WordsShibashis Guha, Amaldev Manuel, S P Rishal
We extend the two-variable logic on data words with guarded regular binary predicates of the form $\widetilde{L}(x,y)$ that is true if positions $x$ and $y$ are in the same class and the factor strictly between $x$ and $y$ is in the regular language $L$. We characterise the class of monoids for which the extension of the two-variable logic with guarded predicates recognised by the monoid is decidable, namely the class of idempotent monoids whose two-sided ideals are linearly ordered. For this, we introduce an automata formalism, set automata, that is equivalent to the class automata of Bojańczyk and Lasota and thus has an undecidable emptiness problem. We identify a subclass of set automata called ordered quasi-normal set automata that has a decidable emptiness problem by reduction to the emptiness problem of ordered multicounter automata. We show that the two-variable logic extended with guarded regular predicates recognised by a semigroup $S$ is expressively equivalent to a quasi-normal set automaton with the semigroup of transformations $S$. In particular, if $S$ is a linear band monoid then the resulting automaton is ordered, and the decidability result follows.
AIMay 19, 2020
Safe Learning for Near Optimal SchedulingDamien Busatto-Gaston, Debraj Chakraborty, Shibashis Guha et al.
In this paper, we investigate the combination of synthesis, model-based learning, and online sampling techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. Our algorithms can handle Markov decision processes (MDPs) that have 1020 states and beyond which cannot be handled with state-of-the art probabilistic model-checkers. We provide probably approximately correct (PAC) guarantees for learning the model. Additionally, we extend Monte-Carlo tree search with advice, computed using safety games or obtained using the earliest-deadline-first scheduler, to safely explore the learned model online. Finally, we implemented and compared our algorithms empirically against shielded deep Q-learning on large task systems.
LOApr 28, 2020
Mixing Probabilistic and non-Probabilistic Objectives in Markov Decision ProcessesRaphaël Berthon, Shibashis Guha, Jean-François Raskin
In this paper, we consider algorithms to decide the existence of strategies in MDPs for Boolean combinations of objectives. These objectives are omega-regular properties that need to be enforced either surely, almost surely, existentially, or with non-zero probability. In this setting, relevant strategies are randomized infinite memory strategies: both infinite memory and randomization may be needed to play optimally. We provide algorithms to solve the general case of Boolean combinations and we also investigate relevant subcases. We further report on complexity bounds for these problems.
PLJun 25, 2015
From Traces To Proofs: Proving Concurrent Program SafeChinmay Narayan, Subodh Sharma, Shibashis Guha et al.
Nondeterminism in scheduling is the cardinal reason for difficulty in proving correctness of concurrent programs. A powerful proof strategy was recently proposed [6] to show the correctness of such programs. The approach captured data-flow dependencies among the instructions of an interleaved and error-free execution of threads. These data-flow dependencies were represented by an inductive data-flow graph (iDFG), which, in a nutshell, denotes a set of executions of the concurrent program that gave rise to the discovered data-flow dependencies. The iDFGs were further transformed in to alternative finite automatons (AFAs) in order to utilize efficient automata-theoretic tools to solve the problem. In this paper, we give a novel and efficient algorithm to directly construct AFAs that capture the data-flow dependencies in a concurrent program execution. We implemented the algorithm in a tool called ProofTraPar to prove the correctness of finite state cyclic programs under the sequentially consistent memory model. Our results are encouranging and compare favorably to existing state-of-the-art tools.