Amin Falah

LG · Mar 16, 2023

Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP

Amin Falah, Shibashis Guha, Ashutosh Trivedi

Continuous-time Markov decision processes (CTMDPs) are canonical models for sequential decision-making in dense-time, stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the algorithm of choice for computing optimal decision sequences. RL, however, requires the learning objective to be encoded as scalar reward signals. Since doing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logic or automata formalisms) to scalar rewards for discrete-time Markov decision processes (MDPs). Unfortunately, no automatic translation exists for CTMDPs. We consider CTMDP environments with learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in the popular linear-time logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two different semantics of omega-regular objectives: 1) satisfaction semantics, where the goal of the learner is to maximize the probability of spending positive time in the good states, and 2) expectation semantics, where the goal of the learner is to optimize the long-run expected average time spent in the "good states" of the automaton. We present an approach enabling correct translation to scalar reward signals that can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating them on some popular CTMDP benchmarks with omega-regular objectives.
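As a rough illustration of the expectation semantics described in this abstract, the sketch below estimates by sampling the long-run fraction of time a fixed policy spends in "good" states of a toy CTMDP. Everything here is an assumption made for illustration: the two-state model, the `CTMDP` and `GOOD` names, the rates, and the `average_good_time` helper are hypothetical and not taken from the paper, and the code does not implement the paper's reward translation.

```python
import random

# Hypothetical toy CTMDP (not from the paper):
# state -> action -> (exit rate, {successor state: probability}).
CTMDP = {
    "s0": {"a": (2.0, {"s1": 1.0}), "b": (1.0, {"s0": 0.5, "s1": 0.5})},
    "s1": {"a": (1.0, {"s0": 1.0})},
}
GOOD = {"s1"}  # assumed set of "good" states


def average_good_time(policy, start="s0", steps=20000, seed=0):
    """Monte Carlo estimate of the long-run fraction of time spent in GOOD."""
    rng = random.Random(seed)
    state, total_time, good_time = start, 0.0, 0.0
    for _ in range(steps):
        rate, successors = CTMDP[state][policy[state]]
        # The sojourn time in a state is exponentially distributed with the exit rate.
        dwell = rng.expovariate(rate)
        total_time += dwell
        if state in GOOD:
            good_time += dwell
        # Sample the next state from the discrete successor distribution.
        names = list(successors)
        weights = [successors[name] for name in names]
        state = rng.choices(names, weights=weights)[0]
    return good_time / total_time


estimate = average_good_time({"s0": "a", "s1": "a"})
print(f"estimated fraction of time in good states: {estimate:.3f}")
```

Under the policy that always picks action `a`, this toy chain alternates between `s0` (mean sojourn 0.5) and `s1` (mean sojourn 1.0), so the estimate should approach 2/3 as the number of steps grows.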

8.4 · SY · May 1

HyperCertificates: Verification of Discrete-time Dynamical Systems against HyperLTL Specifications

Vishnu Murali, Amin Falah, Ashutosh Trivedi et al.

We introduce a functional inductive framework to verify discrete-time dynamical systems against hyperproperties specified as hyper linear temporal logic (HyperLTL) formulae via a notion of HyperCertificates. Unlike linear temporal logic (LTL) formulae, which are concerned with individual traces of a system, hyperproperties are concerned with how the traces of a system relate to one another. HyperLTL is an extension of LTL for hyperproperties and is useful for describing specifications such as opacity, privacy, and notions of robustness. Our notion of HyperCertificates consists of a pair of functions, where the first models the lookahead and the second relies on a combination of barrier and ranking functions. We use closure certificates as a model for this lookahead and then rely on barrier and ranking function arguments modulo this lookahead to provide guarantees against HyperLTL formulae. We demonstrate how our approach can be automated via existing techniques such as sum-of-squares (SOS) optimization and satisfiability modulo theories (SMT) solvers. Finally, we demonstrate our approach on some case studies.

2 Papers