LGAug 15, 2024
Maximally Permissive Reward MachinesGiovanni Varricchione, Natasha Alechina, Mehdi Dastani et al.
Reward machines allow the definition of rewards for temporally extended tasks and behaviors. Specifying "informative" reward machines can be challenging. One way to address this is to generate reward machines from a high-level abstract description of the learning environment, using techniques such as AI planning. However, previous planning-based approaches generate a reward machine based on a single (sequential or partial-order) plan, and do not allow maximum flexibility to the learning agent. In this paper we propose a new approach to synthesising reward machines which is based on the set of partial order plans for a goal. We prove that learning using such "maximally permissive" reward machines results in higher rewards than learning using RMs based on a single plan. We present experimental results which support our theoretical claims by showing that our approach obtains higher rewards than the single-plan approach in practice.
FLMay 7
Temporal Causal Models as a Model of ComputationMaksim Gladyshev, Natasha Alechina, Brian Logan
Causal models, also known as Structural Equation Models (SEM), are a well-known formalism for representing and reasoning about causal dependencies between events. In this paper, we show that Temporal SEMs (TSEMs), which extend SEMs to support causal reasoning in temporal settings, can be interpreted as a model of computation. We prove that TSEMs can encode Linear Bounded Automata, and thus causal settings representable in context sensitive languages. We also prove that TSEMs with countably many variables are Turing complete. These results establish a formal connection between causal reasoning and classical models of computation, enabling the integration of counterfactual reasoning techniques from causal inference into the theory of computation.
AIAug 9, 2025
Pushdown Reward Machines for Reinforcement LearningGiovanni Varricchione, Toryn Q. Klassen, Natasha Alechina et al.
Reward machines (RMs) are automata structures that encode (non-Markovian) reward functions for reinforcement learning (RL). RMs can reward any behaviour representable in regular languages and, when paired with RL algorithms that exploit RM structure, have been shown to significantly improve sample efficiency in many domains. In this work, we present pushdown reward machines (pdRMs), an extension of reward machines based on deterministic pushdown automata. pdRMs can recognise and reward temporally extended behaviours representable in deterministic context-free languages, making them more expressive than reward machines. We introduce two variants of pdRM-based policies, one which has access to the entire stack of the pdRM, and one which can only access the top $k$ symbols (for a given constant $k$) of the stack. We propose a procedure to check when the two kinds of policies (for a given environment, pdRM, and constant $k$) achieve the same optimal state values. We then provide theoretical results establishing the expressive power of pdRMs, and space complexity results for the proposed learning problems. Lastly, we propose an approach for off-policy RL algorithms that exploits counterfactual experiences with pdRMs. We conclude by providing experimental results showing how agents can be trained to perform tasks representable in deterministic context-free languages using pdRMs.
AIJan 17, 2025
Temporal Causal Reasoning with (Non-Recursive) Structural Equation ModelsMaksim Gladyshev, Natasha Alechina, Mehdi Dastani et al.
Structural Equation Models (SEM) are the standard approach to representing causal dependencies between variables in causal models. In this paper we propose a new interpretation of SEMs when reasoning about Actual Causality, in which SEMs are viewed as mechanisms transforming the dynamics of exogenous variables into the dynamics of endogenous variables. This allows us to combine counterfactual causal reasoning with existing temporal logic formalisms, and to introduce a temporal logic, CPLTL, for causal reasoning about such structures. We show that the standard restriction to so-called \textit{recursive} models (with no cycles in the dependency graph) is not necessary in our approach, allowing us to reason about mutually dependent processes and feedback loops. Finally, we introduce new notions of model equivalence for temporal causal models, and show that CPLTL has an efficient model-checking procedure.
AIDec 19, 2024
Probabilistic Strategy Logic with Degrees of ObservabilityChunyan Mu, Nima Motamed, Natasha Alechina et al.
There has been considerable work on reasoning about the strategic ability of agents under imperfect information. However, existing logics such as Probabilistic Strategy Logic are unable to express properties relating to information transparency. Information transparency concerns the extent to which agents' actions and behaviours are observable by other agents. Reasoning about information transparency is useful in many domains including security, privacy, and decision-making. In this paper, we present a formal framework for reasoning about information transparency properties in stochastic multi-agent systems. We extend Probabilistic Strategy Logic with new observability operators that capture the degree of observability of temporal properties by agents. We show that the model checking problem for the resulting logic is decidable.
CCDec 5, 2021
The Complexity of Data-Driven Norm Synthesis and RevisionDavide Dell'Anna, Natasha Alechina, Brian Logan et al.
Norms have been widely proposed as a way of coordinating and controlling the activities of agents in a multi-agent system (MAS). A norm specifies the behaviour an agent should follow in order to achieve the objective of the MAS. However, designing norms to achieve a particular system objective can be difficult, particularly when there is no direct link between the language in which the system objective is stated and the language in which the norms can be expressed. In this paper, we consider the problem of synthesising a norm from traces of agent behaviour, where each trace is labelled with whether the behaviour satisfies the system objective. We show that the norm synthesis problem is NP-complete.
AIMay 20, 2020
Causality, Responsibility and Blame in Team PlansNatasha Alechina, Joseph Y. Halpern, Brian Logan
Many objectives can be achieved (or may be achieved more effectively) only by a group of agents executing a team plan. If a team plan fails, it is often of interest to determine what caused the failure, the degree of responsibility of each agent for the failure, and the degree of blame attached to each agent. We show how team plans can be represented in terms of structural equations, and then apply the definitions of causality introduced by Halpern [2015] and degree of responsibility and blame introduced by Chockler and Halpern [2004] to determine the agent(s) who caused the failure and what their degree of responsibility/blame is. We also prove new results on the complexity of computing causality and degree of responsibility and blame, showing that they can be determined in polynomial time for many team plans of interest.
AIJul 12, 2018
Situation Calculus for Synthesis of Manufacturing ControllersGiuseppe De Giacomo, Brian Logan, Paolo Felli et al.
Manufacturing is transitioning from a mass production model to a manufacturing as a service model in which manufacturing facilities 'bid' to produce products. To decide whether to bid for a complex, previously unseen product, a manufacturing facility must be able to synthesize, 'on the fly', a process plan controller that delegates abstract manufacturing tasks in the supplied process recipe to the appropriate manufacturing resources, e.g., CNC machines, robots etc. Previous work in applying AI behaviour composition to synthesize process plan controllers has considered only finite state ad-hoc representations. Here, we study the problem in the relational setting of the Situation Calculus. By taking advantage of recent work on abstraction in the Situation Calculus, process recipes and available resources are represented by ConGolog programs over, respectively, an abstract and a concrete action theory. This allows us to capture the problem in a formal, general framework, and show decidability for the case of bounded action theories. We also provide techniques for actually synthesizing the controller.
AIAug 24, 2016
Expressibility of norms in temporal logicNatasha Alechina, Mehdi Dastani, Brian Logan
In this short note we address the issue of expressing norms (such as obligations and prohibitions) in temporal logic. In particular, we address the argument from [Governatori 2015] that norms cannot be expressed in Linear Time Temporal Logic (LTL).