SY Nov 2, 2022
Interval Markov Decision Processes with Continuous Action-Spaces
Giannis Delimpaltadakis, Morteza Lahijanian, Manuel Mazo et al.
Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models, where the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of algorithms for synthesis over IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated with value iteration into $|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis over a discrete-action IMDP, where the actions are the vertices of $\mathcal{A}$, is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
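For orientation, the max-min structure mentioned in the abstract can be sketched for the simpler discrete-action case (for instance, when the actions are the vertices of a polytope $\mathcal{A}$). This is not the paper's caIMDP algorithm. It is the standard greedy solution of the inner min problem over interval-bounded distributions, wrapped in a finite-horizon value iteration. The function names, the state-based reward, and the finite horizon are assumptions made for illustration.

```python
def worst_case_expectation(lower, upper, values):
    """Minimize sum_j p[j] * values[j] over distributions p with
    lower[j] <= p[j] <= upper[j] and sum(p) == 1.

    Greedy rule: start from the lower bounds, then give the remaining
    probability mass to the successor states with the smallest values first.
    Assumes the intervals are consistent: sum(lower) <= 1 <= sum(upper).
    """
    p = list(lower)
    budget = 1.0 - sum(lower)
    for j in sorted(range(len(values)), key=lambda k: values[k]):
        add = max(0.0, min(upper[j] - lower[j], budget))
        p[j] += add
        budget -= add
    return sum(pj * vj for pj, vj in zip(p, values))


def robust_value_iteration(lower, upper, reward, horizon):
    """Finite-horizon max-min value iteration on a discrete-action IMDP.

    lower[q][a][j], upper[q][a][j]: bounds on the probability of moving
    from state q to state j under action a.
    reward[q]: reward collected in state q (a modelling assumption here).
    """
    n = len(reward)
    V = [0.0] * n
    for _ in range(horizon):
        # The comprehension reads the previous V and builds the next one.
        V = [
            reward[q] + max(
                worst_case_expectation(lower[q][a], upper[q][a], V)
                for a in range(len(lower[q]))
            )
            for q in range(n)
        ]
    return V
```

Each state contributes one outer max over actions, which mirrors the decomposition into $|\mathcal{Q}|$ max problems. In the continuous-action setting the bounds depend on the action variables, so the outer max becomes an optimization problem rather than an enumeration.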
SY Mar 2, 2022
Isochronous Partitions for Region-Based Self-Triggered Control
Giannis Delimpaltadakis, Manuel Mazo
In this work, we propose a region-based self-triggered control (STC) scheme for nonlinear systems. The state space is partitioned into a finite number of regions, each of which is associated with a uniform inter-event time. At each sampling time instant, the controller checks which region the current state belongs to and decides the next sampling time instant accordingly. To derive the regions along with their corresponding inter-event times, we use approximations of isochronous manifolds, a notion first introduced in [1]. This work addresses some theoretical issues of [1] and proposes an effective computational approach that generates approximations of isochronous manifolds, thus enabling the region-based STC scheme. The efficiency of both our theoretical results and the proposed algorithm is demonstrated through simulation examples.
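The online part of such a scheme reduces to a lookup, which the following minimal sketch illustrates. Everything concrete in it is hypothetical: the regions are toy rings around the origin rather than sets delimited by approximations of isochronous manifolds, and the inter-event times and the fallback value are made-up numbers.

```python
import math


def next_sampling_time(state, regions, fallback):
    """Return the inter-event time of the first region containing `state`.

    regions: list of (contains, tau) pairs, where contains(state) -> bool
             tests membership and tau is that region's uniform
             inter-event time.
    fallback: time used when no region contains the state (an assumption;
              a real design would have to cover the operating domain).
    """
    for contains, tau in regions:
        if contains(state):
            return tau
    return fallback


# Hypothetical partition: nested discs checked from the inside out,
# so each entry effectively describes a ring.
REGIONS = [
    (lambda x: math.hypot(*x) <= 0.5, 0.08),
    (lambda x: math.hypot(*x) <= 1.0, 0.05),
    (lambda x: math.hypot(*x) <= 2.0, 0.02),
]
```

At each sampling instant the controller would call `next_sampling_time(x, REGIONS, fallback)` with the measured state `x` and schedule the next measurement that many time units later. The offline work described in the abstract lies in computing the regions and their times.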
LG Nov 30, 2023
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization
Daniel Jarne Ornia, Giannis Delimpaltadakis, Jens Kober et al.
In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors, and are often pushed (e.g., through policy entropy regularisation) to randomise their actions in favor of exploration. This often makes it challenging for other agents and humans to predict an agent's behavior, triggering unsafe scenarios (e.g., in human-robot interaction). We propose a novel method to induce predictable behavior in RL agents, termed Predictability-Aware RL (PARL), which employs the agent's trajectory entropy rate to quantify predictability. Our method maximizes a linear combination of a standard discounted reward and the negative entropy rate, thus trading off optimality with predictability. We show how the entropy rate can be formally cast as an average reward and how entropy-rate value functions can be estimated from a learned model, incorporate this in policy-gradient algorithms, and demonstrate how this approach produces predictable (near-optimal) policies in tasks inspired by human-robot use-cases.
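To make the entropy-rate quantity concrete, here is a minimal sketch for a finite Markov chain with a known transition matrix, using the textbook formula $H = -\sum_i \mu_i \sum_j P_{ij} \log P_{ij}$ with $\mu$ the stationary distribution. This is not the PARL estimator, which works from a learned model. The function name, the known matrix `P`, and the assumption that the chain is irreducible are all illustrative.

```python
import math


def entropy_rate(P, iterations=1000):
    """Entropy rate (in nats) of a finite Markov chain with transition matrix P.

    P[i][j] is the probability of moving from state i to state j.
    The stationary distribution is approximated with a lazy power iteration,
    mu <- 0.5 * mu + 0.5 * mu P, which also converges for periodic chains.
    Assumes the chain is irreducible, so the stationary distribution is unique.
    """
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        mu = [0.5 * m + 0.5 * s for m, s in zip(mu, step)]

    # Local entropy of each state's next-state distribution; zero-probability
    # transitions contribute nothing.
    local = [-sum(p * math.log(p) for p in row if p > 0.0) for row in P]

    # Entropy rate = stationary average of the local entropies, which is why
    # it can be read as an average reward with per-state reward -local[i].
    return sum(m * h for m, h in zip(mu, local))
```

A fully random two-state chain gives `log 2` nats per step, while a deterministic cycle gives zero. That gap is what a negative entropy-rate term in the objective rewards.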