Giannis Delimpaltadakis

3 papers

37 citations

Novelty 42%

AI Score 23

Ranked #181,917 of 205,806 authors (top 88%); #1,484 in SY (top 72%)

3 Papers

SY · Nov 2, 2022

Interval Markov Decision Processes with Continuous Action-Spaces

Giannis Delimpaltadakis, Morteza Lahijanian, Manuel Mazo et al.

Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models, where the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of algorithms for synthesis over IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated with value iteration into $|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis over a discrete-action IMDP, where the actions are the vertices of $\mathcal{A}$, is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
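To make the max-min structure of value iteration concrete, the following is a minimal sketch of pessimistic value iteration over a discrete-action IMDP. It is not the caIMDP algorithm from the paper: the inner minimization uses the standard greedy ordering argument for interval-bounded distributions, and the function names, the nested-list data layout, the discount factor `gamma`, and the fixed iteration count are all assumptions made for illustration.

```python
def worst_case_distribution(lower, upper, values):
    """Return p with lower <= p <= upper and sum(p) == 1 that minimizes
    sum(p[j] * values[j]). Assumes sum(lower) <= 1 <= sum(upper)."""
    p = list(lower)
    budget = 1.0 - sum(lower)  # probability mass still to be assigned
    # Push the remaining mass toward the lowest-value successors first.
    for j in sorted(range(len(values)), key=lambda j: values[j]):
        add = min(upper[j] - lower[j], budget)
        p[j] += add
        budget -= add
    return p


def robust_value_iteration(lower, upper, rewards, gamma=0.9, iters=200):
    """Pessimistic value iteration: max over actions, min over the
    transition probabilities allowed by the intervals.

    lower[s][a][t], upper[s][a][t]: interval bounds on P(t | s, a).
    rewards[s][a]: immediate reward (hypothetical layout).
    gamma: discount factor (an assumption of this sketch).
    """
    n = len(rewards)
    v = [0.0] * n
    for _ in range(iters):
        new_v = []
        for s in range(n):
            best = float("-inf")
            for a in range(len(rewards[s])):
                p = worst_case_distribution(lower[s][a], upper[s][a], v)
                q = rewards[s][a] + gamma * sum(pj * vj for pj, vj in zip(p, v))
                best = max(best, q)
            new_v.append(best)
        v = new_v
    return v
```

In the discrete-action setting the outer maximization is a simple loop over actions. With a continuous action set, that loop becomes an optimization problem in the action variables, which is the case the abstract addresses.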

SY · Mar 2, 2022

Isochronous Partitions for Region-Based Self-Triggered Control

Giannis Delimpaltadakis, Manuel Mazo

In this work, we propose a region-based self-triggered control (STC) scheme for nonlinear systems. The state space is partitioned into a finite number of regions, each of which is associated with a uniform inter-event time. At each sampling time instant, the controller checks which region the current state belongs to and decides the next sampling time instant accordingly. To derive the regions along with their corresponding inter-event times, we use approximations of isochronous manifolds, a notion first introduced in [1]. This work addresses some theoretical issues of [1] and proposes an effective computational approach that generates approximations of isochronous manifolds, thus enabling the region-based STC scheme. The efficiency of both our theoretical results and the proposed algorithm is demonstrated through simulation examples.
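The online part of a region-based scheme of this kind reduces to a lookup: find the region containing the current state and schedule the next sample after that region's inter-event time. The sketch below illustrates only that lookup. The `Region` type, the `next_sampling_time` helper, the `fallback` interval, and the annular example regions are hypothetical; in the paper the regions are derived from approximations of isochronous manifolds, which this sketch does not compute.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Region:
    # Membership test for the region (hypothetical representation).
    contains: Callable[[Sequence[float]], bool]
    # Uniform inter-event time assigned to every state in the region.
    inter_event_time: float


def next_sampling_time(t_now, x, regions, fallback):
    """Return the next sampling instant for state x measured at time t_now.

    fallback is a conservative interval used when x lies in no region
    (an assumption of this sketch).
    """
    for region in regions:
        if region.contains(x):
            return t_now + region.inter_event_time
    return t_now + fallback


def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))


# Illustrative partition only: two annuli around the origin.
example_regions = [
    Region(lambda x: norm(x) <= 1.0, 0.05),
    Region(lambda x: 1.0 < norm(x) <= 2.0, 0.02),
]
```

For instance, `next_sampling_time(0.0, [0.5, 0.0], example_regions, 0.01)` selects the first region and schedules the next sample 0.05 time units later.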

LG · Nov 30, 2023

Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization

Daniel Jarne Ornia, Giannis Delimpaltadakis, Jens Kober et al.

In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors, and are often pushed (through, e.g., policy entropy regularization) to randomize their actions in favor of exploration. This often makes it challenging for other agents and humans to predict an agent's behavior, triggering unsafe scenarios (e.g., in human-robot interaction). We propose a novel method to induce predictable behavior in RL agents, termed Predictability-Aware RL (PARL), employing the agent's trajectory entropy rate to quantify predictability. Our method maximizes a linear combination of a standard discounted reward and the negative entropy rate, thus trading off optimality with predictability. We show how the entropy rate can be formally cast as an average reward and how entropy-rate value functions can be estimated from a learned model, incorporate these estimates into policy-gradient algorithms, and demonstrate how this approach produces predictable (near-optimal) policies in tasks inspired by human-robot use-cases.
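The link between entropy rate and average reward can be illustrated on a finite Markov chain: the entropy rate equals the stationary average of the per-state transition entropy, so that per-state entropy plays the role of a (negative) local reward. The sketch below computes it for a known transition matrix `P` and combines it with a reward value. This is an illustration under stated assumptions, not the PARL implementation: the function names, the trade-off weight `k`, and the use of power iteration (which assumes an ergodic, aperiodic chain) are choices made here.

```python
import math


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of row-stochastic P by
    power iteration. Assumes the chain is ergodic and aperiodic."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu


def local_entropy(row):
    """Entropy (in nats) of the next-state distribution from one state."""
    return -sum(p * math.log(p) for p in row if p > 0.0)


def entropy_rate(P):
    """Entropy rate as the stationary average of the local entropies."""
    mu = stationary_distribution(P)
    return sum(mu[i] * local_entropy(P[i]) for i in range(len(P)))


def combined_objective(reward_value, P, k):
    """Reward minus k times the entropy rate; k is a hypothetical
    trade-off weight between optimality and predictability."""
    return reward_value - k * entropy_rate(P)
```

A chain whose rows are all uniform over two states has entropy rate `log 2`, while a chain with deterministic transitions has entropy rate zero, so a larger `k` favors policies that induce more deterministic closed-loop dynamics.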