Duarte Antunes

4.1 · LG · Apr 8, 2025

Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models

J. S. van Hulst, W. P. M. H. Heemels, D. J. Antunes

Reinforcement learning (RL) is a powerful framework for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We address this challenge by incorporating prior model knowledge to guide exploration and accelerate the learning process. Specifically, we assume access to a model set that contains the true transition kernel and reward function. We optimize over this model set to obtain upper and lower bounds on the Q-function, which are then used to guide the exploration of the agent. We provide theoretical guarantees on the convergence of the Q-function to the optimal Q-function under the proposed class of exploring policies. Furthermore, we also introduce a data-driven regularized version of the model set optimization problem that ensures the convergence of the class of exploring policies to the optimal policy. Lastly, we show that when the model set has a specific structure, namely the bounded-parameter MDP (BMDP) framework, the regularized model set optimization problem becomes convex and simple to implement. In this setting, we also prove finite-time convergence to the optimal policy under mild assumptions. We demonstrate the effectiveness of the proposed exploration strategy, which we call BUMEX (Bounded Uncertainty Model-based Exploration), in a simulation study. The results indicate that the proposed method can significantly accelerate learning in benchmark examples. A toolbox is available at https://github.com/JvHulst/BUMEX.
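The Q-function bounds described in the abstract can be illustrated for the bounded-parameter MDP case with a small interval value iteration. This is a minimal sketch under stated assumptions, not the BUMEX toolbox implementation: the function names `extreme_expectation` and `q_bounds`, the array layout (`P_lo[s, a, s']`, `R_lo[s, a]`), the discount factor, and the two-state example are all hypothetical, and the regularized, data-driven version of the optimization mentioned in the abstract is not reproduced here.

```python
import numpy as np

def extreme_expectation(p_lo, p_hi, v, maximize):
    """Max (or min) of p @ v over p_lo <= p <= p_hi with sum(p) == 1.

    Greedy solution: start from the lower bounds and hand the remaining
    probability mass to the most (or least) valuable successor states first.
    """
    order = np.argsort(-v if maximize else v)
    p = np.array(p_lo, dtype=float)
    slack = 1.0 - p.sum()  # probability mass still to be assigned
    for s in order:
        add = min(p_hi[s] - p[s], slack)
        p[s] += add
        slack -= add
    return float(p @ v)

def q_bounds(P_lo, P_hi, R_lo, R_hi, gamma=0.9, iters=500):
    """Lower and upper bounds on the optimal Q-function over the interval model set."""
    n_s, n_a, _ = P_lo.shape
    Q_lo = np.zeros((n_s, n_a))
    Q_up = np.zeros((n_s, n_a))
    for _ in range(iters):
        V_lo, V_up = Q_lo.max(axis=1), Q_up.max(axis=1)
        new_lo, new_up = np.empty_like(Q_lo), np.empty_like(Q_up)
        for s in range(n_s):
            for a in range(n_a):
                # Pessimistic model for the lower bound, optimistic for the upper.
                new_lo[s, a] = R_lo[s, a] + gamma * extreme_expectation(
                    P_lo[s, a], P_hi[s, a], V_lo, maximize=False)
                new_up[s, a] = R_hi[s, a] + gamma * extreme_expectation(
                    P_lo[s, a], P_hi[s, a], V_up, maximize=True)
        Q_lo, Q_up = new_lo, new_up
    return Q_lo, Q_up

# Hypothetical 2-state, 2-action example: nominal model with +/- 0.1 intervals.
P_nom = np.array([[[0.8, 0.2], [0.3, 0.7]],
                  [[0.5, 0.5], [0.1, 0.9]]])
R_nom = np.array([[1.0, 0.0], [0.0, 2.0]])
P_lo, P_hi = np.clip(P_nom - 0.1, 0.0, 1.0), np.clip(P_nom + 0.1, 0.0, 1.0)
Q_lo, Q_up = q_bounds(P_lo, P_hi, R_nom - 0.1, R_nom + 0.1)

# One simple use of the bounds (an assumption, not necessarily the BUMEX rule):
# an action whose upper bound is below the best lower bound cannot be optimal.
worth_exploring = Q_up >= Q_lo.max(axis=1, keepdims=True)
```

The mask `worth_exploring` marks, per state, the actions that the bounds cannot rule out, which is one way such bounds could restrict exploration to promising actions.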

1.9 · RO · Jun 17, 2019

Trajectory Tracking for Quadrotors with Attitude Control on $\mathcal{S}^2 \times \mathcal{S}^1$

Dave Kooijman, Angela P. Schoellig, Duarte J. Antunes

The control of a quadrotor is typically split into two subsequent problems: finding desired accelerations to control its position, and controlling its attitude and the total thrust to track these accelerations and to track a yaw angle reference. While the thrust vector, generating accelerations, and the angle of rotation about the thrust vector, determining the yaw angle, can be controlled independently, most attitude control strategies in the literature, relying on representations in terms of quaternions, rotation matrices or Euler angles, result in an unnecessary coupling between the control of the thrust vector and of the angle about this vector. This leads, for instance, to undesired position tracking errors due to yaw tracking errors. In this paper we propose to tackle the attitude control problem using an attitude representation in the Cartesian product of the 2-sphere and the 1-sphere, denoted by $\mathcal{S}^2\times \mathcal{S}^1$. We propose a non-linear tracking control law on $\mathcal{S}^2\times \mathcal{S}^1$ that decouples the control of the thrust vector and of the angle of rotation about the thrust vector, and guarantees almost global asymptotic stability. Simulation results highlight the advantages of the proposed approach over previous approaches.
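The independence of the thrust vector from the rotation about it, which motivates the $\mathcal{S}^2\times \mathcal{S}^1$ representation, can be checked numerically. The sketch below assumes that the thrust acts along the body z-axis, so the thrust direction is the third column of the rotation matrix; the helper names `rot_x`, `rot_z`, and `thrust_direction` are hypothetical. It does not implement the paper's control law or its specific parametrization of the $\mathcal{S}^1$ component, which the abstract does not spell out.

```python
import numpy as np

def rot_z(angle):
    """Rotation matrix for a rotation by `angle` (rad) about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rot_x(angle):
    """Rotation matrix for a rotation by `angle` (rad) about the x-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s,  c]])

def thrust_direction(R):
    """Body z-axis expressed in the world frame: a unit vector, i.e. a point on S^2."""
    return R @ np.array([0.0, 0.0, 1.0])

# Some tilted attitude (body-to-world rotation), chosen arbitrarily.
R = rot_z(0.4) @ rot_x(0.3)

# Spin the vehicle about its own thrust axis (body z-axis) by 1.2 rad.
R_spun = R @ rot_z(1.2)

# The attitude changes, but the thrust direction on S^2 does not.
same_thrust = np.allclose(thrust_direction(R), thrust_direction(R_spun))
same_attitude = np.allclose(R, R_spun)
```

Here `same_thrust` is true while `same_attitude` is false: the rotation about the thrust axis is a separate degree of freedom, which is why an error in it need not disturb the thrust vector that drives the position dynamics.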

2 Papers