Erik-Jan van Kampen

SY
h-index: 4
9 papers
57 citations
Novelty: 51%
AI Score: 47

9 Papers

SY Jun 26, 2016
Framework for state and unknown input estimation of linear time-varying systems

Peng Lu, Erik-Jan van Kampen, Cornelis C. de Visser et al.

In the literature, the design of unknown-input decoupled observers and filters requires an existence condition to be satisfied. This paper addresses an unknown input filtering problem where the existence condition is not satisfied. Instead of designing a traditional unknown input decoupled filter, a Double-Model Adaptive Estimation approach is extended to solve the unknown input filtering problem. It is proved that the state and the unknown inputs can be estimated and decoupled using the extended Double-Model Adaptive Estimation approach without satisfying the existence condition. Numerical examples are presented in which the performance of the proposed approach is compared to methods from the literature.

RO Mar 28, 2022
Adaptive Risk-Tendency: Nano Drone Navigation in Cluttered Environments with Distributional Reinforcement Learning

Cheng Liu, Erik-Jan van Kampen, Guido C. H. E. de Croon

Assessing risk and making risk-aware decisions is essential to applying reinforcement learning to safety-critical robots like drones. In this paper, we investigate a specific case where a nano quadcopter robot learns to navigate an a priori unknown cluttered environment under partial observability. We present a distributional reinforcement learning framework to generate adaptive risk-tendency policies. Specifically, we propose to use the lower-tail conditional variance of the learnt return distribution as an intrinsic uncertainty estimate, and use exponentially weighted average forecasting (EWAF) to adapt the risk-tendency in accordance with the estimated uncertainty. In simulation and real-world experiments, we show that (1) the most effective risk-tendency varies across states, and (2) the agent with adaptive risk-tendency achieves superior performance compared to risk-neutral or risk-averse policy baselines.

NA Feb 3, 2016
Towards the multivariate simplotope spline: continuity conditions in a class of mixed simplotopic grids

Tim Visser, Cornelis C. de Visser, Erik-Jan van Kampen

Smooth joins of simplex Bernstein-Bézier polynomials have been studied extensively in the past. In this paper a new method is proposed to define continuity conditions for tensor-product Bernstein polynomials on a class of mixed grids that meets certain out-of-facet parallelism criteria. The conditions are derived by first defining a simplex around the simplotopic bases of the tensor-product polynomials. Then the continuity conditions in the multivariate simplex spline defined on the resulting simplices are adapted to hold for the tensor-product polynomials. The two- and three-dimensional results agree with the results found in the literature. It is expected that the method can be employed in more general grids.

LG Jul 13, 2024
Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft

Yifei Li, Erik-Jan van Kampen

The symmetry of dynamical systems can be exploited for state-transition prediction and to facilitate control policy optimization. This paper leverages system symmetry to develop sample-efficient offline reinforcement learning (RL) approaches. Under the symmetry assumption for a Markov Decision Process (MDP), a symmetric data augmentation method is proposed. The augmented samples are integrated into the dataset of Deep Deterministic Policy Gradient (DDPG) to enhance its coverage rate of the state-action space. Furthermore, sample utilization efficiency is improved by introducing a second critic trained on the augmented samples, resulting in a dual-critic structure. The aircraft's model is verified to be symmetric, and flight control simulations demonstrate accelerated policy convergence when augmented samples are employed.
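As a rough illustration of the symmetric data augmentation idea in this abstract, the sketch below mirrors each stored transition and appends the mirrored copy to the dataset. The state and action layout, the sign-flip vectors `STATE_MIRROR` and `ACTION_MIRROR`, and the function `augment_with_mirror` are hypothetical and not taken from the paper. The sketch also assumes that the reward is unchanged under the mirror map.

```python
import numpy as np

# Hypothetical lateral state [roll angle, roll rate, sideslip] and action
# [aileron, rudder]. Assumption: mirroring about the aircraft's plane of
# symmetry flips the sign of every entry.
STATE_MIRROR = np.array([-1.0, -1.0, -1.0])
ACTION_MIRROR = np.array([-1.0, -1.0])


def augment_with_mirror(states, actions, rewards, next_states):
    """Return the original transitions followed by their mirrored copies."""
    states_aug = np.concatenate([states, states * STATE_MIRROR])
    actions_aug = np.concatenate([actions, actions * ACTION_MIRROR])
    # Assumption: the reward is invariant under the symmetry, so it is copied.
    rewards_aug = np.concatenate([rewards, rewards])
    next_states_aug = np.concatenate([next_states, next_states * STATE_MIRROR])
    return states_aug, actions_aug, rewards_aug, next_states_aug
```

Each array holds one transition per row, so a batch of N transitions becomes 2N after augmentation. How the augmented samples feed the two critics of the dual-critic structure is not covered here.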

36.7 SY May 10
Unifying Hamilton-Jacobi Reachability and Reinforcement Learning

Prashant Solanki, Isabelle El-Hajj, Jasper van Beers et al.

We unify Hamilton-Jacobi (HJ) reachability and Reinforcement Learning (RL) through a proposed running cost formulation. We prove that the resultant travel-cost value function is the unique bounded viscosity solution of a time-dependent Hamilton-Jacobi-Bellman (HJB) Partial Differential Equation (PDE) with zero terminal data, whose negative sublevel set equals the strict backward-reachable tube. Using a forward reparameterization and a contraction-inducing Bellman update, we show that fixed points of small-step RL value iteration converge to the viscosity solution of the forward discounted HJB. Experiments on a classical benchmark validate this connection by demonstrating convergence of learned value functions toward semi-Lagrangian HJB solutions and by quantifying approximation error across the state space. These results empirically support the theoretical analysis, showing that the proposed framework preserves reachability-based safety semantics while remaining compatible with deep RL implementations.

AI Oct 26, 2025
Lyapunov Function-guided Reinforcement Learning for Flight Control

Yifei Li, Erik-Jan van Kampen

A cascaded online learning flight control system has been developed and enhanced with respect to action smoothness. In this paper, we investigate the convergence performance of the control system, characterized by the increment of a Lyapunov function candidate. The derivation of this metric accounts for discretization errors and state prediction errors introduced by the incremental model. Comparative results are presented through flight control simulations.

SY Jul 6, 2025
Improving Action Smoothness for a Cascaded Online Learning Flight Control System

Yifei Li, Erik-Jan van Kampen

This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. The Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.
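The low-pass filtering and FFT analysis mentioned in this abstract can be sketched as follows. This is a minimal illustration under assumed values, not the paper's implementation: the first-order filter form, the coefficient `alpha`, the sample time `dt`, and the synthetic action trace are all hypothetical. The online temporal smoothness technique is not shown.

```python
import numpy as np


def low_pass(actions, alpha):
    """First-order low-pass filter: y[k] = (1 - alpha) * y[k-1] + alpha * u[k]."""
    filtered = np.empty(len(actions), dtype=float)
    filtered[0] = actions[0]
    for k in range(1, len(actions)):
        filtered[k] = (1.0 - alpha) * filtered[k - 1] + alpha * actions[k]
    return filtered


def amplitude_spectrum(signal, dt):
    """Single-sided amplitude spectrum of a real signal sampled every dt seconds."""
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    amps = np.abs(np.fft.rfft(signal)) * 2.0 / len(signal)
    return freqs, amps


# Hypothetical action trace: a slow 0.5 Hz command plus a 20 Hz oscillation.
dt = 0.01
t = np.arange(1000) * dt
raw = np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.sin(2 * np.pi * 20.0 * t)
smooth = low_pass(raw, alpha=0.1)

freqs, raw_amps = amplitude_spectrum(raw, dt)
_, smooth_amps = amplitude_spectrum(smooth, dt)
idx_slow = np.argmin(np.abs(freqs - 0.5))
idx_fast = np.argmin(np.abs(freqs - 20.0))
print("0.5 Hz amplitude:", raw_amps[idx_slow], "->", smooth_amps[idx_slow])
print("20 Hz amplitude:", raw_amps[idx_fast], "->", smooth_amps[idx_fast])
```

Comparing the two spectra shows how much of the high-frequency oscillation the filter removes while leaving the slow command largely intact. A smaller `alpha` smooths more but adds lag.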

SY Feb 16, 2022
Soft Actor-Critic Deep Reinforcement Learning for Fault Tolerant Flight Control

Killian Dally, Erik-Jan van Kampen

Fault-tolerant flight control faces challenges: developing a model-based controller for each unexpected failure is unrealistic, and online learning methods can handle only limited system complexity due to their low sample efficiency. In this research, a model-free coupled-dynamics flight controller for a jet aircraft able to withstand multiple failure types is proposed. An offline-trained cascaded Soft Actor-Critic Deep Reinforcement Learning controller is successful on highly coupled maneuvers, including a coordinated 40-degree-bank climbing turn with a normalized Mean Absolute Error of 2.64%. The controller is robust to six failure cases, including the rudder jammed at -15 deg, the aileron effectiveness reduced by 70%, a structural failure, icing, and a backward c.g. shift: in each case the response is stable and the climbing turn is completed successfully. Robustness to biased sensor noise, atmospheric disturbances, and varying initial flight conditions and reference signal shapes is also demonstrated.

LG Jun 30, 2016
On Approximate Dynamic Programming with Multivariate Splines for Adaptive Control

Willem Eerland, Coen de Visser, Erik-Jan van Kampen

We define an SDP framework based on the RLSTD algorithm and multivariate simplex B-splines. We introduce a local forget factor capable of preserving the continuity of the simplex splines. This local forget factor is integrated with the RLSTD algorithm, resulting in a modified RLSTD algorithm that is capable of tracking time-varying systems. We present the results of two numerical experiments, one validating SDP and comparing it with NDP and another to show the advantages of the modified RLSTD algorithm over the original. While SDP requires more computations per time-step, the first experiment shows that for the same number of function approximator parameters, there is an increase in performance in terms of stability and learning rate compared to NDP. The second experiment shows that SDP in combination with the modified RLSTD algorithm allows for faster recovery compared to the original RLSTD algorithm when system parameters are altered, paving the way for an adaptive high-performance non-linear control method.
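To give a feel for how a forget factor lets a recursive least-squares temporal-difference (RLSTD) estimator track changing dynamics, the sketch below implements a textbook RLS-TD(0) update with a single global forgetting factor on plain linear features. It is not the paper's method: the local, continuity-preserving forget factor and the simplex B-spline basis are not reproduced. The function name `rlstd_step` and all parameter values are assumptions.

```python
import numpy as np


def rlstd_step(theta, P, phi, phi_next, reward, gamma=0.9, forget=0.99):
    """One recursive least-squares TD(0) update with a global forgetting factor."""
    d = phi - gamma * phi_next                 # temporal-difference feature vector
    gain = P @ phi / (forget + d @ P @ phi)    # update gain
    theta = theta + gain * (reward - d @ theta)
    P = (P - np.outer(gain, d @ P)) / forget   # discount older samples
    return theta, P


# Hypothetical one-state chain: constant reward 1 and discount 0.5,
# so the true value is 1 / (1 - 0.5) = 2.
theta = np.zeros(1)
P = 100.0 * np.eye(1)
phi = np.array([1.0])
for _ in range(1000):
    theta, P = rlstd_step(theta, P, phi, phi, reward=1.0, gamma=0.5)
print(theta)
```

With `forget` below 1, older samples are progressively down-weighted, which is what allows the estimate to follow a system whose parameters change. Setting `forget=1.0` recovers the ordinary recursive update with no forgetting.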