Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft
This work addresses sample efficiency in reinforcement learning for fixed-wing aircraft control; it is an incremental improvement with domain-specific applications.
The paper tackles sample-efficient offline reinforcement learning for aircraft lateral attitude tracking by proposing a symmetric data augmentation method integrated with DDPG, which accelerates policy convergence in flight control simulations.
The symmetry of dynamical systems can be exploited for state-transition prediction and to facilitate control policy optimization. This paper leverages system symmetry to develop sample-efficient offline reinforcement learning (RL) approaches. Under the symmetry assumption for a Markov Decision Process (MDP), a symmetric data augmentation method is proposed. The augmented samples are integrated into the dataset of Deep Deterministic Policy Gradient (DDPG) to enhance its coverage rate of the state-action space. Furthermore, sample utilization efficiency is improved by introducing a second critic trained on the augmented samples, resulting in a dual-critic structure. The aircraft's model is verified to be symmetric, and flight control simulations demonstrate accelerated policy convergence when augmented samples are employed.
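As a rough illustration of the augmentation idea, the sketch below mirrors each stored transition and appends the mirrored copy to the dataset. It is not the paper's implementation. It assumes a sign-flip symmetry (state and action are both negated, reward is unchanged), and the toy linear system, the reward, and the names `mirror_transition`, `augment_batch`, `M_s`, and `M_a` are all hypothetical stand-ins for the aircraft model and the DDPG dataset.

```python
import numpy as np


def mirror_transition(s, a, r, s_next, state_map, action_map):
    """Return the mirrored counterpart of one transition (s, a, r, s_next).

    Assumes the MDP is symmetric under the linear maps state_map and
    action_map, so the reward of the mirrored transition is unchanged.
    """
    return state_map @ s, action_map @ a, r, state_map @ s_next


def augment_batch(batch, state_map, action_map):
    """Return the original transitions followed by their mirrored copies."""
    mirrored = [mirror_transition(*t, state_map, action_map) for t in batch]
    return list(batch) + mirrored


# Hypothetical toy system (NOT the aircraft model): x' = A x + B u.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.5]])


def step(s, a):
    """Linear dynamics; odd in (s, a), hence symmetric under a sign flip."""
    return A @ s + B @ a


def reward(s_next):
    """Quadratic penalty; even in the state, hence invariant under a sign flip."""
    return -float(s_next @ s_next)


# Assumed symmetry maps: negate every state and action component.
M_s = -np.eye(2)
M_a = -np.eye(1)

# One sampled transition, then its augmented dataset.
s = np.array([0.2, -0.1])
a = np.array([0.3])
s_next = step(s, a)
batch = [(s, a, reward(s_next), s_next)]
augmented = augment_batch(batch, M_s, M_a)
```

For this toy system the mirrored sample is itself a valid transition of the dynamics, which is what makes the augmentation free of extra simulation cost. In the dual-critic structure described above, mirrored samples like these would feed the second critic. For a real aircraft model, the symmetry maps would have to be verified first, as the abstract notes.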