LGAIROOct 21, 2024

A Plug-and-Play Fully On-the-Job Real-Time Reinforcement Learning Algorithm for a Direct-Drive Tandem-Wing Experiment Platforms Under Multiple Random Operating Conditions

arXiv:2410.15554v2h-index: 9
Originality Incremental advance
AI Analysis

This work addresses control problems for biomimetic aerial systems, but it is incremental as it builds on prior CRL algorithms with specific enhancements.

The paper tackles motion control challenges for a direct-drive tandem-wing platform under multiple random operating conditions by developing the CRL2E algorithm, which achieves safe training within 500 steps and improves tracking accuracy by 14 to 66 times compared to baseline algorithms like Soft Actor-Critic.

The nonlinear and unstable aerodynamic interference generated by the tandem wings of such biomimetic systems poses substantial challenges for motion control, especially under multiple random operating conditions. To address these challenges, the Concerto Reinforcement Learning Extension (CRL2E) algorithm has been developed. This plug-and-play, fully on-the-job, real-time reinforcement learning algorithm incorporates a novel Physics-Inspired Rule-Based Policy Composer Strategy with a Perturbation Module alongside a lightweight network optimized for real-time control. To validate the performance and the rationality of the module design, experiments were conducted under six challenging operating conditions, comparing seven different algorithms. The results demonstrate that the CRL2E algorithm achieves safe and stable training within the first 500 steps, improving tracking accuracy by 14 to 66 times compared to the Soft Actor-Critic, Proximal Policy Optimization, and Twin Delayed Deep Deterministic Policy Gradient algorithms. Additionally, CRL2E significantly enhances performance under various random operating conditions, with improvements in tracking accuracy ranging from 8.3% to 60.4% compared to the Concerto Reinforcement Learning (CRL) algorithm. The convergence speed of CRL2E is 36.11% to 57.64% faster than the CRL algorithm with only the Composer Perturbation and 43.52% to 65.85% faster than the CRL algorithm when both the Composer Perturbation and Time-Interleaved Capability Perturbation are introduced, especially in conditions where the standard CRL struggles to converge. Hardware tests indicate that the optimized lightweight network structure excels in weight loading and average inference time, meeting real-time control requirements.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes