LG · AI · Dec 1, 2021

Homotopy Based Reinforcement Learning with Maximum Entropy for Autonomous Air Combat

arXiv:2112.01328v1
Originality: Incremental advance
AI Analysis

The paper addresses real-time decision-making for unmanned combat aerial vehicles (UCAVs). It is an incremental advance because it builds on existing RL methods.

The paper tackles sparse rewards and suboptimal convergence in reinforcement learning for autonomous air combat. It proposes a homotopy-based soft actor-critic method (HSAC), which achieves a win rate above 98.3% when attacking a horizontally flying UCAV and an average win rate of 67.4% in self-play confrontations.

Intelligent decision-making for unmanned combat aerial vehicles (UCAVs) has long been a challenging problem. Conventional search methods can hardly meet the real-time demands of highly dynamic air combat scenarios. Reinforcement learning (RL) methods can significantly shorten decision time by using neural networks. However, the sparse reward problem limits their convergence speed. Artificial prior-experience rewards can also easily steer convergence away from the optimum of the original task. Together, these issues make applying RL to air combat very difficult. In this paper, we propose a homotopy-based soft actor-critic method (HSAC). It addresses these problems by following the homotopy path between the original task, which has a sparse reward, and an auxiliary task, which has an artificial prior-experience reward. We also prove the convergence and feasibility of this method. To confirm its feasibility, we first construct a detailed 3D air combat simulation environment for training RL-based methods. We then implement our method in two tasks: attacking a horizontally flying UCAV, and self-play confrontation. Experimental results show that our method outperforms methods that use only the sparse reward or only the artificial prior-experience reward. The agent trained by our method reaches a win rate above 98.3% in the horizontal-flight attack task. It also reaches an average win rate of 67.4% when confronted with agents trained by the other two methods.
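The homotopy idea in the abstract can be illustrated with a minimal Python sketch. It rests on one assumption: the homotopy path is modeled as a convex combination of the auxiliary (prior-experience) reward and the original sparse reward, weighted by a parameter λ. A linear schedule moves λ from 0 (auxiliary task only) to 1 (original sparse-reward task). This is not the authors' HSAC. The paper's actual path-following rule and convergence conditions may differ. In the sketch, the blended reward feeds the standard maximum-entropy (soft) Bellman target used by soft actor-critic. The names `homotopy_weight`, `homotopy_reward`, and `soft_td_target` and the default `gamma`/`alpha` values are hypothetical.

```python
from typing import Sequence


def homotopy_weight(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule for the homotopy parameter lambda in [0, 1].

    lambda = 0 trains only on the auxiliary (prior-experience) reward;
    lambda = 1 trains only on the original sparse reward.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def homotopy_reward(r_sparse: float, r_aux: float, lam: float) -> float:
    """Assumed homotopy path: convex combination of the two task rewards."""
    return (1.0 - lam) * r_aux + lam * r_sparse


def soft_td_target(
    reward: float,
    done: bool,
    next_q_values: Sequence[float],
    next_log_prob: float,
    gamma: float = 0.99,
    alpha: float = 0.2,
) -> float:
    """Standard SAC (maximum-entropy) target:

    y = r + gamma * (1 - done) * (min_i Q_i(s', a') - alpha * log pi(a' | s'))

    Taking the minimum over target critics is the usual clipped double-Q trick.
    """
    soft_value = min(next_q_values) - alpha * next_log_prob
    return reward + gamma * (0.0 if done else 1.0) * soft_value


if __name__ == "__main__":
    lam = homotopy_weight(step=2500, total_steps=10000)  # 0.25
    r = homotopy_reward(r_sparse=1.0, r_aux=0.5, lam=lam)
    y = soft_td_target(r, done=False, next_q_values=[1.0, 1.5], next_log_prob=-1.0)
    print(lam, r, y)
```

Because the schedule in this sketch ends at λ = 1, the final training signal is the original sparse reward alone. This reflects the motivation in the abstract: the artificial reward guides early learning, but the agent does not stay locked to its possibly biased optimum.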

