Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization
This addresses the challenge of fleet-level decision-making in military aviation PHM, offering a novel HRL framework that is incremental in improving upon existing DRL methods for this specific domain.
The paper tackles the problem of optimizing sequential maintenance and logistics decisions in military aviation Prognostics and Health Management (PHM) for large-scale fleet operations, proposing Smart Commander, a Hierarchical Reinforcement Learning (HRL) framework that significantly outperforms conventional monolithic Deep Reinforcement Learning (DRL) and rule-based baselines, achieving a substantial reduction in training time with superior scalability and robustness.
Decision-making in military aviation Prognostics and Health Management (PHM) faces significant challenges due to the "curse of dimensionality" in large-scale fleet operations, combined with sparse feedback and stochastic mission profiles. To address these issues, this paper proposes Smart Commander, a novel Hierarchical Reinforcement Learning (HRL) framework designed to optimize sequential maintenance and logistics decisions. The framework decomposes the complex control problem into a two-tier hierarchy: a strategic General Commander manages fleet-level availability and cost objectives, while tactical Operation Commanders execute specific actions for sortie generation, maintenance scheduling, and resource allocation. The proposed approach is validated within a custom-built, high-fidelity discrete-event simulation environment that captures the dynamics of aircraft configuration and support logistics.By integrating layered reward shaping with planning-enhanced neural networks, the method effectively addresses the difficulty of sparse and delayed rewards. Empirical evaluations demonstrate that Smart Commander significantly outperforms conventional monolithic Deep Reinforcement Learning (DRL) and rule-based baselines. Notably, it achieves a substantial reduction in training time while demonstrating superior scalability and robustness in failure-prone environments. These results highlight the potential of HRL as a reliable paradigm for next-generation intelligent fleet management.