LG EP IM ET
Aug 7, 2025

A Markov Decision Process Framework for Early Maneuver Decisions in Satellite Collision Avoidance

Francesca Ferrara, Lander W. Schillinger Arana, Florian Dörfler, Sarah H. Q. Li

arXiv:2508.05876v1 4.1 h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses fuel efficiency in satellite operations, but it is incremental as it builds on existing MDP and reinforcement learning techniques for a specific domain.

The authors address early maneuver decisions for satellite collision avoidance with a Markov decision process framework and a reinforcement learning policy. Compared to a conventional cut-off policy, the trained policy reduces the average fuel consumption per maneuver while maintaining collision risk guarantees.

This work presents a Markov decision process (MDP) framework to model decision-making for collision avoidance maneuvers (CAMs) and a reinforcement learning policy gradient (RL-PG) algorithm to train an autonomous guidance policy using historical CAM data. In addition to maintaining acceptable collision risks, this approach seeks to minimize the average fuel consumption of CAMs by making early maneuver decisions. We model CAM as a continuous-state, discrete-action, finite-horizon MDP, where the critical decision is determining when to initiate the maneuver. The MDP model also incorporates analytical models for conjunction risk, propellant consumption, and transit orbit geometry. The Markov policy effectively trades off maneuver delay, which improves the reliability of conjunction risk indicators, against propellant consumption, which increases as the time remaining before the conjunction decreases. Using historical data of tracked conjunction events, we verify this framework and conduct an extensive ablation study on the hyper-parameters used within the MDP. On synthetic conjunction events, the trained policy significantly reduces both the overall and average propellant consumption per CAM when compared to a conventional cut-off policy that initiates maneuvers 24 hours before the time of closest approach (TCA). On historical conjunction events, the trained policy consumes more propellant overall but reduces the average propellant consumption per CAM. For both historical and synthetic conjunction events, the trained policy achieves equal if not higher overall collision risk guarantees.
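The trade-off described in the abstract can be illustrated with a toy sketch. This is not the authors' model: the state layout, the `fuel_cost` function (cost inversely proportional to lead time), the `1e-4` risk threshold, and the way the 24-hour cut-off baseline is encoded are all assumptions made purely for illustration, and the "early" policy is a hand-written rule rather than a trained RL-PG policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

WAIT, MANEUVER = 0, 1  # discrete action set (assumed encoding)


@dataclass(frozen=True)
class State:
    hours_to_tca: float  # time remaining until closest approach
    risk: float          # current estimated collision probability


def fuel_cost(hours_to_tca: float, k: float = 1.0) -> float:
    """Hypothetical cost model: less lead time means more propellant."""
    return k / max(hours_to_tca, 1e-6)


def cutoff_policy(state: State, threshold: float = 1e-4,
                  cutoff_hours: float = 24.0) -> int:
    """Assumed baseline: act only once the cut-off is reached and risk is high."""
    if state.hours_to_tca <= cutoff_hours and state.risk >= threshold:
        return MANEUVER
    return WAIT


def early_policy(state: State, threshold: float = 1e-4) -> int:
    """Hand-written stand-in for a learned policy: act as soon as risk is high."""
    return MANEUVER if state.risk >= threshold else WAIT


def rollout(policy: Callable[[State], int], hours: Sequence[float],
            risks: Sequence[float]) -> Tuple[float, Optional[float]]:
    """Step through one conjunction event; return (fuel used, maneuver time)."""
    for h, r in zip(hours, risks):
        if policy(State(h, r)) == MANEUVER:
            return fuel_cost(h), h
    return 0.0, None  # no maneuver was needed


hours = [72.0, 48.0, 24.0, 12.0]             # decision epochs before TCA
persistent = [2e-4, 3e-4, 5e-4, 5e-4]        # risk stays high
false_alarm = [2e-4, 5e-5, 1e-5, 1e-6]       # risk decays as tracking improves

print(rollout(early_policy, hours, persistent))    # maneuvers at 72 h
print(rollout(cutoff_policy, hours, persistent))   # maneuvers at 24 h
print(rollout(early_policy, hours, false_alarm))   # maneuvers unnecessarily
print(rollout(cutoff_policy, hours, false_alarm))  # never maneuvers
```

In this toy setting, acting early is cheaper when the risk persists but wastes propellant on an event that later resolves itself, which is the tension a learned policy would have to balance.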
