LG · Mar 8, 2024

Shielded Deep Reinforcement Learning for Complex Spacecraft Tasking

arXiv:2403.05693v3 · 6 citations · h-index: 3 · ACC
AI Analysis

This work addresses safety and goal ambiguity in autonomous spacecraft control, offering incremental improvements by formalizing existing methods.

The paper tackles the informal definition of tasks and safety requirements in Shielded Deep Reinforcement Learning for spacecraft control. It formalizes both in Linear Temporal Logic, automatically constructs reward functions from co-safe LTL specifications, and proposes three shield designs, built from safe LTL specifications, that provide probabilistic guarantees. Experiments show how these shields interact with different policies and how flexible the reward structure is.

Autonomous spacecraft control via Shielded Deep Reinforcement Learning (SDRL) has become a rapidly growing research area. However, the construction of shields and the definition of tasking remain informal, resulting in policies with no guarantees on safety and ambiguous goals for the RL agent. In this paper, we first explore the use of formal languages, namely Linear Temporal Logic (LTL), to formalize spacecraft tasks and safety requirements. We then define a way to automatically construct a reward function from a co-safe LTL specification for effective training in the SDRL framework. We also investigate methods for constructing a shield from a safe LTL specification for spacecraft applications and propose three designs that provide probabilistic guarantees. Through several experiments, we show how these shields interact with different policies and demonstrate the flexibility of the reward structure.
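The two ideas in the abstract, a reward derived from a co-safe LTL task and a shield that filters actions against a safety requirement, can be illustrated with a minimal sketch. This is not the paper's construction: the task formula F(image & F downlink), the hand-built automaton, the progress-based reward, the per-action risk table, and the threshold `delta` are all hypothetical choices made for illustration only.

```python
from typing import Dict, FrozenSet

# Hypothetical hand-built DFA for the co-safe formula F(image & F downlink):
# state 0 = nothing done, 1 = image taken, 2 = accepting (task complete).
# DIST_TO_ACCEPT maps each DFA state to its distance from the accepting state.
DIST_TO_ACCEPT: Dict[int, int] = {0: 2, 1: 1, 2: 0}


def dfa_step(q: int, labels: FrozenSet[str]) -> int:
    """Advance the DFA on the set of atomic propositions true at this step."""
    if q == 0 and "image" in labels:
        q = 1
    # Checked after the first branch so that imaging and downlinking in the
    # same step also satisfies the formula.
    if q == 1 and "downlink" in labels:
        q = 2
    return q


def progress_reward(q: int, q_next: int) -> float:
    """Reward equals the number of DFA states advanced toward acceptance."""
    return float(DIST_TO_ACCEPT[q] - DIST_TO_ACCEPT[q_next])


def shield(proposed: str, risk: Dict[str, float], delta: float) -> str:
    """Pass the proposed action through if its assumed probability of
    violating the safety requirement is at most delta; otherwise override
    it with the least risky available action."""
    if risk[proposed] <= delta:
        return proposed
    return min(risk, key=risk.get)


if __name__ == "__main__":
    # Assumed one-step violation probabilities (illustrative numbers only).
    risk = {"image": 0.30, "charge": 0.01, "downlink": 0.05}
    q = 0
    for proposed in ["image", "downlink"]:
        action = shield(proposed, risk, delta=0.10)
        q_next = dfa_step(q, frozenset({action}))
        print(proposed, "->", action, "reward", progress_reward(q, q_next))
        q = q_next
```

In this sketch the reward depends only on the automaton state, so changing the task means swapping the automaton rather than hand-tuning a reward. The shield sits between the policy and the environment and never needs to know the task.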

