Harry Holt

h-index4
2papers

2 Papers

SYSep 30, 2024
Certifying Guidance & Control Networks: Uncertainty Propagation to an Event Manifold

Sebastien Origer, Dario Izzo, Giacomo Acciarini et al.

We perform uncertainty propagation on an event manifold for Guidance & Control Networks (G&CNETs), aiming to enhance the certification tools for neural networks in this field. This work utilizes three previously solved optimal control problems with varying levels of dynamics nonlinearity and event manifold complexity. The G&CNETs are trained to represent the optimal control policies of a time-optimal interplanetary transfer, a mass-optimal landing on an asteroid and energy-optimal drone racing, respectively. For each of these problems, we describe analytically the terminal conditions on an event manifold with respect to initial state uncertainties. Crucially, this expansion does not depend on time but solely on the initial conditions of the system, thereby making it possible to study the robustness of the G&CNET at any specific stage of a mission defined by the event manifold. Once this analytical expression is found, we provide confidence bounds by applying the Cauchy-Hadamard theorem and perform uncertainty propagation using moment generating functions. While Monte Carlo-based (MC) methods can yield the results we present, this work is driven by the recognition that MC simulations alone may be insufficient for future certification of neural networks in guidance and control applications.

SYJul 22, 2025
Comparing Behavioural Cloning and Reinforcement Learning for Spacecraft Guidance and Control Networks

Harry Holt, Sebastien Origer, Dario Izzo

Guidance & control networks (G&CNETs) provide a promising alternative to on-board guidance and control (G&C) architectures for spacecraft, offering a differentiable, end-to-end representation of the guidance and control architecture. When training G&CNETs, two predominant paradigms emerge: behavioural cloning (BC), which mimics optimal trajectories, and reinforcement learning (RL), which learns optimal behaviour through trials and errors. Although both approaches have been adopted in G&CNET related literature, direct comparisons are notably absent. To address this, we conduct a systematic evaluation of BC and RL specifically for training G&CNETs on continuous-thrust spacecraft trajectory optimisation tasks. We introduce a novel RL training framework tailored to G&CNETs, incorporating decoupled action and control frequencies alongside reward redistribution strategies to stabilise training and to provide a fair comparison. Our results show that BC-trained G&CNETs excel at closely replicating expert policy behaviour, and thus the optimal control structure of a deterministic environment, but can be negatively constrained by the quality and coverage of the training dataset. In contrast RL-trained G&CNETs, beyond demonstrating a superior adaptability to stochastic conditions, can also discover solutions that improve upon suboptimal expert demonstrations, sometimes revealing globally optimal strategies that eluded the generation of training samples.