IMAug 6, 2024
Spacecraft inertial parameters estimation using time series clustering and reinforcement learningKonstantinos Platanitis, Miguel Arana-Catania, Leonardo Capicchiano et al.
This paper presents a machine learning approach to estimate the inertial parameters of a spacecraft in cases when those change during operations, e.g. multiple deployments of payloads, unfolding of appendages and booms, propellant consumption as well as during in-orbit servicing and active debris removal operations. The machine learning approach uses time series clustering together with an optimised actuation sequence generated by reinforcement learning to facilitate distinguishing among different inertial parameter sets. The performance of the proposed strategy is assessed against the case of a multi-satellite deployment system showing that the algorithm is resilient towards common disturbances in such kinds of operations.
SYJan 21, 2025
A causal learning approach to in-orbit inertial parameter estimation for multi-payload deployersKonstantinos Platanitis, Miguel Arana-Catania, Saurabh Upadhyay et al.
This paper discusses an approach to inertial parameter estimation for the case of cargo carrying spacecraft that is based on causal learning, i.e. learning from the responses of the spacecraft, under actuation. Different spacecraft configurations (inertial parameter sets) are simulated under different actuation profiles, in order to produce an optimised time-series clustering classifier that can be used to distinguish between them. The actuation is comprised of finite sequences of constant inputs that are applied in order, based on typical actuators available. By learning from the system's responses across multiple input sequences, and then applying measures of time-series similarity and F1-score, an optimal actuation sequence can be chosen either for one specific system configuration or for the overall set of possible configurations. This allows for both estimation of the inertial parameter set without any prior knowledge of state, as well as validation of transitions between different configurations after a deployment event. The optimisation of the actuation sequence is handled by a reinforcement learning model that uses the proximal policy optimisation (PPO) algorithm, by repeatedly trying different sequences and evaluating the impact on classifier performance according to a multi-objective metric.