Data Lifecycle Management in Evolving Input Distributions for Learning-based Aerospace Applications
This work addresses the challenge of model adaptation in aerospace applications such as satellite pose estimation, where input distributions change over time. The contribution is incremental, since it builds on existing active learning and retraining methods.
The paper tackles the problem of maintaining the performance of learning-based models as input distributions evolve over a mission lifetime, and proposes a framework for incremental retraining with selective labeling. It shows that the proposed algorithm achieves performance comparable to labeling all inputs while labeling only 50% of them, which reduces labeling and retraining costs while maintaining high performance.
As input distributions evolve over a mission lifetime, maintaining the performance of learning-based models becomes challenging. This paper presents a framework to incrementally retrain a model by selecting a subset of test inputs to label, which allows the model to adapt to changing input distributions. Algorithms within this framework are evaluated based on (1) model performance throughout the mission lifetime and (2) cumulative costs associated with labeling and model retraining. We provide an open-source benchmark of a satellite pose estimation model trained on images of a satellite in space and deployed in novel scenarios (e.g., different backgrounds or misbehaving pixels), where algorithms are evaluated on their ability to maintain high performance by retraining on a subset of inputs. We also propose a novel algorithm to select a diverse subset of inputs for labeling, by characterizing the information gain from an input using Bayesian uncertainty quantification and choosing a subset that maximizes collective information gain using concepts from batch active learning. We show that our algorithm outperforms others on the benchmark: for example, it achieves performance comparable to an algorithm that labels 100% of inputs while labeling only 50% of inputs, resulting in low costs and high performance over the mission lifetime.
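The idea of choosing a subset that maximizes collective information gain can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a Bayesian linear model with a unit Gaussian prior over fixed feature vectors, for which the information gain of a batch has a closed form, and it uses a standard greedy selection rule. The names `select_diverse_batch`, `features`, `budget`, and `noise_var` are hypothetical and chosen only for illustration.

```python
import numpy as np


def select_diverse_batch(features, budget, noise_var=1.0):
    """Greedily pick up to `budget` rows of `features` with high joint information gain.

    Assumption (not from the paper): a Bayesian linear model with a unit Gaussian
    prior, so the information gain of a batch S is
    0.5 * logdet(I + X_S^T X_S / noise_var).
    """
    features = np.asarray(features, dtype=float)
    n, d = features.shape
    precision = np.eye(d)               # posterior precision, starts at the prior
    available = np.ones(n, dtype=bool)  # inputs not yet selected
    chosen = []
    for _ in range(min(budget, n)):
        cov = np.linalg.inv(precision)
        # Predictive variance x^T cov x of each candidate under the current posterior.
        variance = np.einsum("ij,jk,ik->i", features, cov, features)
        # Marginal gain of labeling one more input (matrix determinant lemma).
        gain = 0.5 * np.log1p(variance / noise_var)
        gain[~available] = -np.inf
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False
        # Condition the posterior on the selected input, which lowers the gain
        # of near-duplicate inputs in later rounds.
        x = features[best]
        precision += np.outer(x, x) / noise_var
    return chosen


if __name__ == "__main__":
    # Two identical inputs and one distinct input, labeling budget of two.
    candidates = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
    print(select_diverse_batch(candidates, budget=2))
```

Because the posterior is updated after each pick, a second copy of an already selected input contributes little additional gain, so the sketch prefers the distinct input over the duplicate. Scoring each input independently by its uncertainty would not have this property, which is the motivation for batch-aware selection.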