MissFormer: (In-)attention-based handling of missing observations for trajectory filtering and prediction
MissFormer addresses a key limitation of deep learning-based tracking models, which often cannot handle missing data. This makes it useful for applications such as object tracking, where data gaps are common.
The paper tackles the problem of missing observations in trajectory data for object tracking by introducing a transformer-based model that learns to infer complete trajectories from noisy inputs with missing data, demonstrating its abilities on both synthetic and real-world datasets.
In applications such as object tracking, time-series data inevitably carry missing observations. Following the success of deep learning-based models for various sequence learning tasks, these models increasingly replace classic approaches in object tracking applications for inferring the objects' motion states. While traditional tracking approaches can deal with missing observations, most of their deep counterparts are, by default, not suited for this. To this end, this paper introduces a transformer-based approach for handling missing observations in trajectory data of variable input length. The model is formed indirectly by successively increasing the complexity of the demanded inference tasks. Starting from reproducing noise-free trajectories, the model then learns to infer trajectories from noisy inputs. Given missing tokens, i.e., binary-encoded missing events, the model learns to in-attend to missing data and infers a complete trajectory conditioned on the remaining inputs. In the case of a sequence of successive missing events, the model then acts as a pure prediction model. The abilities of the approach are demonstrated on synthetic data and real-world data reflecting prototypical object tracking scenarios.
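To make the input encoding described above more concrete, the following is a minimal sketch of how a training sample with binary-encoded missing events might be built. It is not the authors' implementation: the function name `make_sample`, the constant-velocity 2D trajectory, the zero placeholder for missing positions, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np

def make_sample(num_steps=12, noise_std=0.1, missing_prob=0.3, seed=0):
    """Build one illustrative sample (hypothetical helper, not from the paper)."""
    rng = np.random.default_rng(seed)
    t = np.arange(num_steps, dtype=float)

    # Assumed ground truth: a constant-velocity 2D trajectory, shape (num_steps, 2).
    clean = np.stack([1.0 * t, 0.5 * t], axis=1)

    # Noisy observations of the clean trajectory.
    noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)

    # Binary missing events; the first step is kept observed by assumption.
    missing = rng.random(num_steps) < missing_prob
    missing[0] = False

    # Missing positions are replaced by a zero placeholder (an assumption),
    # and the binary flag is appended as an extra input feature.
    observed = np.where(missing[:, None], 0.0, noisy)
    tokens = np.concatenate([observed, missing[:, None].astype(float)], axis=1)
    return clean, noisy, tokens, missing

clean, noisy, tokens, missing = make_sample()
print(tokens.shape)  # (num_steps, 3): x, y, missing flag
```

Under these assumptions, `clean` would serve as the target at every stage, while the input could progress from `clean` to `noisy` to `tokens`, mirroring the successively harder inference tasks described in the abstract. A run of consecutive `True` flags at the end of `missing` corresponds to the pure prediction case.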