UTICA: Multi-Objective Self-Distillation Foundation Model Pretraining for Time Series Classification
This work addresses the need for effective pretraining strategies in time series analysis. Its contribution is incremental, however, since it adapts existing methods from computer vision to a new domain.
The paper tackles the problem of adapting non-contrastive self-distillation methods, such as DINOv2, to pretrain a time series foundation model, and reports state-of-the-art classification performance on the UCR and UEA benchmarks.
Self-supervised foundation models have achieved remarkable success across domains, including time series. However, the potential of non-contrastive methods, a paradigm that has driven significant advances in computer vision, remains underexplored for time series. In this work, we adapt DINOv2-style self-distillation to pretrain a time series foundation model, building on the Mantis tokenizer and transformer encoder architecture as our backbone. Through a student-teacher framework, our method Utica learns representations that capture both temporal invariance via augmented crops and fine-grained local structure via patch masking. Our approach achieves state-of-the-art classification performance on both UCR and UEA benchmarks. These results suggest that non-contrastive methods are a promising and complementary pretraining strategy for time series foundation models.
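To make the student-teacher idea concrete, the following is a minimal NumPy sketch of the generic ingredients of DINO-style self-distillation applied to a 1-D series: random crops as augmented views, a boolean patch mask, a cross-entropy loss between a centered and sharpened teacher distribution and the student distribution, and an exponential-moving-average (EMA) teacher update. It is not the Utica implementation. All function names, the temperatures, the momentum, and the mask ratio are illustrative assumptions, and the encoder itself (the Mantis tokenizer and transformer) is omitted.

```python
import numpy as np

def softmax(x, temp):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    z = (x - x.max(axis=-1, keepdims=True)) / temp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def log_softmax(x, temp):
    # Temperature-scaled log-softmax, computed stably.
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distill_loss(student_logits, teacher_logits, center,
                 student_temp=0.1, teacher_temp=0.04):
    # Cross-entropy between the centered, sharpened teacher distribution
    # and the student distribution. Temperatures are assumed values.
    targets = softmax(teacher_logits - center, teacher_temp)
    log_probs = log_softmax(student_logits, student_temp)
    return float(-(targets * log_probs).sum(axis=-1).mean())

def ema_update(teacher, student, momentum=0.99):
    # The teacher receives no gradients; it tracks the student by EMA.
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

def random_crop(series, length, rng):
    # One augmented view: a contiguous window of the series.
    start = rng.integers(0, len(series) - length + 1)
    return series[start:start + length]

def mask_patches(num_patches, ratio, rng):
    # Boolean mask choosing which patches are hidden from the student.
    k = int(round(ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=k, replace=False)] = True
    return mask

# Hypothetical usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
series = rng.standard_normal(512)
global_view = random_crop(series, 256, rng)   # would be fed to the teacher
local_view = random_crop(series, 96, rng)     # would be fed to the student
mask = mask_patches(num_patches=16, ratio=0.25, rng=rng)
student_logits = rng.standard_normal((4, 32))
teacher_logits = rng.standard_normal((4, 32))
center = teacher_logits.mean(axis=0)          # running mean in practice
loss = distill_loss(student_logits, teacher_logits, center)
```

In a full training loop under these assumptions, the crop-level loss would be computed on global embeddings of different views, while a patch-level loss of the same form would be computed only at masked positions; how Utica weights the two objectives is not specified in this abstract.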