LG AI ML | Dec 9, 2024

Measuring Pre-training Data Quality without Labels for Time Series Foundation Models

Songkang Wen, Vasilii Feofanov, Jianfeng Zhang

arXiv:2412.06368v1 | 4.62 citations | h-index: 16

Originality: Incremental advance

AI Analysis

This work addresses a key challenge in building better time series foundation models by providing a method to select high-quality pre-training data, though it is incremental as it builds on existing contrastive learning approaches.

The paper tackles the problem of evaluating pre-training data quality for time series foundation models without labels. It introduces contrastive accuracy, a measure that correlates positively with accuracy on downstream tasks.

Recently, there has been growing interest in time series foundation models that generalize across different downstream tasks. A key to strong foundation models is a diverse pre-training dataset, which is particularly challenging to collect for time series classification. In this work, we explore the performance of a contrastive-learning-based foundation model as a function of the data used for pre-training. We introduce contrastive accuracy, a new measure to evaluate the quality of the representation space learned by the foundation model. Our experiments reveal a positive correlation between the proposed measure and the accuracy of the model on a collection of downstream tasks. This suggests that contrastive accuracy can serve as a criterion to search for time series datasets that enhance pre-training and thereby improve the foundation model's generalization.
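
The abstract does not spell out how contrastive accuracy is computed, so the following is only a minimal sketch of one plausible reading: given embeddings of two augmented views of each unlabeled series, count how often a sample's own second view is its most similar candidate. The function names, the use of cosine similarity, and the toy embeddings are all assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_accuracy(view_a, view_b):
    """Fraction of samples whose best match in view_b is their own pair.

    Assumption: view_a[i] and view_b[i] are embeddings of two augmented
    views of the same unlabeled series i. No class labels are needed.
    """
    if len(view_a) != len(view_b) or not view_a:
        raise ValueError("views must be non-empty and of equal length")
    hits = 0
    for i, anchor in enumerate(view_a):
        sims = [cosine_similarity(anchor, cand) for cand in view_b]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(view_a)


# Toy embeddings, made up for illustration (not data from the paper).
view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
score = contrastive_accuracy(view_a, view_b)
```

In this toy case only the second sample retrieves its own pair, so the score is one third. Under this reading, a higher score on held-out series would indicate that the learned representation separates individual samples more cleanly, which is the kind of label-free signal the abstract proposes for comparing candidate pre-training datasets.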
