LG AI, Aug 4, 2025

CauKer: classification time series foundation models can be pretrained on synthetic data only

Shifeng Xie, Vasilii Feofanov, Marius Alonso, Ambroise Odonnat, Jianfeng Zhang, Themis Palpanas, Ievgen Redko

arXiv:2508.02879v221.39 citations, h-index: 56

Originality: Highly original

AI Analysis

This work addresses the challenge of reducing pretraining costs for TSFMs. The contribution is incremental: it builds on existing TSFM methods by introducing synthetic data generation.

The paper tackles the problem of computationally costly pretraining of time series foundation models (TSFMs) by proposing CauKer, an algorithm that generates diverse, causally coherent synthetic time series, enabling sample-efficient pretraining and revealing clear scaling laws from 10K to 10M samples and 1M to 783M parameters.

Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require a computationally costly pretraining on large-scale, carefully curated collections of real-world sequences. To allow for a sample-efficient pretraining of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCM) to produce data for sample-efficient pretraining of state-of-the-art classification TSFMs having different architectures and following different pretraining approaches. Additionally, our experiments reveal that CauKer-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior.
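The combination of GP kernel composition and a structural causal model described above can be illustrated with a minimal sketch. This is not the authors' implementation: the kernel choices, the activation set, the graph-sampling rule, and all names (`rbf_kernel`, `periodic_kernel`, `linear_kernel`, `sample_gp`, `generate_series`) are assumptions made for illustration only. Root nodes are drawn from a GP whose kernel is a product and sum of simpler kernels, and each non-root node is a nonlinear function of a random subset of earlier nodes, so the graph is acyclic by construction.

```python
import numpy as np

def rbf_kernel(t, length_scale=0.2):
    # Smooth trend component (hypothetical hyperparameters).
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def periodic_kernel(t, period=0.25, length_scale=1.0):
    # Seasonal component.
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale ** 2)

def linear_kernel(t):
    # Linear trend component.
    return t[:, None] * t[None, :]

def sample_gp(kernel_matrix, rng):
    # Draw one zero-mean GP sample via an eigendecomposition,
    # clipping tiny negative eigenvalues caused by round-off.
    vals, vecs = np.linalg.eigh(kernel_matrix)
    vals = np.clip(vals, 0.0, None)
    return vecs @ (np.sqrt(vals) * rng.standard_normal(len(vals)))

def generate_series(n_steps=128, n_nodes=6, n_roots=2, seed=0):
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_steps)
    # Kernel composition: products and sums of valid kernels are valid kernels.
    K = rbf_kernel(t) * periodic_kernel(t) + 0.1 * linear_kernel(t)
    activations = [np.tanh, np.sin, lambda x: np.maximum(x, 0.0)]
    nodes = []
    for i in range(n_nodes):
        if i < n_roots:
            # Root nodes of the causal graph are GP samples.
            nodes.append(sample_gp(K, rng))
        else:
            # Each child depends on 1 or 2 earlier nodes (a DAG by construction).
            k = rng.integers(1, min(i, 2) + 1)
            parents = rng.choice(i, size=k, replace=False)
            weights = rng.standard_normal(k)
            mix = sum(w * nodes[p] for w, p in zip(weights, parents))
            f = activations[rng.integers(len(activations))]
            nodes.append(f(mix))
    return np.stack(nodes)  # shape: (n_nodes, n_steps)

series = generate_series(seed=0)
```

Each row of `series` is one synthetic time series. Varying the seed, kernel hyperparameters, and graph size would give a diverse collection; how CauKer itself samples these choices is not specified in the abstract.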
