LG | Mar 14, 2025

Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models

arXiv:2503.11411v1 | 13 citations | h-index: 35
Originality: Synthesis-oriented
AI Analysis

This is an incremental survey that synthesizes existing research on synthetic data to support the development of time series analysis models, benefiting researchers and practitioners in fields like finance, healthcare, and climate science.

This survey addresses the challenge of obtaining large, diverse, and high-quality datasets for time series foundation models (TSFMs) and large language model-based time series models (TSLLMs) by reviewing synthetic data as a scalable and unbiased solution, analyzing its role in model pretraining, fine-tuning, and evaluation.

Time series analysis is crucial for understanding the dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs), which enable generalized learning and integrate contextual information. However, their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. Synthetic data emerge as a viable solution, addressing these challenges by offering scalable, unbiased, and high-quality alternatives. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies and their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
