LG, Jun 3

ReGeN: Reference-Guided Synthetic Multivariate Time Series Generation for Forecasting

Moulik Gupta, Dhruv Kumar, Murari Mandal, Saurabh Deshpande

arXiv:2606.05264

AI Analysis

For practitioners in domains with limited multivariate time series data, ReGeN provides a method to generate high-quality synthetic data that preserves domain-specific structure, improving forecasting model training.

ReGeN generates synthetic multivariate time series by decomposing each reference sequence into a periodic backbone, stochastic residuals, and cross-variable dependencies, which enables controllable synthesis that preserves domain structure. In low-data regimes, ReGeN-generated data can substitute for real data with minimal forecasting degradation and, in strongly periodic domains such as traffic, can outperform real data; a foundation model pretrained on ReGeN corpora outperforms models pretrained on prior-based and data-driven synthetic alternatives.

Training robust multivariate time series forecasting models requires large, diverse corpora, yet many real-world domains provide only a handful of observed sequences. Existing generators fail to resolve this mismatch: prior-based approaches (e.g., CauKer, TimePFN) produce domain-agnostic samples, while data-driven methods (e.g., TimeGAN) treat references as black-box supervision, forfeiting explicit control over periodic structure, local variability, and cross-variable dynamics. We propose ReGeN, a reference-guided generative pipeline that treats observed sequences not as examples to imitate, but as structural scaffolds for controllable synthesis. ReGeN decomposes each reference into three interpretable components: a phase-aligned periodic backbone capturing dominant domain morphology; per-variable stochastic residuals modeled with a deep-kernel Gaussian process; and lag-aware cross-variable dependencies injected through a structural causal model with fitted coupling coefficients. Sampling these components at controllable temperature broadens distributional coverage while preserving domain-grounded structure. We show that ReGeN-generated data consistently substitutes for real sibling data with minimal forecasting degradation, and in strongly periodic domains such as traffic, can outperform the real source itself. We further show that a foundation model pretrained on ReGeN corpora outperforms models pretrained on prior-based and data-driven synthetic alternatives. This suggests that in low-data regimes, how reference data is structurally exploited can matter as much as how much data is available.
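To make the three-part decomposition described in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation, and every function name and parameter in it is hypothetical. It makes several simplifying assumptions: the period is known in advance, the backbone is a plain per-phase average, the deep-kernel Gaussian process is replaced by a much simpler AR(1) Gaussian model, and the structural causal model is reduced to a single fitted linear coefficient from variable 0 to variable 1 at one fixed lag (which must be at least 1). Temperature is modeled here simply as a multiplier on the residual noise scale.

```python
import numpy as np

def periodic_backbone(x, period):
    """Per-phase average of x over complete cycles, tiled to len(x)."""
    n_cycles = len(x) // period
    profile = x[: n_cycles * period].reshape(n_cycles, period).mean(axis=0)
    reps = -(-len(x) // period)  # ceiling division
    return np.tile(profile, reps)[: len(x)]

def fit_ar1(r):
    """Fit r[t] = phi * r[t-1] + sigma * eps (stand-in for the deep-kernel GP)."""
    phi = float(np.dot(r[1:], r[:-1]) / np.dot(r[:-1], r[:-1]))
    phi = float(np.clip(phi, -0.99, 0.99))  # keep the process stationary
    sigma = float(np.std(r[1:] - phi * r[:-1]))
    return phi, sigma

def sample_ar1(n, phi, sigma, rng):
    """Draw one AR(1) path of length n starting from zero."""
    eps = rng.standard_normal(n) * sigma
    out = np.zeros(n)
    for t in range(1, n):
        out[t] = phi * out[t - 1] + eps[t]
    return out

def fit_lagged_coupling(src, dst, lag):
    """Least-squares coefficient b in dst[t] ~ b * src[t - lag], lag >= 1."""
    s, d = src[:-lag], dst[lag:]
    return float(np.dot(s, d) / np.dot(s, s))

def synthesize(ref, period, lag, temperature, rng):
    """Generate one synthetic (T, 2) series from a (T, 2) reference.

    Assumption: variable 0 drives variable 1 at the given lag.
    """
    T = ref.shape[0]
    backbone = np.column_stack(
        [periodic_backbone(ref[:, j], period) for j in range(2)]
    )
    resid = ref - backbone

    # Fitted coupling coefficient, then variable 1's own (uncoupled) residual.
    b = fit_lagged_coupling(resid[:, 0], resid[:, 1], lag)
    own1 = resid[:, 1].copy()
    own1[lag:] -= b * resid[:-lag, 0]

    phi0, sig0 = fit_ar1(resid[:, 0])
    phi1, sig1 = fit_ar1(own1)

    # Temperature scales the residual noise; 0 returns the backbone alone.
    r0 = sample_ar1(T, phi0, temperature * sig0, rng)
    r1 = sample_ar1(T, phi1, temperature * sig1, rng)
    r1[lag:] += b * r0[:-lag]  # inject the lagged dependency
    return backbone + np.column_stack([r0, r1])

# Usage on a toy daily-periodic reference (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(96)
ref = np.column_stack([np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24)])
ref = ref + 0.1 * rng.standard_normal(ref.shape)
synthetic = synthesize(ref, period=24, lag=1, temperature=1.0, rng=rng)
```

In this sketch, raising `temperature` above 1 widens the spread of the sampled residuals around the reference-derived backbone, while the backbone and the fitted coupling coefficient stay fixed. That is one simple reading of "broadens distributional coverage while preserving domain-grounded structure"; the paper's actual sampling scheme may differ.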
