Blending Low and High-Level Semantics of Time Series for Better Masked Time Series Generation
This work addresses time series generation for applications that require realistic synthetic data, but its contribution is incremental because it builds on existing vector quantization-based methods.
The paper tackles high-fidelity time series generation by addressing a limitation of existing methods: their discrete latent representations capture only low-level semantics. It introduces NC-VQVAE, which integrates self-supervised learning so that the latent space captures both low- and high-level semantics, and reports a considerable improvement in the quality of synthetic samples.
State-of-the-art approaches in time series generation (TSG), such as TimeVQVAE, use vector quantization-based tokenization to model the complex distributions of time series effectively. These approaches first learn to transform a time series into a sequence of discrete latent vectors and then learn a prior model over that sequence. The discrete latent vectors, however, capture only low-level semantics (\textit{e.g.,} shapes). We hypothesize that higher-fidelity time series can be generated by training a prior model on more informative discrete latent vectors that contain both low- and high-level semantics (\textit{e.g.,} characteristic dynamics). In this paper, we introduce a novel framework, termed NC-VQVAE, that integrates self-supervised learning into these TSG methods to derive a discrete latent space capturing both low- and high-level semantics. Our experimental results demonstrate that NC-VQVAE considerably improves the quality of synthetic samples.
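The first stage of the two-stage pipeline in the abstract, turning a time series into a sequence of discrete latent vectors, can be illustrated with a minimal vector-quantization sketch in pure Python: each latent vector is replaced by the index of its nearest codebook entry. This is a generic illustration under simplifying assumptions, not the TimeVQVAE or NC-VQVAE implementation. Fixed-length patches of the raw series stand in for a learned encoder's output, the codebook is hand-picked rather than learned, and the helper names (`segment`, `quantize`, `dequantize`) are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def segment(series: Sequence[float], patch_len: int) -> List[Vector]:
    """Split a 1-D series into non-overlapping patches.

    Hypothetical stand-in for a learned encoder: each patch plays the role
    of one continuous latent vector.
    """
    if patch_len <= 0 or len(series) % patch_len != 0:
        raise ValueError("series length must be a positive multiple of patch_len")
    return [list(series[i:i + patch_len]) for i in range(0, len(series), patch_len)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Replace each latent vector by the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Look the discrete tokens back up in the codebook."""
    return [list(codebook[k]) for k in tokens]


if __name__ == "__main__":
    # Hand-picked (hypothetical) codebook: "flat", "high", and "low" patches.
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
    series = [0.1, -0.1, 0.9, 1.2, 0.0, 0.2, -1.1, -0.8]
    tokens = quantize(segment(series, patch_len=2), codebook)
    print(tokens)  # [0, 1, 0, 2]
    print(dequantize(tokens, codebook))
```

In the pipeline the abstract describes, a token sequence like `[0, 1, 0, 2]` is what the prior model would be trained on in the second stage (not shown here). The abstract's hypothesis is that tokens of this kind mainly reflect low-level semantics such as local shape, and that a prior trained on tokens that also encode high-level semantics produces higher-fidelity samples.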