LG | ML | Apr 29, 2024

Time Series Data Augmentation as an Imbalanced Learning Problem

Vitor Cerqueira, Nuno Moniz, Ricardo Inácio, Carlos Soares

arXiv:2404.18537v16.43 citations | h-index: 14 | Has Code | EPIA

Originality: Incremental advance

AI Analysis

This paper addresses data scarcity in time series forecasting. For practitioners, it offers an incremental improvement by adapting existing imbalanced learning techniques to a new domain.

The paper tackles the shortage of training data for global forecasting models by proposing a data augmentation method that frames time series forecasting as an imbalanced learning task and uses oversampling to generate synthetic samples. In experiments on 7 databases containing 5502 univariate time series, the method outperforms both a global and a local model, providing a better trade-off between the two approaches.

Recent state-of-the-art forecasting methods are trained on collections of time series. These methods, often referred to as global models, can capture common patterns in different time series to improve their generalization performance. However, they require large amounts of data that might not be readily available. Besides this, global models sometimes fail to capture relevant patterns unique to a particular time series. In these cases, data augmentation can be useful to increase the sample size of time series datasets. The main contribution of this work is a novel method for generating univariate time series synthetic samples. Our approach stems from the insight that the observations concerning a particular time series of interest represent only a small fraction of all observations. In this context, we frame the problem of training a forecasting model as an imbalanced learning task. Oversampling strategies are popular approaches used to deal with the imbalance problem in machine learning. We use these techniques to create synthetic time series observations and improve the accuracy of forecasting models. We carried out experiments using 7 different databases that contain a total of 5502 univariate time series. We found that the proposed solution outperforms both a global and a local model, thus providing a better trade-off between these two approaches.
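
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`embed`, `smote_like_oversample`), the toy sine and cosine series, the lag count, and the choice of a SMOTE-style interpolation between nearest neighbours are all assumptions made for illustration. Each series is turned into rows of lagged values plus the next value, rows from the series of interest are treated as the minority class, and synthetic minority rows are generated until the two groups are balanced.

```python
import numpy as np


def embed(series, n_lags):
    """Time-delay embedding: each row holds n_lags past values plus the next value."""
    series = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(series, n_lags + 1)
    return windows.copy()  # copy so the result is writable and independent


def smote_like_oversample(minority, n_new, k=5, rng=None):
    """Create n_new rows by interpolating between a minority row and one of
    its k nearest minority neighbours (a SMOTE-style scheme, assumed here)."""
    rng = np.random.default_rng(rng)
    n = len(minority)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority rows.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return minority[base] + gap * (minority[picked] - minority[base])


# Toy collection (hypothetical data): one series of interest and nine others.
rng = np.random.default_rng(0)
t = np.arange(60)
target = np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(60)
others = [np.cos(2 * np.pi * t / 12 + p) + 0.1 * rng.standard_normal(60)
          for p in range(9)]

n_lags = 6
minority = embed(target, n_lags)                             # rows from the target series
majority = np.vstack([embed(s, n_lags) for s in others])     # rows from all other series

# Oversample the target series until both groups have the same number of rows.
n_new = len(majority) - len(minority)
synthetic = smote_like_oversample(minority, n_new, k=5, rng=rng)

# Augmented training set for any regression-based forecaster.
train = np.vstack([majority, minority, synthetic])
X, y = train[:, :-1], train[:, -1]
```

In this sketch the lags and the target value are interpolated jointly, so every synthetic row is a convex combination of two real rows from the series of interest. Whether the paper treats the target value this way is not stated in the abstract, so this is a design choice of the example rather than a description of the method.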
