Generalised learning of time-series: Ornstein-Uhlenbeck processes
This work addresses a methodological challenge for researchers and practitioners in machine learning, statistics, and related fields, offering an incremental improvement over standard cross-validation techniques for time-series analysis.
The authors tackled the problem of applying cross-validation to time-series data without losing serial correlations or violating temporal ordering, and proposed a meta-algorithm called reconstructive cross-validation (rCV) that avoids these issues by generating partial time-series with imputation and evaluating models on removed and out-of-sample data.
In machine learning, statistics, econometrics and statistical physics, cross-validation (CV) is used as a standard approach for quantifying the generalisation performance of a statistical model. A direct application of CV to time-series raises three problems: the loss of serial correlations, the requirement of preserving any non-stationarity, and the prediction of past data using future data. In this work, we propose a meta-algorithm called reconstructive cross-validation (rCV) that avoids all these issues. At first, k folds are formed with non-overlapping randomly selected subsets of the original time-series. Then, we generate k new partial time-series by removing data points from a given fold: every new partial time-series has points missing at random from a different entire fold. A suitable imputation or smoothing technique is used to reconstruct k time-series. We call these reconstructions secondary models. Thereafter, we build the primary k time-series models using the new time-series coming from the secondary models. The performance of the primary models is evaluated simultaneously by computing the deviations from the originally removed data points and from out-of-sample (OSS) data. With rCV, full cross-validation can be practised for time-series models, and learning curves can be generated.
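The steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euler-Maruyama simulation parameters, linear interpolation as the secondary (imputation) model, a least-squares AR(1) fit as the primary model, RMSE as the deviation measure, and all function names (`simulate_ou`, `make_folds`, `reconstruct`, `fit_ar1`, `rcv`) are hypothetical choices made for this example.

```python
import numpy as np

def simulate_ou(n, theta=0.5, mu=0.0, sigma=0.3, dt=0.1, seed=0):
    """Euler-Maruyama simulation of an Ornstein-Uhlenbeck path (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = mu
    for t in range(1, n):
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt + noise
    return x

def make_folds(n, k, rng):
    """Split indices 0..n-1 into k non-overlapping, randomly selected folds."""
    return np.array_split(rng.permutation(n), k)

def reconstruct(series, missing_idx):
    """Secondary model (assumed): fill the removed points by linear interpolation."""
    positions = np.arange(len(series))
    keep = np.setdiff1d(positions, missing_idx)  # sorted indices that were not removed
    return np.interp(positions, keep, series[keep])

def fit_ar1(series):
    """Primary model (assumed): x[t] = a + b * x[t-1], fitted by least squares."""
    design = np.column_stack([np.ones(len(series) - 1), series[:-1]])
    coef, *_ = np.linalg.lstsq(design, series[1:], rcond=None)
    return coef

def predict_ar1(coef, previous):
    """One-step-ahead AR(1) prediction from the previous value(s)."""
    return coef[0] + coef[1] * previous

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def rcv(series, k=5, n_oos=50, seed=0):
    """Return a (k, 2) array: per-fold RMSE on removed points and on out-of-sample data."""
    rng = np.random.default_rng(seed)
    train, oos = series[:-n_oos], series[-n_oos:]
    scores = []
    for fold in make_folds(len(train), k, rng):
        recon = reconstruct(train, fold)   # secondary model: reconstructed series
        coef = fit_ar1(recon)              # primary model built on the reconstruction
        idx = fold[fold > 0]               # index 0 has no predecessor to predict from
        err_removed = rmse(predict_ar1(coef, recon[idx - 1]), train[idx])
        # Out-of-sample: one-step-ahead predictions over the held-out tail.
        previous = np.concatenate([[recon[-1]], oos[:-1]])
        err_oos = rmse(predict_ar1(coef, previous), oos)
        scores.append((err_removed, err_oos))
    return np.array(scores)

scores = rcv(simulate_ou(500), k=5)
print("per-fold RMSE (removed points, out-of-sample):")
print(scores)
```

Because the series is never shuffled and the reconstructed series keeps its full length, each primary model is fitted on data in its original temporal order. Repeating `rcv` on progressively longer prefixes of the series would be one way to trace a learning curve.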