Unified Long-Term Time-Series Forecasting Benchmark
The work provides a standardized benchmark for time-series forecasting researchers, though the contribution is incremental because it builds on existing datasets and models.
The authors address the lack of standardized evaluation for long-term time-series forecasting by assembling a comprehensive benchmark dataset with trajectories of up to 2000 steps. They find that two custom variants, a latent NLinear model and a curriculum-enhanced DeepAR, consistently outperform their vanilla counterparts across diverse scenarios.
To support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. It combines datasets obtained from diverse dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to $2000$ to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our comparisons show that model effectiveness is dataset-dependent. We also introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.
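The standardization step described above, in which each trajectory is divided by a predetermined lookback length, can be illustrated with a minimal sketch. It is not the benchmark's actual loading code: the function names (`split_lookback`, `last_value_forecast`, `mse`), the array layout, the lookback of 500, and the synthetic sine data are all assumptions made for illustration, and the last-value baseline is a generic reference point rather than one of the benchmarked models.

```python
import numpy as np

def split_lookback(trajectories, lookback):
    """Split each trajectory into a lookback context and a forecast target.

    trajectories: array of shape (n_trajectories, n_steps, n_features)
    lookback: number of initial steps given to a model as context
    """
    trajectories = np.asarray(trajectories)
    if not 0 < lookback < trajectories.shape[1]:
        raise ValueError("lookback must be between 1 and n_steps - 1")
    context = trajectories[:, :lookback]   # what the model observes
    target = trajectories[:, lookback:]    # what the model must predict
    return context, target

def last_value_forecast(context, horizon):
    """Naive baseline (hypothetical): repeat the last observed value."""
    return np.repeat(context[:, -1:, :], horizon, axis=1)

def mse(pred, target):
    """Mean squared error over all trajectories, steps, and features."""
    return float(np.mean((pred - target) ** 2))

# Synthetic stand-in data: 4 sine trajectories of 2000 steps, 1 feature.
t = np.linspace(0, 20 * np.pi, 2000)
phases = np.array([0.0, 0.5, 1.0, 1.5])
data = np.sin(t[None, :] + phases[:, None])[..., None]

# Assumed lookback of 500 steps, leaving a 1500-step forecast horizon.
context, target = split_lookback(data, lookback=500)
pred = last_value_forecast(context, horizon=target.shape[1])
print(context.shape, target.shape, mse(pred, target))
```

Fixing the lookback length per dataset in this way means every model sees the same context and is scored on the same horizon, which is what makes the comparison across models meaningful.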