Beyond Holistic Models: Systematic Component-level Benchmarking of Deep Multivariate Time-Series Forecasting
For researchers in time-series forecasting, this work provides a systematic framework to understand and optimize component-level contributions, moving beyond holistic model comparisons.
This paper introduces TSCOMP, a large-scale benchmark that deconstructs deep multivariate time-series forecasting methods into fine-grained components, evaluating over 20,000 model-dataset combinations. The resulting corpus-driven approach for automated component selection consistently outperforms state-of-the-art methods.
Previous research in multivariate time-series forecasting has focused on developing complex holistic models; this work instead advocates a shift toward a granular, component-level understanding of what drives their performance. We propose TSCOMP, the first large-scale benchmark that systematically deconstructs deep forecasting methods into their core, fine-grained components, spanning series preprocessing, encoding strategies, network architectures (including both specific and large time-series models), and optimization methods. Using a constrained orthogonal experimental design and extensive evaluations, we conduct multi-view analyses that reveal how effective each component is across different backbones and data characteristics, and how components interact. Beyond providing insights, this benchmark establishes a fine-grained performance corpus comprising over 20,000 model-dataset evaluations, which supports learning automated component selection and thereby enables zero-shot model construction on new datasets. Our experiments demonstrate that the corpus-driven approach, despite its simplicity, consistently outperforms state-of-the-art methods, validating the soundness of our evaluation design and confirming that systematic component selection surpasses manually designed complex architectures. All code and the performance corpus are publicly available at https://github.com/SUFE-AILAB/TSCOMP.
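To make the idea of corpus-driven component selection concrete, the following is a minimal sketch and not the TSCOMP implementation. It assumes a performance corpus of (dataset meta-features, component configuration, error) rows and picks, for an unseen dataset, the configuration with the lowest mean error among its nearest corpus datasets. The meta-features, component names, error values, and the function `select_components` are all hypothetical placeholders chosen for illustration; the selection method actually used in the paper may differ.

```python
import math
from collections import defaultdict

# Hypothetical corpus rows (illustrative values, not TSCOMP results):
# (dataset meta-features, component configuration, validation error).
# A configuration is (preprocessing, encoding, backbone).
CORPUS = [
    ((0.9, 0.1), ("instance_norm", "patching", "transformer"), 0.30),
    ((0.9, 0.1), ("no_norm", "pointwise", "mlp"), 0.45),
    ((0.8, 0.2), ("instance_norm", "patching", "transformer"), 0.32),
    ((0.8, 0.2), ("no_norm", "pointwise", "mlp"), 0.40),
    ((0.1, 0.9), ("instance_norm", "patching", "transformer"), 0.60),
    ((0.1, 0.9), ("no_norm", "pointwise", "mlp"), 0.35),
]


def select_components(corpus, new_features, k=2):
    """Return the configuration with the lowest mean error on the k
    corpus datasets whose meta-features are closest to new_features."""
    # Rank the distinct corpus datasets by Euclidean distance to the new one.
    datasets = sorted(
        {features for features, _, _ in corpus},
        key=lambda features: math.dist(features, new_features),
    )
    neighbours = set(datasets[:k])

    # Collect the errors of every configuration on the neighbouring datasets.
    errors = defaultdict(list)
    for features, config, error in corpus:
        if features in neighbours:
            errors[config].append(error)

    # Choose the configuration with the lowest mean error; no training
    # on the new dataset is needed, which is the zero-shot aspect.
    return min(errors, key=lambda c: sum(errors[c]) / len(errors[c]))


if __name__ == "__main__":
    print(select_components(CORPUS, (0.85, 0.15)))
    print(select_components(CORPUS, (0.2, 0.8), k=1))
```

A nearest-neighbour lookup is only one simple way to exploit such a corpus; a learned ranking model over the same rows would follow the same interface.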