LG · AI · May 8, 2023

Mlinear: Rethink the Linear Model for Time-series Forecasting

arXiv:2305.04800v2 · 7 citations
Originality: Incremental advance
AI Analysis

This work addresses a key challenge in time-series forecasting for researchers and practitioners: combining two opposing data properties, channel independence and channel dependence, in a single model. The advance is incremental, since it builds on existing linear and Transformer methods.

The paper proposes Mlinear, a method based mainly on linear layers that dynamically tunes the channel-independence and channel-dependence properties for each input and applies deep supervision to the two underlying predictors. Across 7 datasets, on the MSE and MAE metrics, it outperforms the Transformer-based PatchTST with win ratios of 21:3 at an input length of 336 and 29:10 at an input length of 512, and it reports a 10× efficiency advantage when training and inference times are both counted.

Recently, significant advancements have been made in time-series forecasting research, with an increasing focus on analyzing the nature of time-series data, e.g., channel-independence (CI) and channel-dependence (CD), rather than solely on designing sophisticated forecasting models. However, current research has primarily focused on either CI or CD in isolation, and the challenge of effectively combining these two opposing properties to achieve a synergistic effect remains unresolved. In this paper, we carefully examine the opposing properties of CI and CD and raise a practical question that has not been effectively answered: "How to effectively mix the CI and CD properties of time series to achieve better predictive performance?" To answer this question, we propose Mlinear (MIX-Linear), a simple yet effective method based mainly on linear layers. The design philosophy of Mlinear has two main aspects: (1) dynamically tuning the CI and CD properties based on the time semantics of different input time series, and (2) providing deep supervision to adjust the individual performance of the "CI predictor" and the "CD predictor". In addition, we empirically introduce a new loss function that significantly outperforms the widely used mean squared error (MSE) on multiple datasets. Experiments on widely used time-series datasets covering multiple fields demonstrate the superiority of our method over PatchTST, the latest Transformer-based method, in terms of the MSE and MAE metrics on 7 datasets with identical sequence inputs (336 or 512). Specifically, our method outperforms PatchTST with a ratio of 21:3 at an input sequence length of 336 and 29:10 at an input sequence length of 512. Additionally, our approach has a 10$\times$ efficiency advantage at the unit level, taking into account both training and inference times.
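To make the two design aspects in the abstract more concrete, here is a minimal pure-Python sketch of one way a CI predictor and a CD predictor could be mixed and jointly supervised. It is an illustration under stated assumptions, not the authors' architecture: the function names, the channel-mixing matrix `w_chan`, the sigmoid gate on the input mean, and the use of plain MSE for every loss term are all hypothetical choices. The abstract does not specify how the time semantics drive the mixing, nor the form of the new loss function.

```python
import math

def linear_map(weights, series):
    """Apply an H x L weight matrix to one length-L series, giving H forecasts."""
    return [sum(w * v for w, v in zip(row, series)) for row in weights]

def ci_predict(w_time, x):
    """Channel-independent: each channel is forecast from its own history only."""
    return [linear_map(w_time, channel) for channel in x]

def cd_predict(w_chan, w_time, x):
    """Channel-dependent: mix channels at every time step, then forecast."""
    n_channels, length = len(x), len(x[0])
    mixed = [
        [sum(w_chan[i][j] * x[j][t] for j in range(n_channels)) for t in range(length)]
        for i in range(n_channels)
    ]
    return [linear_map(w_time, channel) for channel in mixed]

def gate(x, scale=1.0):
    """Hypothetical input-dependent gate in (0, 1): sigmoid of the input mean."""
    flat = [v for channel in x for v in channel]
    return 1.0 / (1.0 + math.exp(-scale * sum(flat) / len(flat)))

def mix(g, y_ci, y_cd):
    """Convex combination of the two forecasts, weighted by the gate g."""
    return [[g * a + (1.0 - g) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(y_ci, y_cd)]

def mse(pred, target):
    """Mean squared error over all channels and forecast steps."""
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2
               for rp, rt in zip(pred, target)
               for p, t in zip(rp, rt)) / n

def deep_supervision_loss(y_mix, y_ci, y_cd, target):
    """Supervise the mixed output and each predictor's own output (assumed MSE)."""
    return mse(y_mix, target) + mse(y_ci, target) + mse(y_cd, target)

if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]   # 2 channels, 3 past steps
    w_time = [[0.0, 0.0, 1.0]]                # 1-step forecast: repeat last value
    w_chan = [[0.5, 0.5], [0.5, 0.5]]         # average the two channels
    target = [[4.0], [0.0]]

    y_ci = ci_predict(w_time, x)
    y_cd = cd_predict(w_chan, w_time, x)
    y = mix(gate(x), y_ci, y_cd)
    print(y_ci, y_cd, y, deep_supervision_loss(y, y_ci, y_cd, target))
```

In this sketch the gate is a single scalar per input window, so one window can lean toward the CI forecast while another leans toward the CD forecast. The extra loss terms on `y_ci` and `y_cd` are what "deep supervision" refers to here: each predictor receives its own training signal in addition to the signal on the mixed output.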

