Conv-like Scale-Fusion Time Series Transformer: A Multi-Scale Representation for Variable-Length Long Time Series
This work addresses generalization issues in time series analysis for applications such as forecasting and classification. It is an incremental improvement that integrates CNN-inspired structures into Transformers.
The paper tackles two problems in Transformer models for time series: variable-length input and feature redundancy. It proposes a Conv-like Scale-Fusion Transformer with multi-scale representation learning and reports better performance than state-of-the-art methods on forecasting and classification tasks.
Time series analysis faces significant challenges in handling variable-length data and achieving robust generalization. While Transformer-based models have advanced time series tasks, they often suffer from feature redundancy and limited generalization. Drawing inspiration from the pyramidal structure of classical CNN architectures, we propose a Multi-Scale Representation Learning Framework based on a Conv-like Scale-Fusion Transformer. Our approach introduces a temporal convolution-like structure that combines patching operations with multi-head attention, so that each stage compresses the temporal dimension and expands the feature channels. We further develop a novel cross-scale attention mechanism that fuses features across temporal scales, along with a log-space normalization method for variable-length sequences. Extensive experiments demonstrate that our framework achieves greater feature independence, lower redundancy, and better performance on forecasting and classification tasks than state-of-the-art methods.
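To make the "temporal compression, channel expansion" idea concrete, the following is a minimal NumPy sketch of one possible reading of a conv-like stage: fold consecutive time steps into the channel axis (patching), then apply multi-head self-attention over the shortened sequence. It is an illustration under stated assumptions, not the paper's implementation. The function names `patch_merge`, `self_attention`, and `pyramid_shapes`, the random untrained projection weights, and the choice to drop trailing steps that do not fill a patch are all hypothetical. The cross-scale attention and log-space normalization are not sketched because the abstract does not specify them.

```python
import numpy as np


def patch_merge(x, patch_len):
    """Fold `patch_len` consecutive time steps into the channel axis.

    x has shape (T, C); the result has shape (T // patch_len, C * patch_len).
    Assumption: trailing steps that do not fill a whole patch are dropped.
    """
    T, C = x.shape
    n = T // patch_len
    return x[: n * patch_len].reshape(n, patch_len * C)


def self_attention(x, num_heads, rng):
    """Multi-head self-attention with random (untrained) projections."""
    T, D = x.shape
    assert D % num_heads == 0, "channels must be divisible by num_heads"
    d = D // num_heads
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

    def split_heads(m):
        # (T, D) -> (heads, T, d)
        return m.reshape(T, num_heads, d).transpose(1, 0, 2)

    q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)       # (heads, T, T)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ v                                     # (heads, T, d)
    return out.transpose(1, 0, 2).reshape(T, D)


def pyramid_shapes(x, patch_len, num_stages, num_heads, seed=0):
    """Run several patch-then-attend stages and record each stage's shape."""
    rng = np.random.default_rng(seed)
    shapes = [x.shape]
    for _ in range(num_stages):
        x = patch_merge(x, patch_len)
        x = self_attention(x, num_heads, rng)
        shapes.append(x.shape)
    return shapes


if __name__ == "__main__":
    series = np.random.default_rng(1).standard_normal((96, 4))
    for shape in pyramid_shapes(series, patch_len=2, num_stages=3, num_heads=2):
        print(shape)
```

With a patch length of 2, each stage halves the number of time steps and doubles the channel count, which mirrors the pyramidal shape progression of a CNN backbone. A trained model would replace the random projections with learned weights.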