AI · Apr 16, 2024

Understanding Token-level Topological Structures in Transformer-based Time Series Forecasting

Jianqi Zhang, Wenwen Qiang, Jingyao Wang, Jiahuan Zhou, Changwen Zheng, Hui Xiong

arXiv:2404.10337v47.32 citations · h-index: 13 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in time series forecasting for researchers and practitioners, offering an incremental enhancement to Transformer models.

The paper finds that existing Transformer architectures for time series forecasting degrade token-level topological structures, which limits accuracy. It proposes a plug-and-play method (TEM) that improves predictive performance when integrated with various existing methods.

Transformer-based methods have achieved state-of-the-art performance in time series forecasting (TSF) by capturing positional and semantic topological relationships among input tokens. However, it remains unclear whether existing Transformers fully leverage the intrinsic topological structure among tokens throughout intermediate layers. Through empirical and theoretical analyses, we identify that current Transformer architectures progressively degrade the original positional and semantic topology of input tokens as the network deepens, thus limiting forecasting accuracy. Furthermore, our theoretical results demonstrate that explicitly enforcing preservation of these topological structures within intermediate layers can tighten generalization bounds, leading to improved forecasting performance. Motivated by these insights, we propose the Topology Enhancement Method (TEM), a novel Transformer-based TSF method that explicitly and adaptively preserves token-level topology. TEM consists of two core modules: 1) the Positional Topology Enhancement Module (PTEM), which injects learnable positional constraints to explicitly retain original positional topology; 2) the Semantic Topology Enhancement Module (STEM), which incorporates a learnable similarity matrix to preserve original semantic topology. To determine optimal injection weights adaptively, TEM employs a bi-level optimization strategy. The proposed TEM is a plug-and-play method that can be integrated with existing Transformer-based TSF methods. Extensive experiments demonstrate that integrating TEM with a variety of existing methods significantly improves their predictive performance, validating the effectiveness of explicitly preserving original token-level topology. Our code is publicly available at: https://github.com/jlu-phyComputer/TEM.
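The idea of re-injecting the input tokens' positional and semantic topology into a deeper layer can be illustrated with a small, dependency-free sketch. This is not the authors' implementation, and the abstract does not specify the exact form of the injection. Everything below is an assumption made for illustration: the topology is added as a bias to the attention scores, positional topology is taken as negative index distance, semantic topology as cosine similarity of the original input tokens, and the scalars `alpha` and `beta` are fixed stand-ins for the injection weights that TEM learns through bi-level optimization. All function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom > 0 else 0.0


def positional_topology(n):
    # Assumption: nearer tokens get a larger (less negative) value.
    return [[-abs(i - j) for j in range(n)] for i in range(n)]


def semantic_topology(input_tokens):
    # Assumption: pairwise cosine similarity of the ORIGINAL input tokens.
    return [[cosine(u, v) for v in input_tokens] for u in input_tokens]


def topology_biased_attention(hidden, input_tokens, alpha=0.0, beta=0.0):
    """Attention weights over `hidden`, biased by the input topology.

    alpha, beta: hypothetical fixed injection weights (learned in TEM).
    Queries and keys are the hidden states themselves; learned
    projections are omitted to keep the sketch short.
    """
    n, d = len(hidden), len(hidden[0])
    pos = positional_topology(n)
    sem = semantic_topology(input_tokens)
    weights = []
    for i in range(n):
        scores = [
            dot(hidden[i], hidden[j]) / math.sqrt(d)
            + alpha * pos[i][j]
            + beta * sem[i][j]
            for j in range(n)
        ]
        weights.append(softmax(scores))
    return weights


# Toy case: the hidden states have collapsed to the same vector, so
# plain attention can no longer tell the tokens apart.
inputs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

plain = topology_biased_attention(collapsed, inputs)
enhanced = topology_biased_attention(collapsed, inputs, alpha=1.0, beta=1.0)
print(plain[0])
print(enhanced[0])
```

In this toy case the plain attention row is uniform, while the biased row again favors tokens that were close in position and similar in content at the input, which is the kind of structure the abstract argues should be preserved.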
