S$^2$Transformer: Scalable Structured Transformers for Global Station Weather Forecasting
This work addresses global station weather forecasting, which is critical to energy, aviation, and agriculture, with a scalable, structured model that improves forecast accuracy. The contribution is incremental, since it builds on existing transformer and attention mechanisms.
The paper tackles global station weather forecasting, where existing methods often neglect spatial correlation. It proposes a Spatial Structured Attention Block that partitions the spatial graph into subgraphs and uses intra- and inter-subgraph attention to model local and global correlations, yielding performance improvements of up to 16.8% over baselines at low running cost.
Global Station Weather Forecasting (GSWF) is a key meteorological research area, critical to energy, aviation, and agriculture. Existing time series forecasting methods often ignore spatial correlation, or model it only unidirectionally, when conducting large-scale global station forecasting. This contradicts the intrinsic nature of the global weather system that underlies the observations, and it limits forecast performance. To address this, we propose a novel Spatial Structured Attention Block. It partitions the spatial graph into a set of subgraphs and instantiates Intra-subgraph Attention to learn local spatial correlation within each subgraph. It then aggregates nodes into subgraph representations for message passing among the subgraphs via Inter-subgraph Attention, thereby considering both spatial proximity and global correlation. Building on this block, we develop a multiscale spatiotemporal forecasting model, S$^2$Transformer, by progressively expanding subgraph scales. The resulting model is scalable, produces structured spatial correlation, and is easy to implement. Experimental results show that it achieves performance improvements of up to 16.8% over time series forecasting baselines at low running costs.
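The partition, intra-subgraph attention, aggregation, and inter-subgraph attention steps described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices in it are assumptions the abstract does not specify: stations are partitioned into contiguous index chunks of a fixed `subgraph_size`, attention is plain scaled dot-product attention without learned projections, subgraph representations are formed by mean pooling, and the inter-subgraph message is added back to every node of its subgraph. All function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention over the rows of x.

    Assumption: no learned query/key/value projections, to keep the sketch short.
    """
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return out


def mean_pool(x):
    """Aggregate node features into one subgraph representation (assumed: mean)."""
    n = len(x)
    return [sum(col) / n for col in zip(*x)]


def structured_attention(nodes, subgraph_size):
    """One illustrative pass: local attention, pooling, then global attention."""
    # Assumed partition: contiguous chunks of station indices.
    groups = [nodes[i:i + subgraph_size]
              for i in range(0, len(nodes), subgraph_size)]
    # Intra-subgraph attention: each station attends only within its subgraph.
    local = [self_attention(g) for g in groups]
    # One summary vector per subgraph.
    summaries = [mean_pool(g) for g in local]
    # Inter-subgraph attention: subgraph summaries exchange messages.
    messages = self_attention(summaries)
    # Assumed fusion: add each subgraph's message to all of its nodes.
    out = []
    for g, msg in zip(local, messages):
        for node in g:
            out.append([a + b for a, b in zip(node, msg)])
    return out


# Six hypothetical stations with two features each, grouped in pairs.
stations = [[0.1 * i, 1.0] for i in range(6)]
fused = structured_attention(stations, subgraph_size=2)
```

With N stations and subgraphs of size s, this sketch computes roughly N·s attention scores within subgraphs and (N/s)² between them, rather than N² for full attention over all stations. Calling `structured_attention` repeatedly with a growing `subgraph_size` would loosely mirror the idea of progressively expanding subgraph scales, though how the paper stacks its blocks is not stated in the abstract.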