BlockGPT: Spatio-Temporal Modelling of Rainfall via Frame-Level Autoregression
This work addresses the need for accurate and computationally efficient nowcasting models to mitigate the impacts of extreme weather, and represents an incremental improvement over existing methods.
The paper tackles short-term precipitation forecasting (nowcasting) by introducing BlockGPT, a generative autoregressive transformer that predicts a full 2D frame at each time step. On the KNMI and SEVIR datasets, it achieves higher accuracy, better event localization, and inference up to 31 times faster than state-of-the-art baselines.
Predicting precipitation maps is a highly complex spatiotemporal modeling task, critical for mitigating the impacts of extreme weather events. Short-term precipitation forecasting, or nowcasting, requires models that are not only accurate but also computationally efficient enough for real-time applications. Current methods have clear limitations: token-based autoregressive models often suffer from flawed inductive biases and slow inference, while diffusion models can be computationally intensive. To address these limitations, we introduce BlockGPT, a generative autoregressive transformer that uses a batched tokenization (Block) method to predict full two-dimensional fields (frames) at each time step. Conceived as a model-agnostic paradigm for video prediction, BlockGPT factorizes space-time by using self-attention within each frame and causal attention across frames; in this work, we instantiate it for precipitation nowcasting. We evaluate BlockGPT on two precipitation datasets, viz. KNMI (Netherlands) and SEVIR (U.S.), comparing it to state-of-the-art baselines, including token-based (NowcastingGPT) and diffusion-based (DiffCast+Phydnet) models. The results show that BlockGPT achieves higher accuracy, better event localization as measured by categorical metrics, and inference up to 31x faster than comparable baselines.
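The space-time factorization described above (full self-attention among the tokens of one frame, causal attention across frames) can be expressed as a block-causal attention mask. The sketch below is a minimal, single-head NumPy illustration under stated assumptions: tokens are flattened frame by frame, and the helper names `block_causal_mask`, `masked_attention`, and `sequential_steps` are hypothetical, not taken from the BlockGPT code. It is not the authors' implementation. It also counts the sequential decoding steps needed when a whole frame is generated per step versus one token per step.

```python
import numpy as np


def block_causal_mask(num_frames: int, tokens_per_frame: int) -> np.ndarray:
    """Boolean mask; mask[i, j] is True if query token i may attend to key token j.

    Assumes tokens are ordered frame by frame. A token attends to every token in
    its own frame (spatial self-attention) and to all tokens in earlier frames
    (causal temporal attention), but never to tokens in later frames.
    """
    # Frame index of each position in the flattened sequence, e.g. [0, 0, 1, 1, 2, 2].
    frame_id = np.repeat(np.arange(num_frames), tokens_per_frame)
    # Allowed when the key's frame is not later than the query's frame.
    return frame_id[None, :] <= frame_id[:, None]


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Disallowed positions get -inf so they receive zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax; every row has at least its own frame unmasked.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def sequential_steps(num_frames: int, tokens_per_frame: int,
                     frame_level: bool) -> int:
    """Sequential forward passes needed to generate num_frames frames.

    Frame-level autoregression emits a whole frame per pass; token-level
    autoregression emits one token per pass.
    """
    return num_frames if frame_level else num_frames * tokens_per_frame
```

For `block_causal_mask(3, 2)`, the two tokens of frame 0 see only each other, the tokens of frame 1 see frames 0 and 1, and the tokens of frame 2 see all six tokens. With hypothetical sizes of 12 frames and 256 tokens per frame (not the paper's configuration), `sequential_steps` gives 12 passes at frame level versus 3072 at token level, which illustrates why generating a full frame per step reduces the number of sequential decoding steps.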