LG AI CE Dec 29, 2024

MATEY: multiscale adaptive foundation models for spatiotemporal physical systems

arXiv:2412.20601v1, 4 citations, h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses computational bottlenecks in applying vision transformers to physical systems, offering incremental improvements in efficiency and accuracy for domain-specific tasks.

The paper addresses the problem of representing multiscale features in spatiotemporal physical systems with vision transformers. It proposes adaptive tokenization and spatiotemporal attention schemes that improve accuracy without significantly increasing token sequence length, and it shows that pretrained models outperform models trained from scratch in low-data regimes.

Accurate representation of the multiscale features in spatiotemporal physical systems using vision transformer (ViT) architectures requires extremely long, computationally prohibitive token sequences. To address this issue, we propose two adaptive tokenization schemes that dynamically adjust patch sizes based on local features: one ensures convergent behavior to uniform patch refinement, while the other offers better computational efficiency. Moreover, we present a set of spatiotemporal attention schemes, where the temporal or axial spatial dimensions are decoupled, and evaluate their computational and data efficiencies. We assess the performance of the proposed multiscale adaptive model, MATEY, in a sequence of experiments. The results show that adaptive tokenization schemes achieve improved accuracy without significantly increasing the length of the token sequence. Compared to a full spatiotemporal attention scheme or a scheme that decouples only the temporal dimension, we find that fully decoupled axial attention is less efficient and expressive, requiring more training time and model weights to achieve the same accuracy. Finally, we demonstrate in two fine-tuning tasks featuring different physics that models pretrained on PDEBench data outperform the ones trained from scratch, especially in the low data regime with frozen attention.
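
The idea of adjusting patch sizes based on local features can be illustrated with a minimal sketch. This is not the paper's algorithm: the variance-based refinement criterion, the `threshold` value, the two patch sizes, and the function names `patch_values` and `adaptive_patches` are all assumptions chosen for illustration. The sketch also assumes a square field whose side is divisible by the coarse patch size.

```python
from statistics import pvariance


def patch_values(field, r0, c0, size):
    """Flatten the size x size block of `field` with top-left corner (r0, c0)."""
    return [field[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]


def adaptive_patches(field, coarse=4, fine=2, threshold=0.1):
    """Return (row, col, size) patches: coarse by default, fine where variance is high.

    The variance criterion and `threshold` are illustrative assumptions,
    not the refinement rule used in the paper.
    """
    n = len(field)
    patches = []
    for r0 in range(0, n, coarse):
        for c0 in range(0, n, coarse):
            if pvariance(patch_values(field, r0, c0, coarse)) > threshold:
                # Refine: replace one coarse patch with several fine patches.
                for r in range(r0, r0 + coarse, fine):
                    for c in range(c0, c0 + coarse, fine):
                        patches.append((r, c, fine))
            else:
                patches.append((r0, c0, coarse))
    return patches


# Toy 8x8 field: constant everywhere except one sharp feature in the top-left block.
field = [[0.0] * 8 for _ in range(8)]
field[1][1] = 5.0

tokens = adaptive_patches(field)
uniform_fine = (8 // 2) ** 2  # token count if every patch used the fine size
print(len(tokens), "adaptive tokens vs", uniform_fine, "uniform fine tokens")
```

In this toy case only the block containing the sharp feature is refined, so the token sequence stays much shorter than it would under uniform fine patching while the feature is still resolved at the fine patch size.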
