LG | Jun 26, 2025

Distributed Cross-Channel Hierarchical Aggregation for Foundation Models

Aristeidis Tsaris, Isaac Lyngaas, John Lagregren, Mohamed Wahib, Larry York, Prasanna Balaprakash, Dan Lu, Feiyi Wang, Xiao Wang

arXiv:2506.21411v1 | 7.1 | 2 citations | h-index: 20 | SC

Originality: Incremental advance

AI Analysis

This work addresses efficiency bottlenecks faced by researchers and practitioners who use large-scale vision transformers in domains such as hyperspectral imaging and weather forecasting. It represents an incremental improvement over existing distributed methods.

The paper tackles the compute-intensive cost of tokenizing and aggregating many-channel images in vision-based scientific foundation models. It introduces the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach, which, combined with tensor parallelism and model sharding, reduced memory usage by up to 75% and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier supercomputer.

Vision-based scientific foundation models hold significant promise for advancing scientific discovery and innovation. This potential stems from their ability to aggregate images from diverse sources, such as varying physical groundings or data acquisition systems, and to learn spatio-temporal correlations using transformer architectures. However, tokenizing and aggregating images can be compute-intensive, a challenge not fully addressed by current distributed methods. In this work, we introduce the Distributed Cross-Channel Hierarchical Aggregation (D-CHAG) approach, designed for datasets with a large number of channels across image modalities. Our method is compatible with any model-parallel strategy and any type of vision transformer architecture, and it significantly improves computational efficiency. We evaluated D-CHAG on hyperspectral imaging and weather forecasting tasks. When integrated with tensor parallelism and model sharding, our approach achieved up to a 75% reduction in memory usage and more than doubled sustained throughput on up to 1,024 AMD GPUs on the Frontier Supercomputer.
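The abstract describes D-CHAG only at a high level. The following is a toy, single-process sketch of the general idea of distributed two-level cross-channel aggregation, under several assumptions. Channels are split across ranks. Each rank pools its local channels per spatial patch. A second stage then pools the gathered per-rank partial tokens. The function names, the single-query softmax pooling (a stand-in for learned cross-attention), and the simulated all-gather are hypothetical. None of them are taken from the paper's implementation.

```python
import math
import random

def attention_pool(tokens, query):
    """Pool D-dim vectors into one using softmax(query . x / sqrt(D)) weights.
    Hypothetical stand-in for a learned cross-attention aggregation layer."""
    d = len(query)
    scores = [sum(q * v for q, v in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * tok[k] for w, tok in zip(weights, tokens)) for k in range(d)]

def split_channels(num_channels, num_ranks):
    """Assign contiguous channel index ranges to ranks as evenly as possible."""
    base, extra = divmod(num_channels, num_ranks)
    groups, start = [], 0
    for r in range(num_ranks):
        size = base + (1 if r < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups

def hierarchical_aggregate(x, num_ranks, local_query, global_query):
    """x[c][n] is the D-dim token of channel c at patch n.
    Level 1: each simulated rank pools only its own channels (no communication).
    Level 2: per-rank partial tokens are gathered and pooled again.
    Returns N tokens (one per patch) instead of C * N channel tokens."""
    num_channels, num_patches = len(x), len(x[0])
    if not 1 <= num_ranks <= num_channels:
        raise ValueError("need 1 <= num_ranks <= num_channels")
    groups = split_channels(num_channels, num_ranks)
    # Level 1: one partial token per rank per patch, computed from local channels only.
    partials = [[attention_pool([x[c][n] for c in group], local_query)
                 for n in range(num_patches)]
                for group in groups]
    # Simulated all-gather: in a real multi-GPU run every rank would receive all partials.
    # Level 2: pool across the num_ranks partial tokens for each patch.
    return [attention_pool([partials[r][n] for r in range(num_ranks)], global_query)
            for n in range(num_patches)]

# Usage with made-up sizes: channels, patches, embedding dim, simulated ranks.
random.seed(0)
C, N, D, R = 16, 4, 8, 4
x = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(N)] for _ in range(C)]
q_local = [random.gauss(0, 1) for _ in range(D)]
q_global = [random.gauss(0, 1) for _ in range(D)]
out = hierarchical_aggregate(x, R, q_local, q_global)
print(len(out), len(out[0]))  # 4 8: N tokens of dim D, down from C * N = 64
```

In this sketch only the second stage needs data from other ranks, and it operates on R partial tokens per patch rather than on all C channel tokens. A real distributed run would replace the list-based gather with a collective operation, which this single-process example only imitates.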

