LG AI · May 13, 2024

USP: A Unified Sequence Parallelism Approach for Long Context Generative AI

arXiv:2405.07719v528.554 citations · h-index: 2 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses a problem relevant to AI researchers and practitioners: efficiently training large generative models on long sequences. It represents an incremental improvement over existing methods.

The paper tackles the challenge of enabling long-context capabilities in generative AI models by proposing a unified sequence parallelism approach that is more robust across transformer model architectures and network hardware topologies. It reports 47% MFU for LLAMA3-8B training with a 208K sequence length on two 8xA800 nodes.

Sequence parallelism (SP), which divides the sequence dimension of input tensors across multiple computational devices, is becoming key to unlocking the long-context capabilities of generative AI models. This paper investigates the state-of-the-art SP approaches, namely DeepSpeed-Ulysses and Ring-Attention, and proposes a unified SP approach that is more robust to transformer model architectures and network hardware topology. It compares the communication and memory costs of SP with those of existing parallelism methods, including data, tensor, ZeRO, and pipeline parallelism, and discusses best practices for designing hybrid 4D parallelism involving SP. We achieved 47% MFU on two 8xA800 nodes using SP to train the LLAMA3-8B model with a sequence length of 208K. Our code is publicly available at https://github.com/feifeibear/long-context-attention.
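To make the idea of dividing the sequence dimension concrete, here is a minimal sketch in plain Python. It does not use the paper's code or any library API. The helper names `shard_sequence` and `build_sp_groups` are hypothetical. The sketch also assumes three things the abstract does not state: "208K" means 208 × 1024 tokens, all 16 GPUs take part in SP, and a unified scheme arranges devices in a 2D grid of Ulysses-style groups and Ring-style groups.

```python
from typing import List, Tuple


def shard_sequence(seq_len: int, sp_degree: int) -> List[Tuple[int, int]]:
    """Return one contiguous [start, end) token range per device."""
    if seq_len % sp_degree != 0:
        raise ValueError("this sketch assumes seq_len is divisible by sp_degree")
    chunk = seq_len // sp_degree
    return [(rank * chunk, (rank + 1) * chunk) for rank in range(sp_degree)]


def build_sp_groups(
    sp_degree: int, ulysses_degree: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Arrange sp_degree ranks as a (ring_degree x ulysses_degree) grid.

    Assumption for illustration: Ulysses-style groups take contiguous ranks
    (so they tend to sit inside one node), and Ring-style groups take
    strided ranks (so they span nodes).
    """
    if sp_degree % ulysses_degree != 0:
        raise ValueError("ulysses_degree must divide sp_degree")
    ring_degree = sp_degree // ulysses_degree
    ulysses_groups = [
        list(range(row * ulysses_degree, (row + 1) * ulysses_degree))
        for row in range(ring_degree)
    ]
    ring_groups = [
        list(range(col, sp_degree, ulysses_degree))
        for col in range(ulysses_degree)
    ]
    return ulysses_groups, ring_groups


# Hypothetical configuration: two 8-GPU nodes, every GPU used for SP.
SEQ_LEN = 208 * 1024  # assumption: "208K" read as 208 * 1024 tokens
shards = shard_sequence(SEQ_LEN, sp_degree=16)
ulysses_groups, ring_groups = build_sp_groups(sp_degree=16, ulysses_degree=8)
```

Under these assumptions each device holds 13,312 tokens of the sequence, `ulysses_groups` contains two groups of eight contiguous ranks, and `ring_groups` contains eight groups of two ranks each. Other splits of the 16 devices (for example 4 × 4) fit the same sketch. Which split works best depends on the model's attention head count and on the interconnect, and that is the trade-off the abstract refers to.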
