CL · Mar 20, 2023

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

arXiv:2303.10845v1 · 88 citations · h-index: 33
Originality: Incremental advance
AI Analysis

This work addresses the computational challenge of training trillion-parameter models for improved natural language processing, representing an incremental advancement in scaling techniques.

The authors tackled scaling large language models by training a trillion-parameter model named PanGu-Σ using sparse heterogeneous computing, achieving a 6.3x increase in training throughput and state-of-the-art performance in zero-shot Chinese NLP tasks.

The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors using the MindSpore framework, and present the resulting language model, PanGu-Σ, with 1.085T parameters. With parameters inherited from PanGu-α, we extend the dense Transformer model to a sparse one with Random Routed Experts (RRE), and efficiently train the model over 329B tokens by using Expert Computation and Storage Separation (ECSS). This resulted in a 6.3x increase in training throughput through heterogeneous computing. Our experimental findings show that PanGu-Σ provides state-of-the-art performance in zero-shot learning on various Chinese NLP downstream tasks. Moreover, it demonstrates strong abilities when fine-tuned on application data for open-domain dialogue, question answering, machine translation and code generation.
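The abstract names Random Routed Experts (RRE) without describing the mechanism. As a rough illustration only, the sketch below shows one way random routing can differ from a learned gate: each token ID is mapped to an expert through a fixed, seeded random table, so no routing parameters are trained. The function names, the per-token-ID table, and the sizes used here are all assumptions made for illustration; they are not taken from the paper and should not be read as the authors' implementation.

```python
import random

def build_routing_table(vocab_size, num_experts, seed=0):
    """Hypothetical helper: fixed random token-ID -> expert-index table.

    Assumption: routing depends only on the token ID and a seed,
    so the table is identical on every run and needs no training.
    """
    rng = random.Random(seed)
    return [rng.randrange(num_experts) for _ in range(vocab_size)]

def group_by_expert(token_ids, table, num_experts):
    """Return, for each expert, the sequence positions routed to it."""
    buckets = [[] for _ in range(num_experts)]
    for pos, tok in enumerate(token_ids):
        buckets[table[tok]].append(pos)
    return buckets

# Illustrative sizes only; a real vocabulary and expert count are far larger.
NUM_EXPERTS = 4
table = build_routing_table(vocab_size=100, num_experts=NUM_EXPERTS, seed=42)

tokens = [3, 17, 3, 99, 42]
assignments = [table[t] for t in tokens]
buckets = group_by_expert(tokens, table, NUM_EXPERTS)
```

Because the table is fixed, the same token ID always reaches the same expert (positions 0 and 2 above share an expert), which is the property that distinguishes this kind of routing from a learned, input-dependent gate.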

