LG, CL · Nov 23, 2022

TorchScale: Transformers at Scale

Microsoft
arXiv:2211.13184v1 · 12 citations · h-index: 102 · Has Code
Originality: Synthesis-oriented
AI Analysis

TorchScale gives researchers and developers a practical way to scale Transformers more effectively, though the contribution is incremental because it builds on existing scaling libraries.

The authors tackle the challenge of scaling Transformers efficiently by introducing TorchScale, an open-source toolkit that implements modeling techniques for improving generality, capability, training stability, and efficiency. Experiments on language modeling and neural machine translation show that it scales Transformers successfully across a range of model sizes.

Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries for scaling Transformers focus on improving training or inference through better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale implements several modeling techniques that can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at https://aka.ms/torchscale.
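The abstract does not name the modeling techniques. One that TorchScale provides for training stability is DeepNorm from DeepNet (Wang et al., 2022). Each sub-layer uses a Post-LN residual with an up-weighted skip path, x_{l+1} = LN(alpha * x_l + G_l(x_l)), and selected weights are initialized with a down-scaling gain beta. The sketch below is a minimal, framework-free illustration that assumes the DeepNet constants for an encoder-only or decoder-only stack of N layers (alpha = (2N)^(1/4), beta = (8N)^(-1/4)). The function names are hypothetical, the LayerNorm has no learnable parameters, and this is not TorchScale's own code.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def deepnorm_constants(num_layers: int) -> Tuple[float, float]:
    """Assumed DeepNet constants for a single encoder-only or decoder-only stack.

    alpha = (2N)^(1/4) up-weights the residual (skip) path;
    beta = (8N)^(-1/4) is an initialization gain for selected weights.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be positive")
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Parameter-free LayerNorm over one feature vector (no gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def deepnorm_residual(x: Vector, sublayer: Callable[[Vector], Vector], alpha: float) -> Vector:
    """Post-LN residual with an up-weighted skip path: LN(alpha * x + G(x))."""
    gx = sublayer(x)
    return layer_norm([alpha * xi + gi for xi, gi in zip(x, gx)])


if __name__ == "__main__":
    alpha, beta = deepnorm_constants(num_layers=6)
    print(f"alpha={alpha:.4f}, beta={beta:.4f}")
    # A toy linear "sub-layer" stands in for attention or the feed-forward block.
    y = deepnorm_residual([1.0, 2.0, 3.0], lambda v: [0.1 * t for t in v], alpha)
    print([round(t, 4) for t in y])
```

In a full model, beta would serve as the gain when initializing the feed-forward, value-projection, and output-projection weights. The sketch only computes it and leaves initialization out.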

Code Implementations: 1 repo
