LG DC · Jul 7, 2022

Training Transformers Together

Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf

Hugging Face

arXiv:2207.03481v17.810 citations · h-index: 28 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the high cost of training state-of-the-art models, which currently restricts such training to large corporations, and makes it more accessible to independent parties. The contribution is incremental, since it builds on existing collaborative training methods.

The paper tackles the problem of expensive infrastructure for training state-of-the-art models by demonstrating collaborative training of a text-to-image transformer similar to DALL-E. The resulting model generates images of reasonable quality on a number of prompts.

The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we show that the resulting model generates images of reasonable quality on a number of prompts.
