LG, DC · Jul 7, 2022

Training Transformers Together

Hugging Face
arXiv:2207.03481v1 · 10 citations · h-index: 28
AI Analysis

This work addresses the problem of high training costs restricting state-of-the-art model training to large corporations, and it makes such training more accessible to independent parties. The contribution is incremental, since it builds on existing collaborative training methods.

The paper tackles the cost of state-of-the-art training infrastructure by demonstrating collaborative training of a text-to-image transformer similar to DALL-E. The resulting model generates images of reasonable quality on a range of prompts.

The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we show that the resulting model generates images of reasonable quality on a number of prompts.
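To make the idea of pooling hardware more concrete, here is a minimal single-process sketch in Python. It assumes plain data-parallel training in which each volunteer peer reports the sum of its gradients along with the number of samples it processed. The sketch touches two of the listed challenges in simplified form. Uneven performance is handled by letting peers contribute different numbers of samples and taking a step only once their combined count reaches a target global batch size. Slow communication is addressed by uniformly quantizing gradients to 8 bits before aggregation. This is not the paper's implementation: `PeerUpdate`, `collaborative_step`, `quantize_8bit`, and the 8-bit scheme are hypothetical stand-ins, and a real run would exchange updates over the network instead of reading them from a list.

```python
# Hypothetical sketch: not the paper's code and not any library's API.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PeerUpdate:
    """Gradient contribution from one volunteer peer (hypothetical structure)."""
    peer_id: str
    num_samples: int        # samples this peer processed since the last global step
    grad_sum: List[float]   # SUM (not mean) of its per-sample gradients


def quantize_8bit(values: List[float]) -> Tuple[List[int], float]:
    """Uniform symmetric 8-bit quantization (an assumed compression scheme)."""
    # Map the largest magnitude to 127; fall back to scale 1.0 for all-zero input.
    scale = max((abs(v) for v in values), default=0.0) / 127 or 1.0
    return [round(v / scale) for v in values], scale


def dequantize_8bit(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from 8-bit codes."""
    return [x * scale for x in q]


def collaborative_step(
    params: List[float],
    updates: List[PeerUpdate],
    target_batch_size: int,
    lr: float,
    compress: bool = True,
) -> Optional[List[float]]:
    """Apply one SGD step once peers jointly reach target_batch_size samples.

    Returns the new parameters, or None if peers should keep accumulating.
    """
    total = sum(u.num_samples for u in updates)
    if total < target_batch_size:
        return None  # not enough samples for a global step yet

    agg = [0.0] * len(params)
    for u in updates:
        g = u.grad_sum
        if compress:
            # Simulate sending 8-bit codes plus one float scale over a slow link.
            q, s = quantize_8bit(g)
            g = dequantize_8bit(q, s)
        for i, v in enumerate(g):
            agg[i] += v

    # Dividing the summed gradients by the total sample count gives the mean
    # over the global batch, however the samples were split across peers.
    mean_grad = [v / total for v in agg]
    return [p - lr * g for p, g in zip(params, mean_grad)]


if __name__ == "__main__":
    params = [1.0, -2.0]
    updates = [
        PeerUpdate("fast-gpu", 96, [9.6, -19.2]),
        PeerUpdate("slow-laptop", 32, [3.2, -6.4]),
    ]
    print(collaborative_step(params, updates, target_batch_size=128, lr=0.5))
```

Because each peer sends a gradient sum together with its sample count, the fast peer's 96 samples carry three times the weight of the slow peer's 32. The compressed update differs from the exact one only by the quantization error.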

Code Implementations: 1 repo
