LG
Oct 19, 2022

lo-fi: distributed fine-tuning without communication

AI2
arXiv:2210.11948v2
24 citations
h-index: 77
Originality: Incremental advance
AI Analysis

Lo-fi reduces resource barriers and enables fine-tuning in communication-prohibitive settings, though the advance is incremental because it builds on existing fine-tuning and weight-averaging techniques.

The paper proposes lo-fi, a method for fine-tuning large neural networks in which each node fine-tunes independently without communication and the weights are averaged across nodes afterward. For DeiT-base and DeiT-large on ImageNet, lo-fi matches the baseline's in-distribution accuracy and improves accuracy under distribution shift; for OPT language models (up to 1.3B parameters) on Common Crawl, it matches the baseline's performance.

When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.
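The procedure described in the abstract (independent fine-tuning on each node, then a single weight average at the end) can be illustrated with a toy sketch. This is not the authors' implementation: the one-parameter linear model, the plain SGD loop, the uniform average, and every name here (`local_finetune`, `average_weights`, `lofi`, `make_shard`) are assumptions chosen only to keep the example self-contained.

```python
import random


def local_finetune(weights, data, lr=0.05, epochs=20):
    """Fine-tune a copy of `weights` on one node's data with plain SGD.

    No gradients or weights are exchanged with other nodes.
    """
    w, b = weights["w"], weights["b"]
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # derivative of 0.5 * err**2 w.r.t. the prediction
            w -= lr * err * x
            b -= lr * err
    return {"w": w, "b": b}


def average_weights(node_weights):
    """Uniformly average each parameter across nodes (uniform weighting is an assumption)."""
    n = len(node_weights)
    return {k: sum(nw[k] for nw in node_weights) / n for k in node_weights[0]}


def lofi(pretrained, shards):
    """Each node starts from the same pretrained weights, trains alone, then all are averaged once."""
    finetuned = [local_finetune(dict(pretrained), shard) for shard in shards]
    return average_weights(finetuned)


rng = random.Random(0)


def make_shard(n=50):
    """Toy data from y = 2x + 1 with small Gaussian noise (hypothetical task)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


shards = [make_shard() for _ in range(4)]  # four hypothetical nodes
pretrained = {"w": 1.5, "b": 0.5}          # stand-in for a pretrained checkpoint
final = lofi(pretrained, shards)
```

With a real network the same pattern would apply per parameter tensor rather than per scalar; the only cross-node step is the final average, which is what removes the per-step gradient communication of the baseline.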

