CV · Mar 26, 2023

$\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss

Georgia Tech · NVIDIA · U of Toronto
arXiv:2303.14772v2 · 1 citations · h-index: 58
Originality: Incremental advance
AI Analysis

The paper addresses the storage-efficiency problem that practitioners face when fine-tuning models on new tasks, though the advance is incremental because it builds on existing model patching work.

The paper tackles the problem of storing multiple copies of a pre-trained model, one per fine-tuning task, by proposing $\Delta$-Patching, a lightweight method that avoids loss of base performance. The results show that $\Delta$-Networks outperform earlier patching work while requiring only a fraction of the parameters to be trained.

Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time. This process necessitates storing a copy of the model for each task on which the pre-trained model is fine-tuned. Building on recent model patching work, we propose $\Delta$-Patching for fine-tuning neural network models efficiently, without the need to store model copies. We propose a simple and lightweight method called $\Delta$-Networks to achieve this objective. Our comprehensive experiments across settings and architecture variants show that $\Delta$-Networks outperform earlier model patching work while requiring only a fraction of the parameters to be trained. We also show that this approach can be used for other problem settings such as transfer learning and zero-shot domain adaptation, as well as other tasks such as detection and segmentation.

