LG, CV, ML | Oct 25, 2018

K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning

arXiv:1810.10703v2 | 74 citations
Originality: Incremental advance
AI Analysis

The paper aims to reduce the computational and memory costs that machine learning practitioners face when adapting a model to multiple tasks. It is rated an incremental advance because it builds on existing transfer learning methods.

The paper tackles parameter-efficient transfer and multi-task learning in deep neural networks. Instead of fine-tuning the last layer or the entire network, it learns a small model patch for each task, such as a set of scales and biases, and keeps the remaining parameters shared. For example, it converts a pretrained SSD model into a 1000-class image classifier while reusing 98% of the parameters of the SSD feature extractor. In several multi-task learning problems it matches single-task performance while using far fewer parameters than traditional logits-only fine-tuning.
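To make the parameter accounting concrete, here is a minimal sketch that counts shared versus per-task parameters when only per-channel scales and biases are task-specific. The layer shapes, the names `LAYERS`, `conv_weights`, `patch_params` and `total_params`, and the choice to ignore the classifier head are all assumptions made for illustration; none of them come from the paper.

```python
# Hypothetical layer shapes (in_channels, out_channels, kernel_size).
# These are illustrative only and are NOT the networks used in the paper.
LAYERS = [(3, 32, 3), (32, 64, 3), (64, 128, 3), (128, 256, 3)]


def conv_weights(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias term)."""
    return c_in * c_out * k * k


def patch_params(c_out):
    """One scale and one bias per output channel."""
    return 2 * c_out


# Parameters shared by every task (frozen convolution weights).
shared = sum(conv_weights(*layer) for layer in LAYERS)
# Parameters each task learns for itself (the patch).
patch = sum(patch_params(c_out) for _, c_out, _ in LAYERS)


def total_params(num_tasks, share=True):
    """Total parameters for num_tasks tasks, with or without a shared backbone."""
    if share:
        return shared + num_tasks * patch
    return num_tasks * (shared + patch)


# Fraction of one task's parameters that is reused from the shared backbone.
reused_fraction = shared / (shared + patch)

if __name__ == "__main__":
    print("shared:", shared, "patch per task:", patch)
    print("5 tasks, shared backbone:", total_params(5))
    print("5 tasks, separate models:", total_params(5, share=False))
    print("reused fraction:", round(reused_fraction, 4))
```

In this toy configuration each additional task costs 960 parameters instead of a full copy of the 387,936 convolution weights. The exact ratio depends entirely on the assumed shapes.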

We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch - a small set of parameters - that will specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network to perform well on qualitatively different problems (e.g. converting a Single Shot MultiBox Detection (SSD) model into a 1000-class image classification model while reusing 98% of parameters of the SSD feature extractor). Similarly, we show that re-learning existing low-parameter layers (such as depth-wise convolutions) while keeping the rest of the network frozen also improves transfer-learning accuracy significantly. Our approach allows both simultaneous (multi-task) as well as sequential transfer learning. In several multi-task learning problems, despite using much fewer parameters than traditional logits-only fine-tuning, we match single-task performance.
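The idea of training only a patch while the rest of the model stays frozen can be sketched with a toy example. This is not the paper's implementation: the "network" here is a single fixed linear map, the tasks are synthetic, and every name (`FROZEN_W`, `feature`, `predict`, `train_patch`) is hypothetical. It only illustrates that two tasks can share the same frozen weights and differ in one learned scale and one learned bias each.

```python
import random

random.seed(0)

# Frozen "pretrained" weights: never updated by the training loop below.
# Hypothetical values, chosen only for illustration.
FROZEN_W = [0.5, -1.0, 2.0]


def feature(x):
    """Frozen feature extractor: a fixed linear map standing in for a pretrained layer."""
    return sum(w * xi for w, xi in zip(FROZEN_W, x))


def predict(x, patch):
    """Apply the task-specific patch (scale, bias) on top of the frozen feature."""
    return patch["scale"] * feature(x) + patch["bias"]


def train_patch(data, steps=500, lr=0.05):
    """Fit only the patch by gradient descent on mean squared error."""
    patch = {"scale": 1.0, "bias": 0.0}
    n = len(data)
    for _ in range(steps):
        g_scale = 0.0
        g_bias = 0.0
        for x, y in data:
            err = predict(x, patch) - y
            g_scale += 2 * err * feature(x) / n
            g_bias += 2 * err / n
        patch["scale"] -= lr * g_scale
        patch["bias"] -= lr * g_bias
    return patch


# Two synthetic tasks built on the same inputs and the same frozen feature.
inputs = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
task_a = [(x, 3.0 * feature(x) - 1.0) for x in inputs]
task_b = [(x, -0.5 * feature(x) + 2.0) for x in inputs]

# One small patch per task; FROZEN_W is shared and untouched.
patches = {"a": train_patch(task_a), "b": train_patch(task_b)}

if __name__ == "__main__":
    for name, p in patches.items():
        print(name, round(p["scale"], 3), round(p["bias"], 3))
```

Each task stores two numbers here, while the three frozen weights are shared. In a real network the patch would instead cover, for example, the scale and bias parameters inside the pretrained layers, as the abstract describes.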

