LG | Oct 18, 2022

Few-Shot Learning of Compact Models via Task-Specific Meta Distillation

arXiv:2210.09922v1 | 11 citations | h-index: 101
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient, small models in deployment scenarios where computational resources are limited. It is an incremental advance, since it builds on existing meta-learning frameworks such as MAML.

The paper tackles few-shot learning for compact models by proposing task-specific meta distillation. A large teacher model and a small student model are trained jointly during meta-training. At meta-test time, the teacher is first adapted to the new task, and the adapted teacher then guides the student's adaptation; the adapted student is the model that gets deployed. The authors report improved performance over alternatives on several benchmark datasets.

We consider a new problem of few-shot learning of compact models. Meta-learning is a popular approach for few-shot learning. Previous work in meta-learning typically assumes that the model architecture during meta-training is the same as the model architecture used for final deployment. In this paper, we challenge this basic assumption. For final deployment, we often need the model to be small. But small models usually do not have enough capacity to effectively adapt to new tasks. Meanwhile, we often have access to a large dataset and extensive computing power during meta-training, since meta-training is typically performed on a server. In this paper, we propose task-specific meta distillation that simultaneously learns two models in meta-learning: a large teacher model and a small student model. These two models are jointly learned during meta-training. Given a new task during meta-testing, the teacher model is first adapted to this task, then the adapted teacher model is used to guide the adaptation of the student model. The adapted student model is used for final deployment. We demonstrate the effectiveness of our approach in few-shot image classification using model-agnostic meta-learning (MAML). Our proposed method outperforms other alternatives on several benchmark datasets.
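The meta-testing procedure described in the abstract (adapt the teacher to the new task, then let the adapted teacher guide the student's adaptation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes linear softmax classifiers, models the student's smaller capacity by letting it see only the first `d_s` input features, and uses zero matrices as stand-ins for the initializations that joint meta-training would provide. The function names, the mixing weight `alpha`, and the distillation `temperature` are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adapt_teacher(W_t, x, y, lr=0.05, steps=20):
    """Inner-loop adaptation: gradient steps on support-set cross-entropy."""
    W = W_t.copy()
    for _ in range(steps):
        grad_logits = softmax(x @ W) - y  # dCE/dlogits for one-hot targets y
        W -= lr * x.T @ grad_logits / len(x)
    return W

def student_loss(W_s, W_t, x, y, d_s, alpha=0.5, temperature=2.0):
    """(1 - alpha) * CE(labels) + alpha * CE(softened teacher predictions)."""
    logits = x[:, :d_s] @ W_s  # assumption: student only sees d_s features
    soft = softmax(x @ W_t, temperature)
    ce = -np.mean(np.sum(y * np.log(softmax(logits)), axis=1))
    kd = -np.mean(np.sum(soft * np.log(softmax(logits, temperature)), axis=1))
    return (1 - alpha) * ce + alpha * kd

def adapt_student(W_s, W_t, x, y, d_s, lr=0.05, steps=20,
                  alpha=0.5, temperature=2.0):
    """Student adaptation guided by the already-adapted teacher W_t."""
    W = W_s.copy()
    x_s = x[:, :d_s]
    soft = softmax(x @ W_t, temperature)  # teacher targets are held fixed
    for _ in range(steps):
        logits = x_s @ W
        grad_ce = softmax(logits) - y
        grad_kd = (softmax(logits, temperature) - soft) / temperature
        grad_logits = (1 - alpha) * grad_ce + alpha * grad_kd
        W -= lr * x_s.T @ grad_logits / len(x)
    return W

# Toy 3-way 5-shot support set with synthetic features (illustrative only).
rng = np.random.default_rng(0)
n_way, k_shot, d, d_s = 3, 5, 8, 3
y = np.eye(n_way)[np.repeat(np.arange(n_way), k_shot)]
x = rng.normal(size=(n_way * k_shot, d)) + y @ rng.normal(size=(n_way, d))

W_t0 = np.zeros((d, n_way))    # stand-in for the meta-learned teacher init
W_s0 = np.zeros((d_s, n_way))  # stand-in for the meta-learned student init

W_t = adapt_teacher(W_t0, x, y)             # step 1: adapt the teacher
W_s = adapt_student(W_s0, W_t, x, y, d_s)   # step 2: teacher-guided student

print("student loss before:", student_loss(W_s0, W_t, x, y, d_s))
print("student loss after: ", student_loss(W_s, W_t, x, y, d_s))
```

The sketch covers only the adaptation on a new task. In the approach described above, the two initializations would come from jointly meta-training both models with MAML rather than from zeros, and only the adapted student (`W_s` here) would be kept for deployment.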

