Differentiable Architecture Pruning for Transfer Learning
The work addresses the bottleneck of architecture transfer in few-shot learning scenarios, though it reads as an incremental extension of existing pruning methods.
The paper tackles the problem of extracting transferable sub-architectures from large models for transfer learning with limited data. It proposes a gradient-based pruning method that disentangles the architecture from the weights, so that the extracted structures can be retrained successfully on new tasks.
We propose a new gradient-based approach for extracting sub-architectures from a given large model. In contrast to existing pruning methods, which are unable to disentangle the network architecture from the corresponding weights, our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks. We focus on a transfer-learning setup where architectures can be trained on a large data set but very few data points are available for fine-tuning them on new tasks. We define a new gradient-based algorithm that trains architectures of arbitrarily low complexity independently of the attached weights. Given a search space defined by an existing large neural model, we reformulate the architecture search task as a complexity-penalized subset-selection problem and solve it through a two-temperature relaxation scheme. We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
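
To make the idea of a relaxed, complexity-penalized subset selection concrete, here is a minimal toy sketch. It is not the algorithm of the paper: the abstract does not spell out the two-temperature scheme, so the sketch simply assumes one sigmoid temperature (`t_loss`) for the gate inside the task loss and a second one (`t_pen`) for the gate inside the complexity penalty. The toy loss, the function name `relaxed_subset_selection`, and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relaxed_subset_selection(usefulness, penalty, t_loss=1.0, t_pen=0.5,
                             lr=0.5, steps=500):
    """Toy relaxation of a complexity-penalized subset-selection problem.

    Discrete problem (assumed for illustration only):
        minimize over m in {0, 1}^d:
            sum_i usefulness[i] * (1 - m[i]) + penalty * sum_i m[i]
    Relaxation: each m[i] is replaced by a sigmoid gate on a real
    parameter theta[i]. ASSUMPTION: the task loss and the penalty use
    different temperatures (t_loss and t_pen); this is a guess at what a
    "two-temperature" scheme could look like, not the authors' method.
    """
    theta = [0.0] * len(usefulness)
    for _ in range(steps):
        for i, u in enumerate(usefulness):
            g_loss = sigmoid(theta[i] / t_loss)  # gate seen by the task loss
            g_pen = sigmoid(theta[i] / t_pen)    # gate seen by the penalty
            # Gradient of u * (1 - g_loss) + penalty * g_pen w.r.t. theta[i].
            grad = (-u * g_loss * (1.0 - g_loss) / t_loss
                    + penalty * g_pen * (1.0 - g_pen) / t_pen)
            theta[i] -= lr * grad  # plain gradient descent step
    # Discretize: keep a component when its gate parameter is positive.
    return [t > 0 for t in theta]


# Two components matter a lot for the toy loss, two barely matter.
mask = relaxed_subset_selection([2.0, 1.5, 0.05, 0.01], penalty=0.5)
print(mask)
```

In this toy setting, components whose removal would cost much more than the penalty end up with positive gate parameters and are kept, while the nearly useless ones are pruned. Note that the gates here are optimized without any attached weights, which loosely mirrors the abstract's point about training the architecture independently of the weights.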