A Flexible Selection Scheme for Minimum-Effort Transfer Learning
This work addresses a practical issue in real-world data adaptation for visual recognition, though the contribution is incremental because it builds on existing fine-tuning techniques.
The paper tackles the problem of transferring knowledge from a pre-trained network to visually different but semantically similar data. It introduces flex-tuning, a generalized fine-tuning method that automatically selects the best network unit to tune, and shows that this often outperforms tuning the last layer in domain-shift scenarios.
Fine-tuning is a popular way of exploiting knowledge contained in a pre-trained convolutional network for a new visual recognition task. However, the orthogonal setting of transferring knowledge from a pre-trained network to a visually different yet semantically close source is rarely considered. This commonly happens with real-life data, which is not necessarily as clean as the training source (noise, geometric transformations, different modalities, etc.). To tackle such scenarios, we introduce a new, generalized form of fine-tuning, called flex-tuning, in which any individual unit (e.g. layer) of a network can be tuned, and the most promising one is chosen automatically. To make the method appealing for practical use, we propose two lightweight, faster selection procedures that prove to be good approximations in practice. We study these selection criteria empirically across a variety of domain shifts and data scarcity scenarios, and show that fine-tuning individual units, despite its simplicity, yields very good results as an adaptation technique. As it turns out, in contrast to common practice, in many domain-shift scenarios it is best to tune an intermediate or early unit rather than the last fully-connected one, and flex-tuning detects this accurately.
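The idea of tuning one unit at a time and keeping the most promising one can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the selection criterion, so the sketch assumes exhaustive selection by validation loss; the two-unit scalar "network", the shifted-input target domain, the grid search standing in for gradient-based fine-tuning, and the names forward, tune_unit, and flex_tune are all hypothetical. It is not the authors' implementation, and it does not cover the two faster selection procedures.

```python
import copy


def forward(params, x):
    """Toy network: an early affine unit, a ReLU, then a late affine unit."""
    z = params["early"]["w"] * x + params["early"]["b"]
    h = max(0.0, z)
    return params["late"]["w"] * h + params["late"]["b"]


def mse(params, data):
    """Mean squared error of the network on (input, label) pairs."""
    return sum((forward(params, x) - y) ** 2 for x, y in data) / len(data)


def tune_unit(params, unit, train, grid):
    """Tune only `unit`; every other unit stays frozen.

    Assumption: a grid search over (w, b) stands in for gradient-based
    fine-tuning so that the example stays dependency-free.
    """
    best, best_loss = copy.deepcopy(params), mse(params, train)
    for w in grid:
        for b in grid:
            cand = copy.deepcopy(params)
            cand[unit] = {"w": w, "b": b}
            loss = mse(cand, train)
            if loss < best_loss:
                best, best_loss = cand, loss
    return best


def flex_tune(params, train, val, grid):
    """Tune each unit separately and pick the one with the lowest validation loss.

    Assumption: exhaustive selection on a held-out set; the source does not
    spell out its criterion.
    """
    results = {}
    for unit in params:
        tuned = tune_unit(params, unit, train, grid)
        results[unit] = (mse(tuned, val), tuned)
    best_unit = min(results, key=lambda u: results[u][0])
    return best_unit, results


# Hypothetical "pre-trained" network that computes 2 * relu(x) on source data.
pretrained = {"early": {"w": 1.0, "b": 0.0}, "late": {"w": 2.0, "b": 0.0}}


def target_label(x):
    """Target domain: same task, but inputs arrive shifted by -1."""
    return 2.0 * max(0.0, x + 1.0)


train = [(x, target_label(x)) for x in (-3.0, -2.0, -1.0, 0.0, 1.0)]
val = [(x, target_label(x)) for x in (-2.5, -1.5, -0.5, 0.5)]
grid = [0.5 * i for i in range(-4, 9)]  # candidate values from -2.0 to 4.0

best_unit, results = flex_tune(pretrained, train, val, grid)
print(best_unit, {u: loss for u, (loss, _) in results.items()})
```

In this toy setup the domain shift acts on the input, so retuning the early unit can undo it exactly, while retuning the late unit cannot move the ReLU kink and leaves a residual error. The selection loop therefore picks the early unit, which mirrors the abstract's observation that the last unit is not always the best one to tune.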