Similarity of Pre-trained and Fine-tuned Representations
This work addresses how representations should change during transfer learning, a question relevant to machine learning practitioners, but it appears incremental because it extends insights from few-shot learning to a broader context.
The paper investigates whether representation change in the early layers, which is beneficial in few-shot learning, is also beneficial in transfer learning, and finds that pre-trained structure is unlearned if it is not usable.
In transfer learning, often only the last part of the network, the so-called head, is fine-tuned. Representation similarity analysis shows that the most significant change still occurs in the head even if all weights are updatable. However, recent results from few-shot learning have shown that representation change in the early layers, which are mostly convolutional, is beneficial, especially in the case of cross-domain adaptation. In this paper, we investigate whether this also holds for transfer learning. In addition, we analyze how representations change in transfer learning, both during pre-training and fine-tuning, and find that pre-trained structure is unlearned if it is not usable.
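To make the idea of comparing pre-trained and fine-tuned representations concrete, the sketch below computes linear centered kernel alignment (CKA), one commonly used similarity measure. The abstract does not say which measure the paper uses, so the choice of linear CKA is an assumption made for illustration. The function name `linear_cka` and the activation matrices are likewise hypothetical. Each matrix is assumed to hold one row per input example and one column per feature of a given layer.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices.

    x: array of shape (n_examples, n_features_x), e.g. a layer before fine-tuning
    y: array of shape (n_examples, n_features_y), e.g. the same layer afterwards
    Returns a value in [0, 1]; 1 means identical up to rotation and scaling.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape[0] != y.shape[0]:
        raise ValueError("both matrices need the same number of examples")

    # Center every feature so the measure ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (norm_x * norm_y)


# Hypothetical usage: random stand-ins for layer activations on 50 inputs.
rng = np.random.default_rng(0)
before = rng.normal(size=(50, 8))
after = before + 0.1 * rng.normal(size=(50, 8))  # small change after fine-tuning
similarity = linear_cka(before, after)
```

Under these assumptions, computing this score per layer, with activations from the pre-trained and the fine-tuned model on the same inputs, would indicate where the representation changed most: a low score marks a layer that changed substantially.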