CV | Jul 8, 2022

Beyond Transfer Learning: Co-finetuning for Action Localisation

arXiv:2207.03807v1 | 10 citations | h-index: 151
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of improving action localisation performance, especially for rare classes, by offering a more effective training paradigm than standard transfer learning, though the method itself is an incremental advance.

The paper tackles the problem of training deep networks for action localisation by proposing co-finetuning, which trains a single model simultaneously on multiple upstream and downstream tasks. It shows that co-finetuning outperforms traditional transfer learning and achieves state-of-the-art results on the AVA and AVA-Kinetics datasets.

Transfer learning is the predominant paradigm for training deep networks on small target datasets. Models are typically pretrained on large "upstream" datasets for classification, as such labels are easy to collect, and then finetuned on "downstream" tasks such as action localisation, whose datasets are smaller due to their finer-grained annotations. In this paper, we question this approach, and propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks. We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data, and also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance. In particular, co-finetuning significantly improves the performance on rare classes in our downstream task, as it has a regularising effect, and enables the network to learn feature representations that transfer between different datasets. Finally, we observe that by co-finetuning with public video classification datasets, we are able to achieve state-of-the-art results for spatio-temporal action localisation on the challenging AVA and AVA-Kinetics datasets, outperforming recent works which develop intricate models.
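The contrast the abstract draws can be made concrete by looking only at the order in which training examples are visited. The sketch below is illustrative, not the authors' implementation: the task names, the tiny example counts, and the choice to shuffle all examples into a single stream are assumptions made here for illustration. It builds a sequential transfer-learning schedule (all upstream examples, then all downstream ones) and a co-finetuning schedule (the same examples interleaved), so both use the same total amount of data.

```python
import random
from collections import Counter


def transfer_schedule(sizes, downstream):
    """Sequential baseline: visit every upstream example, then the downstream ones."""
    order = []
    for name, count in sizes.items():
        if name != downstream:
            order += [name] * count
    order += [downstream] * sizes[downstream]
    return order


def cofinetune_schedule(sizes, seed=0):
    """Interleaved schedule: same per-task budget, shuffled into one stream.

    Shuffling the pooled examples is an assumption of this sketch; it means
    each task is drawn in proportion to its size throughout training.
    """
    order = [name for name, count in sizes.items() for _ in range(count)]
    random.Random(seed).shuffle(order)
    return order


# Hypothetical task names and tiny example counts, for illustration only.
sizes = {"upstream_classification": 8, "downstream_localisation": 2}

sequential = transfer_schedule(sizes, "downstream_localisation")
mixed = cofinetune_schedule(sizes)

# Both schedules consume exactly the same examples per task.
print(Counter(sequential) == Counter(mixed))
```

In an actual training loop, each entry in the schedule would select which dataset the next example (or minibatch) is drawn from, with a shared backbone and a task-specific output head; those components are omitted here.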

