CV · Apr 3, 2023

Use Your Head: Improving Long-Tail Video Recognition

arXiv:2304.01143v1 · 24 citations · h-index: 44
Originality: Incremental advance
AI Analysis

The paper addresses biased performance in video recognition, where accuracy on rare classes lags behind accuracy on common ones. Its contribution is incremental, since it builds on existing long-tail methods.

The paper tackles long-tail video recognition by creating new benchmarks that better reflect few-shot classes and proposing Long-Tail Mixed Reconstruction, which reduces overfitting and achieves state-of-the-art average class accuracy on EPIC-KITCHENS, SSv2-LT, and VideoLT-LT.

This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sampling subsets from two datasets: SSv2 and VideoLT. We then propose a method, Long-Tail Mixed Reconstruction (LMR), which reduces overfitting to instances from few-shot classes by reconstructing them as weighted combinations of samples from head classes. LMR then employs label mixing to learn robust decision boundaries. It achieves state-of-the-art average class accuracy on EPIC-KITCHENS and the proposed SSv2-LT and VideoLT-LT. Benchmarks and code at: tobyperrett.github.io/lmr
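
The reconstruction and label-mixing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a simplified illustration under stated assumptions. The function name `mixed_reconstruction`, the `few_shot_max` threshold, the softmax `temperature`, and the fixed `mix` ratio are all hypothetical choices made here, and the paper's actual weighting scheme may differ. Each few-shot sample is replaced by a blend of itself and a similarity-weighted combination of head-class samples in the batch, and its label is blended with the same weights.

```python
import math


def mixed_reconstruction(feats, labels, class_counts,
                         few_shot_max=20, temperature=0.1, mix=0.5):
    """Blend few-shot samples with a weighted combination of head samples.

    feats: list of feature vectors, labels: list of class indices,
    class_counts: training-set size per class. All hyperparameters are
    illustrative assumptions, not values from the paper.
    """
    num_classes = len(class_counts)
    onehot = [[1.0 if c == y else 0.0 for c in range(num_classes)]
              for y in labels]
    # Indices of batch samples whose class is not few-shot.
    head_idx = [i for i, y in enumerate(labels)
                if class_counts[y] > few_shot_max]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b + 1e-8)

    new_feats, new_labels = [], []
    for i, y in enumerate(labels):
        # Head samples (or batches without any head sample) pass through.
        if class_counts[y] > few_shot_max or not head_idx:
            new_feats.append(list(feats[i]))
            new_labels.append(list(onehot[i]))
            continue
        # Softmax over cosine similarities to the head samples.
        sims = [cosine(feats[i], feats[j]) / temperature for j in head_idx]
        top = max(sims)
        exps = [math.exp(s - top) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted combination of head features and of their labels.
        recon = [sum(w * feats[j][d] for w, j in zip(weights, head_idx))
                 for d in range(len(feats[i]))]
        recon_label = [sum(w * onehot[j][c] for w, j in zip(weights, head_idx))
                       for c in range(num_classes)]
        # Mix the original sample and label with the reconstruction.
        new_feats.append([(1 - mix) * f + mix * r
                          for f, r in zip(feats[i], recon)])
        new_labels.append([(1 - mix) * o + mix * r
                           for o, r in zip(onehot[i], recon_label)])
    return new_feats, new_labels
```

In this sketch a few-shot sample keeps a share of `1 - mix` of its own label, and the remaining `mix` is spread over the head classes that contributed to its reconstruction, so the soft label always sums to 1. A training loop would then apply a soft-label cross-entropy loss to the returned features and labels.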

Code Implementations: 1 repo
