CV · Apr 30, 2021

Few-Shot Video Object Detection

arXiv:2104.14805v3 · 15 citations · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the problem of detecting objects in videos when only a few labeled examples are available. The contribution is incremental because it builds on existing few-shot and video object detection techniques.

The paper tackles few-shot video object detection by introducing a new dataset (FSVOD-500) and a method that combines a Tube Proposal Network with a Temporal Matching Network. On two datasets, the method achieves significantly better results than image-based methods and other video-based methods.

We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to the real-world visual learning challenge posed by our highly diverse and dynamic world: 1) FSVOD-500, a large-scale video dataset comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) that generates high-quality video tube proposals to aggregate feature representations for the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) that matches representative query tube features with better discriminative ability, thus achieving higher diversity. Our TPN and TMN+ are trained jointly and end to end. Extensive experiments demonstrate that our method produces significantly better detection results on two few-shot video object detection datasets than image-based methods and other naive video-based extensions. Code and datasets are released at https://github.com/fanq15/FewX.
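The abstract describes the pipeline only at a high level: tube proposals link an object's boxes across frames so their features can be aggregated, and the matching network compares query tube features against a few labeled support examples. The following is a minimal sketch of that general idea under our own assumptions, not the authors' TPN or TMN+ (their implementation is in the FewX repository). It assumes tube features are mean-pooled across frames, each class prototype is the mean of its K support feature vectors, and matching uses cosine similarity. All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(features: Sequence[Vector]) -> Vector:
    """Element-wise average of equal-length feature vectors."""
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def tube_feature(per_frame_box_features: Sequence[Vector]) -> Vector:
    """Hypothetical tube aggregation: mean-pool one object's box features
    across the frames the tube spans (a stand-in for the paper's TPN)."""
    return mean_pool(per_frame_box_features)


def classify_tube(
    query_tube: Sequence[Vector],
    support: Dict[str, Sequence[Vector]],
) -> Dict[str, float]:
    """Hypothetical prototype matching: score the query tube against the
    mean of each class's K support features (a stand-in for TMN+)."""
    query = tube_feature(query_tube)
    return {cls: cosine(query, mean_pool(shots)) for cls, shots in support.items()}


# Toy 2-way 2-shot episode with 2-D features (made-up numbers).
support = {
    "cat": [[1.0, 0.0], [0.9, 0.1]],
    "dog": [[0.0, 1.0], [0.1, 0.9]],
}
query_tube = [[0.8, 0.2], [1.0, 0.0], [0.9, 0.1]]  # one box feature per frame
scores = classify_tube(query_tube, support)
best = max(scores, key=scores.get)
print(best, scores)  # best is "cat": the query tube lies near the cat prototype
```

Pooling across frames is what lets a tube-level representation smooth over individual frames where the object is blurred or occluded, which is the motivation the abstract gives for working with tubes rather than single-frame boxes.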

Code Implementations: 1 repo