CV, HC · Dec 23, 2020

Efficient video annotation with visual interpolation and frame selection guidance

arXiv:2012.12554v1 · 21 citations
AI Analysis

This work makes bounding-box video annotation, a tedious and time-consuming task, substantially more efficient for human annotators: in human experiments, measured annotation time falls by 50% compared to linear interpolation.

This paper introduces a unified framework for video annotation with bounding boxes that addresses two challenges: automatic temporal interpolation and extrapolation of boxes, and automatic selection of the frames to annotate. In simulation, the proposed method reduces the number of manually drawn bounding boxes by 60% relative to linear interpolation and by 35% relative to an off-the-shelf tracker. In human experiments, it reduces measured annotation time by 50% compared to linear interpolation.

We introduce a unified framework for generic video annotation with bounding boxes. Video annotation is a longstanding problem, as it is a tedious and time-consuming process. We tackle two important challenges of video annotation: (1) automatic temporal interpolation and extrapolation of bounding boxes provided by a human annotator on a subset of all frames, and (2) automatic selection of frames to annotate manually. Our contribution is two-fold: first, we propose a model that has both interpolating and extrapolating capabilities; second, we propose a guiding mechanism that sequentially generates suggestions for what frame to annotate next, based on the annotations made previously. We extensively evaluate our approach on several challenging datasets in simulation and demonstrate a reduction in terms of the number of manual bounding boxes drawn by 60% over linear interpolation and by 35% over an off-the-shelf tracker. Moreover, we also show 10% annotation time improvement over a state-of-the-art method for video annotation with bounding boxes [25]. Finally, we run human annotation experiments and provide extensive analysis of the results, showing that our approach reduces actual measured annotation time by 50% compared to commonly used linear interpolation.
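For context, the linear interpolation baseline that the abstract compares against can be sketched in a few lines. This is a minimal illustration of the baseline only, not the paper's learned model. The function name `interpolate_boxes`, the `(x_min, y_min, x_max, y_max)` box layout, and the choice to copy the nearest keyframe box outside the annotated range are all assumptions made for this example.

```python
from bisect import bisect_left

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = tuple[float, float, float, float]


def interpolate_boxes(keyframes: dict[int, Box], num_frames: int) -> list[Box]:
    """Fill in boxes for frames 0..num_frames-1 from sparse manual keyframes.

    Frames between two keyframes get a linear blend of the two boxes.
    Frames before the first or after the last keyframe copy the nearest
    keyframe box (a simple stand-in for extrapolation).
    """
    if not keyframes:
        raise ValueError("at least one annotated keyframe is required")

    frames = sorted(keyframes)
    boxes: list[Box] = []
    for t in range(num_frames):
        i = bisect_left(frames, t)  # index of the first keyframe at or after t
        if i == 0:
            boxes.append(keyframes[frames[0]])
        elif i == len(frames):
            boxes.append(keyframes[frames[-1]])
        else:
            left, right = frames[i - 1], frames[i]
            w = (t - left) / (right - left)  # 0 at the left keyframe, 1 at the right
            a, b = keyframes[left], keyframes[right]
            boxes.append(tuple(p + w * (q - p) for p, q in zip(a, b)))
    return boxes


if __name__ == "__main__":
    manual = {0: (0, 0, 10, 10), 4: (8, 4, 18, 14)}
    for frame, box in enumerate(interpolate_boxes(manual, 6)):
        print(frame, box)
```

With this baseline, every frame where the object deviates from straight-line motion needs another manual box, which is the cost the paper's interpolation model and frame selection guidance aim to reduce.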
