CVJul 5, 2023

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking

arXiv:2307.02508v21 citationsh-index: 37
AI Analysis

This work addresses video object tracking for kitchen scenarios, but it is incremental as it builds on existing AOT and SAM methods.

The study tackled single object tracking in videos by converting bounding boxes to masks using SAM and Alpha-Refine, and propagating them with MSDeAOT, achieving first place in the EPIC-KITCHENS TREK-150 challenge.

The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object tracking and segmentation. In this study, we convert the bounding boxes to masks in reference frames with the help of the Segment Anything Model (SAM) and Alpha-Refine, and then propagate the masks to the current frame, transforming the task from Video Object Tracking (VOT) to video object segmentation (VOS). Furthermore, we introduce MSDeAOT, a variant of the AOT series that incorporates transformers at multiple feature scales. MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8. As a testament to the effectiveness of our design, we achieved the 1st place in the EPIC-KITCHENS TREK-150 Object Tracking Challenge.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes