CV · AI · NE · Nov 20, 2022

Revisiting Color-Event based Tracking: A Unified Network, Dataset, and Metric

arXiv:2211.11010v2 · 79 citations · h-index: 64 · Has Code
Originality: Incremental advance
AI Analysis

This work provides a more efficient and unified solution for researchers and practitioners in computer vision working on robust object tracking using combined color and event cameras, though it is incremental in integrating existing modalities into a streamlined framework.

The paper tackles the problem of inefficient and complex multi-module frameworks for color-event object tracking by proposing a single-stage Transformer-based network (CEUTrack) that achieves over 75 FPS and new state-of-the-art performance. It also introduces a large-scale dataset (COESOT) with 90 categories and 1354 sequences and a new evaluation metric (BOC) to address data deficiency and improve benchmarking.

Combining color cameras with event cameras (also called Dynamic Vision Sensors, DVS) for robust object tracking has emerged as a research topic in recent years. Existing color-event tracking frameworks usually contain multiple scattered modules, including feature extraction, fusion, matching, and interactive learning, which may lead to low efficiency and high computational complexity. In this paper, we propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which performs these functions simultaneously. Given the event points and RGB frames, we first transform the points into voxels and crop the template and search regions for both modalities. These regions are then projected into tokens and fed in parallel into the unified Transformer backbone network. The output features are fed into a tracking head for target object localization. Our proposed CEUTrack is simple, effective, and efficient, achieving over 75 FPS and new SOTA performance. To better validate the effectiveness of our model and address the data deficiency of this task, we also propose a generic, large-scale benchmark dataset for color-event tracking, termed COESOT, which contains 90 categories and 1354 video sequences. Additionally, our evaluation toolkit introduces a new evaluation metric, BOC, which evaluates the prominence with respect to the baseline methods. We hope the newly proposed method, dataset, and evaluation metric provide a better platform for color-event-based tracking. The dataset, toolkit, and source code will be released at https://github.com/Event-AHU/COESOT.
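
The abstract says event points are first transformed into voxels but does not spell out how. The following is a minimal sketch of one common way to do that, not the CEUTrack implementation: the function name `events_to_voxel_grid`, the `(x, y, t, polarity)` event layout, the equal-width temporal binning, and the signed polarity accumulation are all assumptions made for illustration.

```python
def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate (x, y, t, polarity) events into a num_bins x height x width grid.

    Hypothetical helper: the event layout and binning scheme are assumptions,
    not taken from the paper. x and y are expected to be integer pixel indices.
    """
    grid = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid

    times = [t for _, _, t, _ in events]
    t0 = min(times)
    span = max(times) - t0

    for x, y, t, polarity in events:
        if not (0 <= x < width and 0 <= y < height):
            continue  # drop events that fall outside the sensor area
        if span > 0:
            # Equal-width temporal bins; the latest event is clamped into the last bin.
            b = min(int((t - t0) / span * num_bins), num_bins - 1)
        else:
            b = 0  # all events share one timestamp
        # Positive polarity adds 1, anything else subtracts 1.
        grid[b][y][x] += 1.0 if polarity > 0 else -1.0
    return grid


# Toy event stream: the last event lies outside a 3 x 4 sensor and is ignored.
events = [(0, 0, 0.00, 1), (1, 0, 0.01, -1), (1, 0, 0.02, 1),
          (3, 2, 0.09, 1), (9, 9, 0.05, 1)]
voxels = events_to_voxel_grid(events, num_bins=2, height=3, width=4)
```

A real pipeline would more likely use array operations for speed. The nested lists here keep the sketch dependency-free and make the binning logic easy to follow.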
