SOTVerse: A User-defined Task Space of Single Object Tracking
This addresses a bottleneck in SOT research for computer vision researchers by systematizing benchmarks and enabling user-defined tasks, though it is incremental in its approach.
The paper tackles the inefficiency in single object tracking (SOT) research by introducing SOTVerse, a user-defined task space with 12.56 million frames, which improves data utilization and evaluation methods, leading to more standardized and scientific research.
Single object tracking (SOT) research falls into a cycle -- trackers perform well on most benchmarks but quickly fail in challenging scenarios, causing researchers to doubt the insufficient data content and take more effort to construct larger datasets with more challenging situations. However, inefficient data utilization and limited evaluation methods more seriously hinder SOT research. The former causes existing datasets can not be exploited comprehensively, while the latter neglects challenging factors in the evaluation process. In this article, we systematize the representative benchmarks and form a Single Object Tracking metaverse (SOTVerse) -- a user-defined SOT task space to break through the bottleneck. We first propose a 3E Paradigm to describe tasks by three components (i.e., environment, evaluation, and executor). Then, we summarize task characteristics, clarify the organization standards, and construct SOTVerse with 12.56 million frames. Specifically, SOTVerse automatically labels challenging factors per frame, allowing users to generate user-defined spaces efficiently via construction rules. Besides, SOTVerse provides two mechanisms with new indicators and successfully evaluates trackers under various subtasks. Consequently, SOTVerse first provides a strategy to improve resource utilization in the computer vision area, making research more standardized and scientific. The SOTVerse, toolkit, evaluation server, and results are available at http://metaverse.aitestunion.com.