A Novel Bounding Box Regression Method for Single Object Tracking
This work addresses a specific bottleneck in single object tracking, the bounding box regression stage, and offers incremental improvements.
The paper tackles bounding box regression in single object tracking by introducing inception-based and deformable regression networks that improve the receptive field of the regression head; the inception variant outperforms ODTrack on the GOT-10k, UAV123, and OTB2015 benchmarks.
Locating an object in a sequence of frames, given its appearance in the first frame, is a hard problem that involves many stages. State-of-the-art methods usually focus on novel ideas in the visual encoding or relational modelling phases. In this work, however, we show that bounding box regression from the learned joint search and template features is also highly important. While previous methods relied heavily on well-learned features representing the interactions between search and template, we hypothesize that the receptive field of the convolutional bounding box regression network applied to these features plays an important role in accurately determining the object location. To this end, we introduce two novel bounding box regression networks: an inception-based network and a deformable network. Experiments and ablation studies show that ODTrack equipped with our inception module outperforms the original ODTrack on three benchmarks: GOT-10k, UAV123, and OTB2015.
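To make the receptive-field hypothesis concrete, the sketch below applies the standard receptive-field recurrence for stacked convolutions and compares a plain stack of 3x3 convolutions with an inception-style block of parallel branches. It is a minimal illustration under stated assumptions: the layer configurations are hypothetical and are not the regression heads used in the paper, all branches are assumed to be centred (same padding) so that concatenating them yields the largest branch receptive field, and deformable convolutions are not modelled, since their learned sampling offsets make the effective receptive field data dependent.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv layers.

    Each layer is a tuple (kernel_size, stride, dilation). Uses the recurrence
    r_l = r_{l-1} + (k_l - 1) * d_l * j_{l-1} and j_l = j_{l-1} * s_l,
    where j is the cumulative stride ("jump") and r starts at 1.
    """
    r, j = 1, 1
    for k, s, d in layers:
        r += (k - 1) * d * j
        j *= s
    return r


def inception_receptive_field(branches):
    """Inception-style block: parallel branches are concatenated channel-wise.

    Assuming all branches are centred on the same output position, the block
    sees the union of the branch receptive fields, i.e. the largest one.
    """
    return max(receptive_field(branch) for branch in branches)


# Hypothetical heads, for illustration only (not the paper's configuration).
plain_head = [(3, 1, 1), (3, 1, 1)]  # two stacked 3x3 convs
inception_head = [
    [(1, 1, 1)],                         # 1x1 branch
    [(1, 1, 1), (3, 1, 1)],              # 1x1 -> 3x3 branch
    [(1, 1, 1), (5, 1, 1)],              # 1x1 -> 5x5 branch
    [(3, 1, 1), (3, 1, 1), (3, 1, 1)],   # three stacked 3x3 convs
]

print(receptive_field(plain_head))                # 5
print(inception_receptive_field(inception_head))  # 7
```

The design point the sketch captures is that an inception-style block mixes small and large receptive fields at the same depth, so the regression head can draw on both local detail and wider context when placing box edges.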