ROAM: Recurrently Optimizing Tracking Model
This work addresses the challenge of efficient and accurate object tracking in computer vision, offering incremental improvements in model-update speed and tracking accuracy for applications such as video analysis.
The paper tackles object tracking by proposing a model whose resizable convolutional filters remove the need to enumerate anchors of different sizes, which saves parameters, together with a recurrent neural optimizer that quickly adapts the model to appearance changes. The result is faster convergence when updating the model and better tracking performance, with favorable results against state-of-the-art algorithms on the OTB, VOT, LaSOT, GOT-10K, and TrackingNet benchmarks.
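To make the resizable-filter idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that a single base filter is resized to the target's shape by bilinear interpolation of its weights and then cross-correlated with a single-channel feature map; the function names `resize_filter_bilinear` and `correlate_valid` are hypothetical. One base filter stretched to different shapes stands in for a bank of anchor-specific filters, which is where the parameter saving comes from.

```python
import numpy as np


def resize_filter_bilinear(w, out_h, out_w):
    """Resize a 2D filter to (out_h, out_w) by bilinear interpolation (align-corners sampling)."""
    in_h, in_w = w.shape
    # Source-grid coordinates sampled by each output row / column.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal interpolation weights, shape (1, out_w)
    top = w[y0][:, x0] * (1 - wx) + w[y0][:, x1] * wx
    bottom = w[y1][:, x0] * (1 - wx) + w[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def correlate_valid(feat, k):
    """Sliding-window cross-correlation ('valid' region) of a 2D feature map with filter k."""
    kh, kw = k.shape
    out = np.empty((feat.shape[0] - kh + 1, feat.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feat[i:i + kh, j:j + kw] * k)
    return out


# Usage: one 3x3 base filter stretched to a wide 3x5 target shape.
base = np.array([[0.0, 1.0, 0.0],
                 [1.0, 2.0, 1.0],
                 [0.0, 1.0, 0.0]])
wide = resize_filter_bilinear(base, 3, 5)
feat = np.random.default_rng(0).normal(size=(16, 16))
resp = correlate_valid(feat, wide)  # heat-map-like response over sliding-window positions
print(wide.shape, resp.shape)  # (3, 5) (14, 12)
```

Because only the base filter's weights are stored, supporting another target shape costs an interpolation rather than an extra set of parameters.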
In this paper, we design a tracking model consisting of response generation and bounding box regression: the first component produces a heat map indicating the presence of the object at different positions, and the second regresses bounding box shifts relative to anchors placed at sliding-window locations. Because both components use resizable convolutional filters that adapt to changes in object shape, our tracking model does not need to enumerate anchors of different sizes, which saves model parameters. To adapt the model effectively to appearance variations, we propose to train a recurrent neural optimizer offline in a meta-learning setting; this optimizer updates the tracking model and can make it converge within a few gradient steps. This improves the convergence speed of updating the tracking model while achieving better performance. We extensively evaluate our trackers, ROAM and ROAM++, on the OTB, VOT, LaSOT, GOT-10K and TrackingNet benchmarks, and our methods perform favorably against state-of-the-art algorithms.
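The offline meta-learning of an optimizer can be illustrated with a toy learned optimizer. This is a hedged sketch under loud assumptions, not the ROAM optimizer: the tracking loss is replaced by a quadratic `0.5 * ||theta - target||^2`, the recurrent optimizer is a tiny coordinate-wise cell with four hypothetical meta-parameters that predicts an adaptive learning rate in (0, 1), and the meta-gradient is estimated with central finite differences instead of backpropagating through the unrolled steps. What carries over is the structure: the optimizer is meta-trained across many tasks so that the model's loss is low after only a few inner update steps.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def unroll(phi, theta0, target, steps):
    """Run the toy recurrent optimizer for `steps` updates and return the final loss.

    The loss is a stand-in quadratic 0.5 * ||theta - target||^2 (assumption).
    """
    a, b, c, d = phi  # hypothetical meta-parameters of the recurrent cell
    theta = theta0.copy()
    h = np.zeros_like(theta)  # per-coordinate hidden state carried across steps
    for _ in range(steps):
        g = theta - target                  # gradient of the quadratic loss
        h = np.tanh(a * h + b * np.abs(g))  # recurrent state update driven by the gradient
        lr = sigmoid(c * h + d)             # adaptive per-coordinate learning rate in (0, 1)
        theta = theta - lr * g              # gradient step scaled by the predicted rate
    return 0.5 * np.sum((theta - target) ** 2)


def meta_loss(phi, tasks, steps):
    # Average loss after `steps` inner updates, over a set of training tasks.
    return float(np.mean([unroll(phi, t0, tgt, steps) for t0, tgt in tasks]))


def meta_train(phi, tasks, steps=3, iters=50, eps=1e-4, step_size=1.0):
    """Offline meta-training of the optimizer's meta-parameters via finite differences."""
    phi = np.asarray(phi, dtype=float).copy()
    best = meta_loss(phi, tasks, steps)
    for _ in range(iters):
        # Central finite-difference estimate of d(meta-loss)/d(phi).
        grad = np.zeros_like(phi)
        for i in range(phi.size):
            e = np.zeros_like(phi)
            e[i] = eps
            grad[i] = (meta_loss(phi + e, tasks, steps)
                       - meta_loss(phi - e, tasks, steps)) / (2 * eps)
        # Backtracking: halve the meta-step until the meta-loss does not increase.
        step = step_size
        while step > 1e-6:
            cand = phi - step * grad
            cand_loss = meta_loss(cand, tasks, steps)
            if cand_loss <= best:
                phi, best = cand, cand_loss
                break
            step *= 0.5
    return phi, best


# Usage: 16 random tasks, 3 inner update steps each.
rng = np.random.default_rng(0)
tasks = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(16)]
phi0 = np.zeros(4)
before = meta_loss(phi0, tasks, steps=3)
phi, after = meta_train(phi0, tasks)
print(f"meta-loss before: {before:.4f}, after: {after:.4f}")
```

With `phi0` set to zeros the cell always predicts a learning rate of 0.5; meta-training moves the meta-parameters toward rates that reduce the loss further within the same three inner steps.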