Learning Spatial-Aware Regressions for Visual Tracking
This work addresses visual tracking, presenting an incremental improvement that integrates spatial awareness into regression models.
The paper tackles robust visual tracking by analyzing the spatial information in deep features and proposing two complementary regression models: a kernelized ridge regression formulated as a neural network, and a fully convolutional network with spatially regularized kernels. The outputs of the two models are combined to improve tracking performance, and the approach is validated experimentally on two benchmark datasets.
In this paper, we analyze the spatial information of deep features, and propose two complementary regressions for robust visual tracking. First, we propose a kernelized ridge regression model wherein the kernel value is defined as the weighted sum of similarity scores of all pairs of patches between two samples. We show that this model can be formulated as a neural network and thus can be efficiently solved. Second, we propose a fully convolutional neural network with spatially regularized kernels, through which the filter kernel corresponding to each output channel is forced to focus on a specific region of the target. Distance transform pooling is further exploited to determine the effectiveness of each output channel of the convolution layer. The outputs from the kernelized ridge regression model and the fully convolutional neural network are combined to obtain the ultimate response. Experimental results on two benchmark datasets validate the effectiveness of the proposed method.
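To make the first model more concrete, the following is a minimal sketch of a kernel defined as a weighted sum of similarity scores over all pairs of patches between two samples, plugged into ordinary kernelized ridge regression. Several things here are assumptions rather than details from the paper: the dot product as the patch similarity, the names `patch_kernel`, `gram_matrix`, `fit_krr` and `predict_krr`, the `weights` matrix layout, and the regularization parameter `lam`. The sketch also uses the standard closed-form solve instead of the neural-network formulation the abstract describes, so it illustrates the kernel definition only, not the authors' solver.

```python
import numpy as np


def patch_kernel(x, y, weights):
    """Weighted sum of similarities over all patch pairs of two samples.

    x, y    : (M, D) arrays, M patch feature vectors of dimension D each.
    weights : (M, M) array, weights[m, n] weights the pair (x[m], y[n]).
    The dot product is used as the similarity score (an assumption).
    """
    sims = x @ y.T  # (M, M) similarity of every patch pair
    return float(np.sum(weights * sims))


def gram_matrix(samples, weights):
    """Kernel matrix over a list of training samples."""
    n = len(samples)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = patch_kernel(samples[i], samples[j], weights)
    return K


def fit_krr(samples, targets, weights, lam=1e-2):
    """Closed-form kernel ridge regression: solve (K + lam * I) alpha = y."""
    K = gram_matrix(samples, weights)
    return np.linalg.solve(K + lam * np.eye(len(samples)), targets)


def predict_krr(samples, alpha, weights, z):
    """Response for a new sample z given the dual coefficients alpha."""
    k = np.array([patch_kernel(s, z, weights) for s in samples])
    return float(k @ alpha)
```

With an identity `weights` matrix only patches at the same position are compared, which reduces to a plain linear kernel on the flattened features. Off-diagonal weights let patches at different positions contribute. For the closed-form solve to be well posed, `weights` should be chosen so that the resulting kernel matrix is symmetric positive semidefinite.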