Visual Tracking by means of Deep Reinforcement Learning and an Expert Demonstrator
This work addresses efficient and accurate visual tracking for video analysis applications, and offers an incremental improvement by integrating expert demonstrations into reinforcement learning frameworks.
The paper tackles visual object tracking by proposing two novel trackers, A3CT and A3CTD, which combine deep reinforcement learning with an expert tracker: A3CT learns its tracking policy from the expert's demonstrations, while A3CTD uses the expert to correct its behaviour during tracking. Both achieve state-of-the-art performance on multiple benchmarks while running in real time.
In the last decade, many different algorithms have been proposed to track a generic object in videos. Running them on recent large-scale video datasets can produce a large and varied set of tracking behaviours. Recent trends in Reinforcement Learning have shown that demonstrations from an expert agent can be used efficiently to speed up policy learning. Taking inspiration from such works and from recent applications of Reinforcement Learning to visual tracking, we propose two novel trackers: A3CT, which exploits demonstrations of a state-of-the-art tracker to learn an effective tracking policy, and A3CTD, which takes advantage of the same expert tracker to correct its behaviour during tracking. Through an extensive experimental validation on the GOT-10k, OTB-100, LaSOT, UAV123 and VOT benchmarks, we show that the proposed trackers achieve state-of-the-art performance while running in real time.