Multi-shot Pedestrian Re-identification via Sequential Decision Making
This work addresses efficiency and interpretability in surveillance video analysis, though its contribution is incremental: it applies existing reinforcement learning methods to a specific domain.
The paper tackles the multi-shot pedestrian re-identification problem by proposing a reinforcement learning approach in which an agent decides when to output a result and when to request more images, achieving performance competitive with the state of the art on three benchmarks while using only 3% to 6% of the images.
The multi-shot pedestrian re-identification problem is at the core of surveillance video analysis. It matches two tracks of pedestrians from different cameras. In contrast to existing works that aggregate single-frame features with a time-series model such as a recurrent neural network, in this paper we propose an interpretable reinforcement learning based approach to this problem. In particular, we train an agent to verify a pair of images at each time step. The agent can choose to output the result (same or different) or request another pair of images to verify (unsure). In this way, our model implicitly learns the difficulty of image pairs and postpones the decision when it has not accumulated enough evidence. Moreover, by adjusting the reward for the unsure action, we can easily trade off between speed and accuracy. On three open benchmarks, our method is competitive with state-of-the-art methods while using only 3% to 6% of the images. These promising results demonstrate that our method is favorable in both efficiency and performance.
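
The decision loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the learned agent is replaced by a hypothetical threshold rule on the running mean of per-pair similarity scores, and the names `threshold_policy`, `verify_tracks`, and `reward`, the `margin` and `max_steps` parameters, and the reward values (+1, -1, and a small negative value for unsure) are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Tuple

SAME, DIFFERENT, UNSURE = "same", "different", "unsure"


def threshold_policy(mean_score: float, margin: float = 0.3) -> str:
    """Hypothetical stand-in for the trained agent.

    Commits to a decision only when the accumulated evidence is at least
    `margin` away from zero; otherwise asks for another image pair.
    """
    if mean_score >= margin:
        return SAME
    if mean_score <= -margin:
        return DIFFERENT
    return UNSURE


def verify_tracks(
    pair_scores: Iterable[float],
    policy: Callable[[float], str] = threshold_policy,
    max_steps: int = 8,
) -> Tuple[str, int]:
    """Consume one image-pair score per step until the policy decides.

    `pair_scores` holds assumed similarity scores in [-1, 1], one per
    sampled image pair. Returns the final decision and the number of
    pairs that were actually inspected.
    """
    total, steps, mean, action = 0.0, 0, 0.0, UNSURE
    for score in pair_scores:
        steps += 1
        total += score
        mean = total / steps
        action = policy(mean)
        if action != UNSURE or steps >= max_steps:
            break
    if steps == 0:
        raise ValueError("at least one image pair is required")
    if action == UNSURE:
        # Budget or data exhausted: force a decision from the evidence so far.
        action = SAME if mean >= 0 else DIFFERENT
    return action, steps


def reward(action: str, is_same: bool, unsure_reward: float = -0.2) -> float:
    """Assumed reward shape: +1 correct, -1 wrong, small penalty for unsure."""
    if action == UNSURE:
        return unsure_reward
    return 1.0 if (action == SAME) == is_same else -1.0


if __name__ == "__main__":
    print(verify_tracks([0.1, 0.2, 0.9]))  # ambiguous pairs first, decided on the third
    print(verify_tracks([-0.8]))           # an easy pair is decided immediately
```

In this sketch, making `unsure_reward` more negative would push a trained agent to decide earlier (faster, less accurate), while a value closer to zero would let it request more pairs, which mirrors the speed and accuracy trade-off mentioned in the abstract.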