Metric Learning in Codebook Generation of Bag-of-Words for Person Re-identification
This work addresses the challenge of enhancing discriminative visual descriptors for pedestrian matching in surveillance applications, representing an incremental improvement over conventional methods.
The paper tackles the problem of improving person re-identification by integrating supervised metric learning into the codebook generation phase of the Bag-of-Words model, resulting in state-of-the-art performance on benchmarks such as VIPeR, PRID450S, and Market1501.
Person re-identification is generally divided into two parts: first, how to represent a pedestrian with discriminative visual descriptors, and second, how to compare these descriptors with a suitable distance metric. Conventional methods treat the two parts in isolation, with the first usually unsupervised and the second supervised. The Bag-of-Words (BoW) model is a widely used image descriptor for the first part. Its codebook is typically generated by simply clustering visual features in Euclidean space. In this paper, we propose to apply the metric learning techniques of the second part in the codebook generation phase of BoW. In particular, the proposed codebook is obtained by clustering under a Mahalanobis distance that is learned in a supervised manner. Extensive experiments demonstrate that the proposed method is effective. With several low-level features extracted on superpixels and fused together, our method outperforms the state of the art on person re-identification benchmarks including VIPeR, PRID450S, and Market1501.
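To make the idea of clustering under a Mahalanobis distance concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes that a symmetric positive semidefinite matrix M has already been learned by some supervised metric learning method (the abstract does not say which one), and that the codebook is built with k-means-style Lloyd iterations in which points are assigned to codewords by the squared distance (x - c)^T M (x - c). The function names `mahalanobis_sq`, `mahalanobis_kmeans`, and `bow_histogram`, as well as the parameters `k`, `n_iter`, and `seed`, are hypothetical.

```python
import numpy as np


def mahalanobis_sq(X, C, M):
    """Squared Mahalanobis distances between rows of X (n, d) and C (k, d).

    M is assumed to be a symmetric PSD (d, d) matrix learned elsewhere.
    """
    diff = X[:, None, :] - C[None, :, :]          # shape (n, k, d)
    return np.einsum("nkd,de,nke->nk", diff, M, diff)


def mahalanobis_kmeans(X, M, k, n_iter=20, seed=0):
    """Lloyd-style clustering with assignments made under the metric M.

    For a PSD matrix M, the cluster mean minimizes the within-cluster sum
    of squared Mahalanobis distances, so the update step is the usual mean.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Initialize codewords with k distinct samples (an assumption of this sketch).
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = mahalanobis_sq(X, centers, M).argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:                  # keep old codeword if cluster is empty
                centers[j] = members.mean(axis=0)
    labels = mahalanobis_sq(X, centers, M).argmin(axis=1)
    return centers, labels


def bow_histogram(features, codebook, M):
    """Quantize local features against the codebook under M and
    return an L1-normalized Bag-of-Words histogram."""
    features = np.asarray(features, dtype=float)
    labels = mahalanobis_sq(features, codebook, M).argmin(axis=1)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

With M set to the identity matrix, this sketch reduces to ordinary Euclidean k-means, which corresponds to the conventional codebook described above. A learned M instead reweights and correlates feature dimensions, so that codewords are placed with respect to the same metric later used for matching.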