Reference-Aided Part-Aligned Feature Disentangling for Video Person Re-Identification
The paper addresses alignment issues in video-based person re-identification, which is important for surveillance applications, but it appears incremental because it builds on existing feature-disentangling methods.
The paper tackles pedestrian misalignment in video person re-identification by proposing a Reference-Aided Part-Aligned (RAPA) framework that disentangles robust part features, and it reports improved performance on the iLIDS-VID, PRID-2011, and MARS benchmarks.
Recently, video-based person re-identification (re-ID) has drawn increasing attention in the computer vision community because of its practical application prospects. Due to inaccurate person detections and pose changes, pedestrian misalignment significantly increases the difficulty of feature extraction and matching. To address this problem, we propose a \textbf{R}eference-\textbf{A}ided \textbf{P}art-\textbf{A}ligned (\textbf{RAPA}) framework to disentangle robust features of different parts. First, to obtain better references between different videos, a pose-based reference feature learning module is introduced. Second, an effective relation-based part feature disentangling module is explored to align frames within each video. By using both modules, the informative parts of pedestrians in videos are well aligned and a more discriminative feature representation is generated. Comprehensive experiments on three widely used benchmarks, i.e., iLIDS-VID, PRID-2011, and MARS, verify the effectiveness of the proposed framework. Our code will be made publicly available.