Appearance-Preserving 3D Convolution for Video-based Person Re-identification
This work addresses a specific weakness of 3D convolution in video-based person re-identification, a task used in surveillance and security applications, and offers an incremental improvement over existing 3D ConvNets.
The paper tackles temporal appearance misalignment in video-based person re-identification by proposing Appearance-Preserving 3D Convolution (AP3D), which models temporal information while preserving the appearance representation, and reports state-of-the-art results on three widely used datasets.
Because of imperfect person detection results and posture changes, temporal appearance misalignment is unavoidable in video-based person re-identification (ReID). Under such misalignment, 3D convolution may corrupt the appearance representation of person video clips, which is harmful to ReID. To address this problem, we propose Appearance-Preserving 3D Convolution (AP3D), which is composed of two components: an Appearance-Preserving Module (APM) and a 3D convolution kernel. Because APM aligns adjacent feature maps at the pixel level, the subsequent 3D convolution can model temporal information while maintaining the quality of the appearance representation. AP3D can be combined with existing 3D ConvNets simply by replacing their original 3D convolution kernels with AP3Ds. Extensive experiments demonstrate the effectiveness of AP3D for video-based ReID, and the results on three widely used datasets surpass the state of the art. Code is available at: https://github.com/guxinqian/AP3D.
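The abstract describes AP3D as pixel-level alignment of adjacent feature maps (APM) followed by a 3D convolution. The NumPy sketch that follows illustrates that idea under stated assumptions and is not the authors' implementation (the linked repository holds that). It assumes alignment works by rebuilding each pixel of the central frame as a softmax over cosine similarities to all pixels of an adjacent frame. The helper names `cosine_similarity`, `align_to_center` and `ap3d_temporal` and the `scale` temperature are hypothetical, and a fixed 3-tap temporal kernel stands in for a learned 3D convolution kernel.

```python
import numpy as np


def cosine_similarity(center, neighbor, eps=1e-8):
    """Cosine similarity between every pixel of `center` and every pixel of
    `neighbor`. Inputs are feature maps of shape (C, H, W); the result has
    shape (H*W, H*W), indexed [center_pixel, neighbor_pixel]."""
    c = center.reshape(center.shape[0], -1)       # (C, N), N = H*W
    n = neighbor.reshape(neighbor.shape[0], -1)   # (C, N)
    c = c / (np.linalg.norm(c, axis=0, keepdims=True) + eps)
    n = n / (np.linalg.norm(n, axis=0, keepdims=True) + eps)
    return c.T @ n


def align_to_center(center, neighbor, scale=10.0):
    """Assumed pixel-level alignment: each center pixel is rebuilt as a
    softmax-weighted sum of neighbor pixels, weighted by cosine similarity.
    `scale` is a hypothetical temperature that sharpens the softmax."""
    C, H, W = center.shape
    logits = scale * cosine_similarity(center, neighbor)   # (N, N)
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)          # each row sums to 1
    aligned = neighbor.reshape(C, -1) @ weights.T          # (C, N)
    return aligned.reshape(C, H, W)


def ap3d_temporal(clip, kernel=(0.25, 0.5, 0.25), scale=10.0):
    """Align each frame's neighbors to it, then apply a 3-tap temporal kernel.
    `clip` has shape (T, C, H, W). The fixed kernel is a simplified stand-in
    for a learned 3D convolution kernel; boundary frames are replicated
    (an assumption, not taken from the paper)."""
    T = clip.shape[0]
    k = np.asarray(kernel, dtype=float)
    out = np.empty(clip.shape, dtype=float)
    for t in range(T):
        center = clip[t]
        prev = align_to_center(center, clip[max(t - 1, 0)], scale)
        nxt = align_to_center(center, clip[min(t + 1, T - 1)], scale)
        # Weighted sum over time of the aligned (prev, center, next) triple.
        out[t] = np.einsum("t,tchw->chw", k, np.stack([prev, center, nxt]))
    return out


# Toy clip: 4 channels on a 2x2 map, each pixel a distinct one-hot vector.
center = np.eye(4).reshape(4, 2, 2)
shifted = np.roll(center, shift=1, axis=2)   # simulated horizontal misalignment
clip = np.stack([shifted, center, shifted])

aligned_out = ap3d_temporal(clip)
naive_out = 0.25 * shifted + 0.5 * center + 0.25 * shifted   # no alignment
```

In this toy clip the aligned output at the middle frame stays close to the central feature map, while plain temporal averaging without alignment mixes in mismatched pixels, which is the degradation the abstract attributes to 3D convolution under misalignment. A real AP3D layer would learn its kernel and sit inside a 3D ConvNet in place of an ordinary 3D convolution.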