Multi-scale Context-aware Network with Transformer for Gait Recognition
This work addresses gait recognition for biometric identification, offering incremental improvements through two new modules: one for adaptive multi-scale temporal aggregation and one for salient spatial feature selection.
The paper tackles gait recognition by proposing a multi-scale context-aware network with transformer (MCAT) that adaptively aggregates temporal features across scales and selects discriminative spatial features, achieving state-of-the-art performance with rank-1 accuracies up to 98.7% on CASIA-B, 97.5% on OU-MVLP, and 50.6% on GREW.
Gait recognition has drawn increasing research attention in recent years. Because silhouette differences are quite subtle in the spatial domain, temporal feature representation is crucial for gait recognition. Inspired by the observation that humans can distinguish the gaits of different subjects by adaptively focusing on clips of varying time scales, we propose a multi-scale context-aware network with transformer (MCAT) for gait recognition. MCAT generates temporal features across three scales and adaptively aggregates them using contextual information from both local and global perspectives. Specifically, MCAT contains an adaptive temporal aggregation (ATA) module that performs local relation modeling followed by global relation modeling to fuse the multi-scale features. In addition, to remedy the spatial feature corruption resulting from temporal operations, MCAT incorporates a salient spatial feature learning (SSFL) module that selects groups of discriminative spatial features. Extensive experiments on three datasets demonstrate state-of-the-art performance. Concretely, we achieve rank-1 accuracies of 98.7%, 96.2% and 88.7% under normal walking, bag-carrying and coat-wearing conditions on CASIA-B, 97.5% on OU-MVLP and 50.6% on GREW. The source code will be available at https://github.com/zhuduowang/MCAT.git.
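
To make the idea of multi-scale temporal aggregation concrete, the following is a minimal sketch and not the ATA module itself. It assumes frame-level features are plain lists of floats, uses hypothetical window sizes of 1, 3 and 5 frames for the three scales, and replaces the paper's local and global relation modeling with a simple softmax weighting against a sequence-level mean context. The function names (`temporal_pool`, `aggregate_multi_scale`) are illustrative only.

```python
import math


def temporal_pool(frames, window):
    """Average each frame with its neighbours in a centred window (truncated at the edges)."""
    half = window // 2
    num_frames = len(frames)
    pooled = []
    for t in range(num_frames):
        lo, hi = max(0, t - half), min(num_frames, t + half + 1)
        span = frames[lo:hi]
        pooled.append([sum(col) / len(span) for col in zip(*span)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_multi_scale(frames, windows=(1, 3, 5)):
    """Fuse features pooled at several temporal scales with per-frame softmax weights.

    Assumption: the weight of each scale is its dot product with a global
    context vector (the mean feature over time). This stands in for the
    learned relation modeling described in the abstract.
    """
    scales = [temporal_pool(frames, w) for w in windows]
    num_frames, channels = len(frames), len(frames[0])
    context = [sum(f[c] for f in frames) / num_frames for c in range(channels)]

    fused, weights = [], []
    for t in range(num_frames):
        scores = [sum(a * b for a, b in zip(scale[t], context)) for scale in scales]
        w = softmax(scores)
        fused.append(
            [sum(w[k] * scales[k][t][c] for k in range(len(scales))) for c in range(channels)]
        )
        weights.append(w)
    return fused, weights


if __name__ == "__main__":
    # Toy sequence: 4 frames, 2 feature channels.
    sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    fused, weights = aggregate_multi_scale(sequence)
    for t, (feature, w) in enumerate(zip(fused, weights)):
        print(t, [round(x, 3) for x in feature], [round(x, 3) for x in w])
```

In this sketch the per-frame weights sum to one, so the fused feature stays in the range spanned by the three scales. A learned model would replace the fixed dot-product scoring with trainable layers.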