Exploring Deep Models for Practical Gait Recognition
This work addresses practical person identification from a distance for security and surveillance applications. It is an incremental advance that applies deep models to an existing domain.
The paper tackles gait recognition in real-world outdoor settings by challenging the stereotype of shallow gait models and showing that deep architectures with explicit temporal modeling perform better. The resulting CNN-based DeepGaitV2 series and Transformer-based SwinGait series improve performance significantly on Gait3D and GREW, and the DeepGaitV2 series also reaches a new state of the art in most cases on constrained gait datasets.
Gait recognition is a rapidly advancing vision technique for person identification from a distance. Prior studies predominantly employed relatively shallow networks to extract subtle gait features, achieving impressive successes in constrained settings. Nevertheless, experiments revealed that existing methods mostly produce unsatisfactory results when applied to newly released real-world gait datasets. This paper presents a unified perspective to explore how to construct deep models for state-of-the-art outdoor gait recognition, including the classical CNN-based and emerging Transformer-based architectures. Specifically, we challenge the stereotype of shallow gait models and demonstrate the superiority of explicit temporal modeling and deep transformer structure for discriminative gait representation learning. Consequently, the proposed CNN-based DeepGaitV2 series and Transformer-based SwinGait series exhibit significant performance improvements on Gait3D and GREW. As for the constrained gait datasets, the DeepGaitV2 series also reaches a new state-of-the-art in most cases, convincingly showing its practicality and generality. The source code is available at https://github.com/ShiqiYu/OpenGait.
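To make the phrase "explicit temporal modeling" concrete, the toy sketch below contrasts two ways of aggregating per-frame features: an order-invariant max over frames, and a temporal convolution applied before that max. It is only an illustration under stated assumptions, not the DeepGaitV2 or SwinGait implementation. The function names, the two-channel features, and the difference kernel are all hypothetical. The point it shows is that order-invariant pooling gives the same output when frames are shuffled, while a temporal convolution makes the output depend on frame order.

```python
from typing import List

# A sequence is a list of frames; each frame is a list of feature channels.
Sequence = List[List[float]]


def set_pool(frames: Sequence) -> List[float]:
    """Order-invariant aggregation: channel-wise max over all frames."""
    return [max(channel) for channel in zip(*frames)]


def temporal_conv(frames: Sequence, kernel: List[float]) -> Sequence:
    """Valid 1D cross-correlation along the time axis, applied per channel."""
    k = len(kernel)
    num_channels = len(frames[0])
    out = []
    for t in range(len(frames) - k + 1):
        window = frames[t:t + k]
        out.append([
            sum(w * frame[c] for w, frame in zip(kernel, window))
            for c in range(num_channels)
        ])
    return out


def explicit_temporal_feature(frames: Sequence, kernel: List[float]) -> List[float]:
    """Temporal convolution followed by max pooling over time."""
    return set_pool(temporal_conv(frames, kernel))


# Hypothetical per-frame features (4 frames, 2 channels) and a shuffled copy.
frames = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]]
shuffled = [frames[2], frames[0], frames[3], frames[1]]

# Hypothetical kernel: difference between the frame after and the frame before.
kernel = [-1.0, 0.0, 1.0]

print(set_pool(frames) == set_pool(shuffled))
print(explicit_temporal_feature(frames, kernel))
print(explicit_temporal_feature(shuffled, kernel))
```

In a real network the kernel weights would be learned and the convolution would typically span spatial dimensions as well. The sketch isolates only the order-sensitivity property.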