CV · AI · Mar 15

Walking Further: Semantic-aware Multimodal Gait Recognition Under Long-Range Conditions

arXiv:2603.14189 · 31.5 · h-index: 3
AI Analysis

The paper addresses a key limitation of biometric identification in real-world applications, though the contribution is incremental because it builds on existing multimodal and semantic techniques.

The paper tackles the failure of gait recognition in long-range and cross-distance scenarios. It introduces LRGait, a LiDAR-camera multimodal benchmark, and EMGaitNet, a framework that uses semantic-guided fusion to align 2D and 3D features, and it reports robust performance across diverse outdoor conditions.

Gait recognition is an emerging biometric technology that enables non-intrusive and hard-to-spoof human identification. However, most existing methods are confined to short-range, unimodal settings and fail to generalize to long-range and cross-distance scenarios under real-world conditions. To address this gap, we present LRGait, the first LiDAR-Camera multimodal benchmark designed for robust long-range gait recognition across diverse outdoor distances and environments. We further propose EMGaitNet, an end-to-end framework tailored for long-range multimodal gait recognition. To bridge the modality gap between RGB images and point clouds, we introduce a semantic-guided fusion pipeline. A CLIP-based Semantic Mining (SeMi) module first extracts human body-part-aware semantic cues, which are then employed to align 2D and 3D features via a Semantic-Guided Alignment (SGA) module within a unified embedding space. A Symmetric Cross-Attention Fusion (SCAF) module hierarchically integrates visual contours and 3D geometric features, and a Spatio-Temporal (ST) module captures global gait dynamics. Extensive experiments on various gait datasets validate the effectiveness of our method.
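
The abstract does not spell out how the SCAF module is built, so the following is only a rough illustration of the general idea of symmetric cross-attention between two token sets, not the authors' implementation. It assumes plain scaled dot-product attention with no learned projections, a residual addition, and a single level (the hierarchical part is omitted). The names `cross_attend`, `symmetric_cross_attention`, `img_tokens`, and `pts_tokens` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Each query token attends over the other modality's tokens.

    Assumption: scaled dot-product attention where keys and values are the
    same vectors and no learned projection matrices are applied.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended = [
            sum(w * v[j] for w, v in zip(weights, keys_values))
            for j in range(d)
        ]
        out.append(attended)
    return out


def symmetric_cross_attention(img_tokens, pts_tokens):
    """Run cross-attention in both directions and add a residual connection.

    Image tokens query point-cloud tokens, and point-cloud tokens query
    image tokens, so the two modalities are treated symmetrically.
    """
    img_ctx = cross_attend(img_tokens, pts_tokens)
    pts_ctx = cross_attend(pts_tokens, img_tokens)
    img_enh = [[a + b for a, b in zip(x, c)] for x, c in zip(img_tokens, img_ctx)]
    pts_enh = [[a + b for a, b in zip(x, c)] for x, c in zip(pts_tokens, pts_ctx)]
    return img_enh, pts_enh


if __name__ == "__main__":
    # Toy 2-dimensional features: 3 image tokens, 2 point-cloud tokens.
    img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pts = [[0.5, 0.5], [2.0, -1.0]]
    img_enh, pts_enh = symmetric_cross_attention(img, pts)
    print(len(img_enh), len(pts_enh))
```

Each modality keeps its own token count after fusion, which is why the two token sets may differ in length as long as they share a feature dimension, such as the unified embedding space the abstract attributes to the SGA module.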

