V. Sharma

1.4CVJul 1, 2022

Literature on Hand GESTURE Recognition using Graph based methods

Neha Baranwal, Varun Sharma

Skeleton based recognition systems are gaining popularity and machine learning models focusing on points or joints in a skeleton have proved to be computationally effective and application in many areas like Robotics. It is easy to track points and thereby preserving spatial and temporal information, which plays an important role in abstracting the required information, classification becomes an easy task. In this paper, we aim to study these points but using a cloud mechanism, where we define a cloud as collection of points. However, when we add temporal information, it may not be possible to retrieve the coordinates of a point in each frame and hence instead of focusing on a single point, we can use k-neighbors to retrieve the state of the point under discussion. Our focus is to gather such information using weight sharing but making sure that when we try to retrieve the information from neighbors, we do not carry noise with it. LSTM which has capability of long-term modelling and can carry both temporal and spatial information. In this article we tried to summarise graph based gesture recognition method.

CVJun 26

STAG: Spatio-temporal Evolving Structural Representation of Action Units for Micro-expression Recognition

Nandani Sharma, Varun Sharma, Dinesh Singh

Micro-expression recognition is challenging due to subtle and short-lived facial muscle movements. Existing methods rely heavily on apex-onset frames, overlook fine-grained inter-frame dynamics, and separately model spatial and temporal information, limiting generalization across datasets. To address these challenges, we propose STAG, a dynamic ROI-AU-coupled spatial-temporal network that jointly models motion flow and adaptive facial connectivity. The framework extracts optical flow from discriminative frames using magnitude-based selection and temporal attention. A dual-branch architecture combines an enhanced graph attention network for structured spatial reasoning with a transformer encoder for temporal modeling. A bidirectional cross-attention module enables mutual refinement of spatial and temporal features, while AU-guided dynamic connectivity adapts facial region interactions according to muscle activation patterns. The transformer captures subtle temporal dynamics beyond apex-based approaches, improving semantic consistency and interpretability for explainable micro-expression recognition. The fused representation is optimized using focal loss and evaluated on CASME II, 4DME, DFME, NaME, SAMM, and SMIC-HS. Extensive experiments demonstrate improved robustness, generalization, interpretability, and computational efficiency, confirming the effectiveness of adaptive relational reasoning, AU-guided dynamic connectivity, and deep spatial-temporal feature fusion for accurate cross-dataset micro-expression recognition.

V. Sharma

2 Papers