CVMay 16

iMiGUE-3K: A Large-Scale Benchmark for Micro-Gesture Analysis with Self-Supervised Learning

Chengyan Wang, Haoyu Chen, Hui Wei, Yueyi Yang, Yunquan Chen, Guoying Zhao

arXiv:2605.1717922.2

Predicted impact top 33% in CV · last 90 daysOriginality Incremental advance

AI Analysis

This work provides a large-scale benchmark and foundation models for micro-gesture analysis, enabling progress in affective computing and psychological diagnostics.

The authors introduce iMiGUE-3K, the largest micro-gesture dataset with 3.4K video clips and 37M frames from 332 tennis players, and propose MG-FMs foundation models. Their evaluation shows micro-gesture analysis significantly improves emotion understanding.

Emotion understanding is a fundamental challenge in affective computing and artificial intelligence. While existing approaches predominantly focus on facial expressions and speech, they often overlook the rich emotional cues conveyed through body language. Recently, micro-gestures (MGs), unintentional, subconscious movements driven by inner feelings, have attracted increasing attention as an alternative to other cues. However, there are no existing large-scale datasets supporting the pre-training of the MG foundation model. To advance MG research, we present a new benchmark for micro-gesture-based emotion understanding, featuring key contributions with a novel dataset (iMiGUE-3K) and a series of foundation models for different tasks. Using a model-based crowd-sourcing data collection strategy, we construct iMiGUE-3K, the largest MG dataset to date. It comprises video recordings from 332 distinct professional tennis players' public press interviews over the past seven years, totaling more than 3.4K long video clips and 37 million frames. The dataset includes 32 micro-gesture classes with rich descriptive annotations, making it the first large-scale, in-the-wild, video dataset for fine-grained gesture-based emotion analysis. Built on iMiGUE-3K, we propose MG-FMs, a discriminative foundation model for transferable gesture presentation learning. Based on the foundation model, we establish five comprehensive evaluation tasks: MG recognition (unsupervised, semi-supervised, supervised), MG retrieval, and MG emotion recognition. Our systematic evaluation of representative methods demonstrates that micro-gesture-based analysis significantly improves emotion understanding. We hope this work can provide comprehensive tools for MG analysis and set a solid foundation for future research in psychological diagnostics, affective computing, and advanced human-computer interaction.

View on arXiv PDF

Similar