Learning Quantised Structure-Preserving Motion Representations for Dance Fingerprinting
This work addresses the challenge of scalable motion-based dance retrieval for applications such as choreographic analysis, though it appears incremental because it builds on existing motion analysis methods.
The paper tackles dance fingerprinting, the problem of identifying semantically similar choreographies from raw video. It develops DANCEMATCH, an end-to-end framework that constructs compact, discrete motion signatures for efficient large-scale retrieval, and it reports robust retrieval across diverse dance styles and strong generalisation to unseen choreographies.
We present DANCEMATCH, an end-to-end framework for motion-based dance retrieval, the task of identifying semantically similar choreographies directly from raw video, which we define as DANCE FINGERPRINTING. While existing motion analysis and retrieval methods can compare pose sequences, they rely on continuous embeddings that are difficult to index, interpret, or scale. In contrast, DANCEMATCH constructs compact, discrete motion signatures that capture the spatio-temporal structure of dance while enabling efficient large-scale retrieval. Our system integrates Skeleton Motion Quantisation (SMQ) with Spatio-Temporal Transformers (STT) to encode human poses, extracted via Apple CoMotion, into a structured motion vocabulary. We further design the DANCE RETRIEVAL ENGINE (DRE), which performs sub-linear retrieval using a histogram-based index followed by re-ranking for refined matching. To facilitate reproducible research, we release DANCETYPESBENCHMARK, a pose-aligned dataset annotated with quantised motion tokens. Experiments demonstrate robust retrieval across diverse dance styles and strong generalisation to unseen choreographies, establishing a foundation for scalable motion fingerprinting and quantitative choreographic analysis.
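
The two-stage retrieval idea described in the abstract (a histogram-based index over discrete motion tokens, followed by re-ranking) can be illustrated with a minimal sketch. This is not the DRE implementation; the abstract does not specify its data structures or scoring functions. The class name `ToyMotionIndex`, the inverted token index, the histogram-intersection score, and the use of `difflib.SequenceMatcher` for re-ranking are all assumptions chosen for illustration, and the integer token sequences stand in for the output of a quantiser such as SMQ.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def histogram_intersection(h1, h2):
    """Normalised histogram intersection in [0, 1] (assumed coarse score)."""
    total = max(sum(h1.values()), sum(h2.values()))
    if total == 0:
        return 0.0
    # Counter returns 0 for missing tokens, so h2[t] is safe here.
    return sum(min(count, h2[t]) for t, count in h1.items()) / total


class ToyMotionIndex:
    """Hypothetical two-stage index over quantised motion token sequences."""

    def __init__(self):
        self.sequences = {}               # clip id -> ordered token list
        self.histograms = {}              # clip id -> token histogram
        self.postings = defaultdict(set)  # token -> ids of clips containing it

    def add(self, clip_id, tokens):
        tokens = list(tokens)
        self.sequences[clip_id] = tokens
        self.histograms[clip_id] = Counter(tokens)
        for token in set(tokens):
            self.postings[token].add(clip_id)

    def query(self, tokens, shortlist=10, top_k=3):
        tokens = list(tokens)
        q_hist = Counter(tokens)

        # Stage 1: only clips sharing at least one token with the query are
        # scored, so unrelated clips are never touched.
        candidates = set()
        for token in q_hist:
            candidates |= self.postings.get(token, set())
        coarse = sorted(
            candidates,
            key=lambda c: histogram_intersection(q_hist, self.histograms[c]),
            reverse=True,
        )[:shortlist]

        # Stage 2: re-rank the shortlist with an order-sensitive similarity,
        # since histograms ignore the temporal order of tokens.
        def sequence_score(clip_id):
            matcher = SequenceMatcher(None, tokens, self.sequences[clip_id],
                                      autojunk=False)
            return matcher.ratio()

        return sorted(coarse, key=sequence_score, reverse=True)[:top_k]


if __name__ == "__main__":
    index = ToyMotionIndex()
    index.add("clip_forward", [1, 2, 3, 4, 5])
    index.add("clip_reversed", [5, 4, 3, 2, 1])
    index.add("clip_other", [9, 9, 8, 8])
    print(index.query([1, 2, 3, 4, 6], shortlist=2, top_k=1))
```

In this toy example the forward and reversed clips have identical token histograms, so the coarse stage cannot separate them; the order-sensitive re-ranking step is what prefers the forward clip. That is one plausible reason for pairing a cheap histogram lookup with a more expensive refinement stage.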