LG, Feb 11

Motion Capture is Not the Target Domain: Scaling Synthetic Data for Learning Motion Representations

arXiv:2602.11064v1
h-index: 8
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of scalable pretraining for HAR when real data is scarce, but its contribution is incremental: it mainly clarifies the sim-to-real limits of motion data.

The paper tackles the unreliable transfer of models pretrained on synthetic data to real-world deployment, focusing on full-body human motion for wearable-based Human Activity Recognition (HAR). It shows that synthetic pretraining improves generalization when mixed with real data or scaled sufficiently, whereas large-scale motion-capture pretraining yields only marginal gains because of domain mismatch with wearable signals.

Synthetic data offers a compelling path to scalable pretraining when real-world data is scarce, but models pretrained on synthetic data often fail to transfer reliably to deployment settings. We study this problem in full-body human motion, where large-scale data collection is infeasible but essential for wearable-based Human Activity Recognition (HAR), and where synthetic motion can be generated from motion-capture-derived representations. We pretrain motion time-series models using such synthetic data and evaluate their transfer across diverse downstream HAR tasks. Our results show that synthetic pretraining improves generalisation when mixed with real data or scaled sufficiently. We also demonstrate that large-scale motion-capture pretraining yields only marginal gains due to domain mismatch with wearable signals, clarifying key sim-to-real challenges and the limits and opportunities of synthetic motion data for transferable HAR representations.

