LG · Dec 9, 2025

Refining Diffusion Models for Motion Synthesis with an Acceleration Loss to Generate Realistic IMU Data

Lars Ole Häusler, Lena Uhlenberg, Göran Köber, Diyora Salimova, Oliver Amft

arXiv:2512.08859v1 · h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the need for realistic IMU data synthesis for sensor-specific tasks like human activity recognition, but it is incremental as it refines an existing diffusion model with a specialized loss.

The paper generates realistic IMU data from synthesized motion by fine-tuning a pretrained diffusion model with an acceleration-based loss (L_acc). After fine-tuning, L_acc decreased by 12.7% relative to the original model, and human activity recognition classifiers trained exclusively on the refined synthetic data improved by 8.7% compared to the earlier diffusion model.

We propose a text-to-IMU (inertial measurement unit) motion-synthesis framework to obtain realistic IMU data by fine-tuning a pretrained diffusion model with an acceleration-based second-order loss (L_acc). L_acc enforces consistency in the discrete second-order temporal differences of the generated motion, thereby aligning the diffusion prior with IMU-specific acceleration patterns. We integrate L_acc into the training objective of an existing diffusion model, fine-tune the model to obtain an IMU-specific motion prior, and evaluate the model with an existing text-to-IMU framework that comprises surface modelling and virtual sensor simulation. We analysed acceleration signal fidelity and differences between synthetic motion representation and actual IMU recordings. As a downstream application, we evaluated Human Activity Recognition (HAR) and compared the classification performance obtained with data from our method against the earlier diffusion model and two additional diffusion model baselines. When we augmented the earlier diffusion model objective with L_acc and continued training, L_acc decreased by 12.7% relative to the original model. The improvements were considerably larger in high-dynamic activities (i.e., running, jumping) compared to low-dynamic activities (i.e., sitting, standing). In a low-dimensional embedding, the synthetic IMU data produced by our refined model shifts closer to the distribution of real IMU recordings. HAR classification trained exclusively on our refined synthetic IMU data improved performance by 8.7% compared to the earlier diffusion model and by 7.6% over the best-performing comparison diffusion model. We conclude that acceleration-aware diffusion refinement provides an effective approach to align motion generation and IMU synthesis and highlights how flexible deep learning pipelines are for specialising generic text-to-motion priors to sensor-specific tasks.
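
To make the idea of a second-order loss concrete, the following is a minimal NumPy sketch and not the authors' implementation. The abstract only states that L_acc enforces consistency in discrete second-order temporal differences. The central-difference stencil, the mean-squared-error form, the time step `dt`, the weighting factor `weight`, and all function names are assumptions made for illustration.

```python
import numpy as np


def second_difference(motion: np.ndarray, dt: float = 1.0) -> np.ndarray:
    """Discrete second-order temporal difference along the time axis (axis 0).

    `motion` has shape (T, ...) with T >= 3; the result has shape (T - 2, ...).
    Assumed stencil: (x[t+1] - 2 * x[t] + x[t-1]) / dt**2.
    """
    return (motion[2:] - 2.0 * motion[1:-1] + motion[:-2]) / dt**2


def acceleration_loss(pred: np.ndarray, target: np.ndarray, dt: float = 1.0) -> float:
    """Assumed form of L_acc: MSE between the second differences of two motions."""
    diff = second_difference(pred, dt) - second_difference(target, dt)
    return float(np.mean(diff**2))


def total_loss(base_loss: float, pred: np.ndarray, target: np.ndarray,
               weight: float = 1.0, dt: float = 1.0) -> float:
    """Hypothetical combined objective: base diffusion loss plus weighted L_acc."""
    return base_loss + weight * acceleration_loss(pred, target, dt)


# Toy example: a quadratic trajectory has a constant second difference of 2,
# while a motionless prediction has a second difference of 0.
t = np.arange(5.0).reshape(-1, 1)
target = t**2
pred = np.zeros_like(target)
print(acceleration_loss(pred, target))  # 4.0
```

In this toy case the loss is the squared gap between the two constant accelerations (2 and 0). A real training setup would compute the same quantity on batched joint trajectories inside an automatic-differentiation framework so that gradients reach the diffusion model.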
