Enhancing Domain Generalization in 3D Human Pose Estimation through Controllable Generative Augmentation
For researchers in 3D human pose estimation, this work addresses domain generalization by providing a data augmentation method that improves model robustness to domain shifts.
This work presents a controllable human pose generation framework that synthesizes diverse video data by varying poses, backgrounds, and camera viewpoints to augment training datasets for 3D human pose estimation. Models trained on the augmented data show significant performance improvements on unseen scenarios and datasets.
Because of its causal nature, pedestrian motion modeling is strongly affected by domain gaps, that is, discrepancies between the training and testing data distributions. Focusing on 3D human pose estimation, this work presents a controllable human pose generation framework that synthesizes diverse video data by systematically varying poses, backgrounds, and camera viewpoints. This generative augmentation enriches training datasets, improves model generalization, and alleviates the limitations of existing methods in handling domain discrepancies. Using both indoor real-world and outdoor virtual datasets, we perform cross-domain data fusion and controllable video generation to construct enriched training data tailored to realistic deployment settings. Extensive experiments show that the augmented datasets significantly improve model performance on unseen scenarios and datasets, validating the effectiveness of the proposed approach.
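To make the idea of systematically varying poses, backgrounds, and camera viewpoints concrete, the following is a minimal illustrative sketch, not the generation framework described in this work. It only enumerates combinations of the three controllable factors and emulates a viewpoint change by rotating 3D joints about the vertical axis. All names (`AugmentationSpec`, `enumerate_specs`, `rotate_about_y`) and the example identifiers are hypothetical, and the y-up axis convention is an assumption.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentationSpec:
    """One combination of controllable factors (hypothetical structure)."""
    pose_id: str        # which source pose sequence to use
    background_id: str  # which background to composite or condition on
    yaw_deg: float      # camera orbit angle around the subject, in degrees


def enumerate_specs(pose_ids, background_ids, yaw_angles_deg):
    """Return every (pose, background, viewpoint) combination."""
    return [
        AugmentationSpec(p, b, a)
        for p, b, a in itertools.product(pose_ids, background_ids, yaw_angles_deg)
    ]


def rotate_about_y(joints, yaw_deg):
    """Rotate (x, y, z) joints about the vertical y-axis.

    Assumes a y-up coordinate system; rotating the skeleton is equivalent
    to orbiting the camera around the subject in the opposite direction.
    """
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]


# Usage: two poses x two backgrounds x three viewpoints.
specs = enumerate_specs(["walk", "sit"], ["indoor_lab", "street"], [0.0, 90.0, 180.0])
skeleton = [(0.0, 1.0, 0.0), (0.5, 1.0, 0.0)]  # two toy joints
rotated = rotate_about_y(skeleton, specs[1].yaw_deg)
```

In a generative pipeline, each specification would condition a video generator rather than a geometric transform; the rotation here is only a stand-in that shows how a viewpoint change leaves the pose itself (for example, bone lengths) unchanged.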