GRAI Jun 3, 2025

Gen4D: Synthesizing Humans and Scenes in the Wild

arXiv:2506.05397v1, 1 citation, h-index: 11
Originality: Highly original
AI Analysis

Gen4D addresses data scarcity for computer vision tasks in sports and other human-centric domains, offering a scalable synthetic alternative to real-world data collection.

The paper tackles the lack of diverse synthetic data for in-the-wild human activities by introducing Gen4D, an automated pipeline that generates photorealistic 4D human animations, resulting in the SportPAL dataset with over 10,000 sequences across three sports.

Lack of input data for in-the-wild activities often results in low performance across various computer vision tasks. This challenge is particularly pronounced in uncommon human-centric domains like sports, where real-world data collection is complex and impractical. While synthetic datasets offer a promising alternative, existing approaches typically suffer from limited diversity in human appearance, motion, and scene composition due to their reliance on rigid asset libraries and hand-crafted rendering pipelines. To address this, we introduce Gen4D, a fully automated pipeline for generating diverse and photorealistic 4D human animations. Gen4D integrates expert-driven motion encoding, prompt-guided avatar generation using diffusion-based Gaussian splatting, and human-aware background synthesis to produce highly varied and lifelike human sequences. Based on Gen4D, we present SportPAL, a large-scale synthetic dataset spanning three sports: baseball, ice hockey, and soccer. Together, Gen4D and SportPAL provide a scalable foundation for constructing synthetic datasets tailored to in-the-wild human-centric vision tasks, with no need for manual 3D modeling or scene design.

