Virtual avatar generation models as world navigators
This work addresses the need for virtual avatars in complex tasks such as robotics, sports, and healthcare. The contribution appears incremental, however, since it builds on existing diffusion models and adds a new twist rather than a new foundation.
The paper tackles the problem of simulating human movement in rock climbing environments. It introduces SABR-CLIMB, a diffusion transformer that predicts the sample rather than the noise at each diffusion step and ingests entire videos to output complete motion sequences. Using a large proprietary dataset, NAV-22M, and substantial computational resources, the authors present a proof of concept for training general-purpose virtual avatars.
We introduce SABR-CLIMB, a novel video model simulating human movement in rock climbing environments using a virtual avatar. Our diffusion transformer predicts the sample instead of noise in each diffusion step and ingests entire videos to output complete motion sequences. By leveraging a large proprietary dataset, NAV-22M, and substantial computational resources, we showcase a proof of concept for a system to train general-purpose virtual avatars for complex tasks in robotics, sports, and healthcare.
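The distinction between predicting the sample and predicting the noise can be illustrated with a minimal sketch. It assumes the standard DDPM forward process, x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, which the abstract does not confirm. The function names, the value of alpha_bar, and the toy 8-value "motion" vector are all hypothetical and are not taken from SABR-CLIMB.

```python
import math
import random


def forward_noise(x0, eps, alpha_bar):
    """Assumed DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def eps_from_x0(x_t, x0_pred, alpha_bar):
    """Recover the noise implied by a sample (x0) prediction."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - a * x) / b for xt, x in zip(x_t, x0_pred)]


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # toy stand-in for clean motion values
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]    # Gaussian noise added at this step
alpha_bar = 0.5                                  # hypothetical noise level

x_t = forward_noise(x0, eps, alpha_bar)

# Sample prediction: the regression target is the clean data x0.
# Noise prediction: the regression target would be eps instead.
x0_pred = list(x0)                               # a hypothetical perfect model output
sample_loss = mse(x0_pred, x0)

# The two parameterizations carry the same information: given x_t,
# an x0 prediction determines the implied noise and vice versa.
implied_eps = eps_from_x0(x_t, x0_pred, alpha_bar)
```

Under this assumption the two targets are interchangeable algebraically, so the choice mainly changes how the loss is weighted across noise levels. Whether SABR-CLIMB uses this exact schedule or loss is not stated in the abstract.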