LG · AI · CL · GR · Jul 15, 2024

SuperPADL: Scaling Language-Directed Physics-Based Control with Progressive Supervised Distillation

arXiv:2407.10481v1 · 22 citations · h-index: 19
Originality: Incremental advance
AI Analysis

This work addresses a scalability problem in physics-based character animation driven by natural language. Solving it lets users create more diverse and interactive animations, though the advance is incremental: it builds on existing RL and supervised learning methods.

The paper tackles the challenge of scaling physics-based text-to-motion control beyond several hundred motions. It introduces SuperPADL, a framework that combines reinforcement learning and supervised learning through progressive distillation. The final controller is trained on over 5000 skills, runs in real time on a consumer GPU, and outperforms RL-based baselines at that data scale.

Physically-simulated models for human motion can generate high-quality responsive character animations, often in real-time. Natural language serves as a flexible interface for controlling these models, allowing expert and non-expert users to quickly create and edit their animations. Many recent physics-based animation methods, including those that use text interfaces, train control policies using reinforcement learning (RL). However, scaling these methods beyond several hundred motions has remained challenging. Meanwhile, kinematic animation models are able to successfully learn from thousands of diverse motions by leveraging supervised learning methods. Inspired by these successes, in this work we introduce SuperPADL, a scalable framework for physics-based text-to-motion that leverages both RL and supervised learning to train controllers on thousands of diverse motion clips. SuperPADL is trained in stages using progressive distillation, starting with a large number of specialized experts using RL. These experts are then iteratively distilled into larger, more robust policies using a combination of reinforcement learning and supervised learning. Our final SuperPADL controller is trained on a dataset containing over 5000 skills and runs in real time on a consumer GPU. Moreover, our policy can naturally transition between skills, allowing for users to interactively craft multi-stage animations. We experimentally demonstrate that SuperPADL significantly outperforms RL-based baselines at this large data scale.
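The staged training described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the scalar "gain" policies, the two-stage schedule, the group size, and every name here (`make_expert`, `SkillConditionedPolicy`, `distill`, `progressive_distill`) are assumptions made for illustration. It also covers only the supervised part of distillation (regressing student actions onto teacher actions) and leaves out the RL component that the abstract says is combined with it.

```python
import random


def make_expert(gain):
    """Hypothetical stand-in for an RL-trained single-skill expert: action = gain * state."""
    return lambda state: gain * state


class SkillConditionedPolicy:
    """Toy student policy with one learnable gain per skill id it covers."""

    def __init__(self, skill_ids):
        self.gains = {k: 0.0 for k in skill_ids}

    def act(self, skill_id, state):
        return self.gains[skill_id] * state


def distill(teachers, student, steps=4000, lr=0.1, seed=0):
    """Supervised distillation: fit student actions to teacher actions with SGD.

    `teachers` maps skill id -> callable(state) -> action.
    """
    rng = random.Random(seed)
    skills = sorted(teachers)
    for _ in range(steps):
        k = rng.choice(skills)
        state = rng.uniform(-1.0, 1.0)
        error = student.act(k, state) - teachers[k](state)
        # Gradient of 0.5 * error**2 with respect to the student's gain for skill k.
        student.gains[k] -= lr * error * state
    return student


def progressive_distill(expert_gains, group_size=4):
    """Assumed two-stage schedule: experts -> group policies -> one policy."""
    experts = {k: make_expert(g) for k, g in enumerate(expert_gains)}
    skill_ids = sorted(experts)

    # Stage 1: distill each small group of experts into one group policy.
    group_policies = []
    for start in range(0, len(skill_ids), group_size):
        ids = skill_ids[start:start + group_size]
        student = SkillConditionedPolicy(ids)
        distill({k: experts[k] for k in ids}, student)
        group_policies.append(student)

    # Stage 2: the group policies become teachers for a single larger policy.
    teachers = {}
    for policy in group_policies:
        for k in policy.gains:
            teachers[k] = lambda state, p=policy, k=k: p.act(k, state)
    return distill(teachers, SkillConditionedPolicy(skill_ids))
```

Calling `progressive_distill([0.5, -1.0, 1.5, 2.0, -0.5, 1.0, -1.5, 0.25])` yields one policy covering all eight toy skills. In a real system the students would be neural networks conditioned on text and simulator state rather than a per-skill lookup table.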

