CV | GR | RO | Aug 14, 2023

A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis

arXiv:2308.07301v2 | 11 citations | h-index: 36
Originality: Incremental advance
AI Analysis

This work targets motion synthesis for applications such as animation and robotics with a single task-independent model. The contribution appears incremental: it builds on existing Vision Transformer ideas and reformulates existing pose-conditioned tasks as reconstruction problems.

The paper tackles human motion synthesis across tasks such as forecasting and inbetweening with a unified model called UNIMASK-M. The authors report performance comparable to or better than the state of the art in each task: the model forecasts motion on Human3.6M and achieves state-of-the-art inbetweening results on LaFAN1, particularly for long transition periods.

The synthesis of human motion has traditionally been addressed through task-dependent models that focus on specific challenges, such as predicting future motions or filling in intermediate poses conditioned on known key-poses. In this paper, we present a novel task-independent model called UNIMASK-M, which can effectively address these challenges using a unified architecture. Our model obtains comparable or better performance than the state-of-the-art in each field. Inspired by Vision Transformers (ViTs), our UNIMASK-M model decomposes a human pose into body parts to leverage the spatio-temporal relationships existing in human motion. Moreover, we reformulate various pose-conditioned motion synthesis tasks as a reconstruction problem with different masking patterns given as input. By explicitly informing our model about the masked joints, our UNIMASK-M becomes more robust to occlusions. Experimental results show that our model successfully forecasts human motion on the Human3.6M dataset. Moreover, it achieves state-of-the-art results in motion inbetweening on the LaFAN1 dataset, particularly in long transition periods. More information can be found on the project website https://evm7.github.io/UNIMASKM-page/
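The idea of casting different tasks as one reconstruction problem with different masking patterns can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `make_mask` and `apply_mask`, the `context` parameter, the zero fill value, and the exact frames kept for each task are all assumptions made for illustration.

```python
# Illustrative sketch only: names, shapes, and masking conventions are assumed,
# not taken from the UNIMASK-M code base.

def make_mask(task, num_frames, num_joints, context=0, occluded=()):
    """Build a num_frames x num_joints grid; True means the joint is masked
    and must be reconstructed by the model."""
    mask = [[False] * num_joints for _ in range(num_frames)]
    if task == "forecasting":
        # Keep the first `context` frames, mask every later frame.
        masked_frames = range(context, num_frames)
    elif task == "inbetweening":
        # Keep `context` past frames and the final key-pose, mask the transition.
        masked_frames = range(context, num_frames - 1)
    elif task == "occlusion":
        # Mask only the listed (frame, joint) pairs.
        masked_frames = ()
        for t, j in occluded:
            mask[t][j] = True
    else:
        raise ValueError(f"unknown task: {task}")
    for t in masked_frames:
        mask[t] = [True] * num_joints
    return mask


def apply_mask(motion, mask, fill=0.0):
    """Replace masked joints with `fill`. motion[t][j] is a list of coordinates.
    The mask itself would be passed to the model alongside the masked motion,
    so the model is explicitly told which joints are missing."""
    return [
        [[fill] * len(joint) if hidden else list(joint)
         for joint, hidden in zip(frame, mask_row)]
        for frame, mask_row in zip(motion, mask)
    ]


# Usage: the same model input format serves all three tasks.
motion = [[[1.0, 2.0, 3.0] for _ in range(4)] for _ in range(10)]
forecast_mask = make_mask("forecasting", 10, 4, context=3)
inbetween_mask = make_mask("inbetweening", 10, 4, context=3)
occlusion_mask = make_mask("occlusion", 10, 4, occluded=[(0, 1), (5, 2)])
masked_motion = apply_mask(motion, inbetween_mask)
```

Under these assumptions, switching tasks only changes which entries of the mask are set; the reconstruction target and the model interface stay the same.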

