CV | Oct 21, 2021

LARNet: Latent Action Representation for Human Action Synthesis

arXiv:2110.10899v2 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses video synthesis of human actions, offering a generative alternative to methods that rely on a driving video. It appears incremental because it builds on prior approaches that decompose appearance and dynamics.

The authors tackle the problem of generating human action videos without a driving video to supply the dynamics. Their method, LARNet, learns action dynamics in latent space and integrates them with appearance through a recurrent hierarchical structure. The authors evaluate it on four real-world human action datasets and report that it is effective at generating human actions.

We present LARNet, a novel end-to-end approach for generating human action videos. Joint generative modeling of appearance and dynamics to synthesize a video is very challenging, and therefore recent works in video synthesis have proposed to decompose these two factors. However, these methods require a driving video to model the video dynamics. In this work, we propose a generative approach instead, which explicitly learns action dynamics in latent space, avoiding the need for a driving video during inference. The generated action dynamics are integrated with the appearance using a recurrent hierarchical structure, which induces motion at different scales to focus on both coarse- and fine-level action details. In addition, we propose a novel mix-adversarial loss function, which aims at improving the temporal coherency of synthesized videos. We evaluate the proposed approach on four real-world human action datasets, demonstrating its effectiveness in generating human actions. Code is available at https://github.com/aayushjr/larnet.
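To make the data flow in the abstract concrete, here is a toy sketch in plain Python. It is not the authors' architecture or code: the function names (`generate_latent_dynamics`, `resize`, `fuse_multiscale`), the sine-based action embedding, the tanh recurrence, and the additive fusion are all assumptions chosen for illustration. It only shows the two ideas the abstract states: a latent motion sequence is rolled out from an action label rather than a driving video, and each motion latent is injected into appearance features at more than one scale.

```python
import math
import random


def generate_latent_dynamics(action_id, num_frames, dim, seed=0):
    """Roll out a latent motion sequence from an action label alone (no driving video)."""
    rng = random.Random(seed)
    # Hypothetical action embedding: a fixed, deterministic vector per action id.
    embed = [math.sin((action_id + 1) * (k + 1)) for k in range(dim)]
    state = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random starting latent
    sequence = []
    for _ in range(num_frames):
        # Toy recurrent update: mix the previous state with the action embedding.
        state = [math.tanh(0.5 * s + e) for s, e in zip(state, embed)]
        sequence.append(state)
    return sequence


def resize(vector, size):
    """Nearest-neighbour resampling of a 1-D latent to the requested length."""
    return [vector[(i * len(vector)) // size] for i in range(size)]


def fuse_multiscale(appearance_pyramid, motion_latent):
    """Add the same motion latent to the appearance features at every scale."""
    return [
        [a + m for a, m in zip(feat, resize(motion_latent, len(feat)))]
        for feat in appearance_pyramid
    ]


# Appearance features of one source image at a coarse (4) and a fine (16) scale.
appearance = [[0.1] * 4, [0.1] * 16]
dynamics = generate_latent_dynamics(action_id=2, num_frames=8, dim=8)
frames = [fuse_multiscale(appearance, z) for z in dynamics]
print(len(frames), [len(f) for f in frames[0]])  # prints: 8 [4, 16]
```

In this sketch the appearance features stay fixed while the motion latent changes per frame, which mirrors the appearance/dynamics decomposition described above. A real model would use learned networks for every step shown here.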

Code Implementations: 1 repo
