CV
Mar 18, 2024

AICL: Action In-Context Learning for Video Diffusion Model

arXiv:2403.11535v2
2 citations
h-index: 47
AI Analysis

This paper addresses a limitation of video generation models: they handle diverse actions poorly, particularly less common ones. It does so by improving action understanding in open-domain scenarios.

The paper tackles the generation of less common actions in open-domain video generation. It enables models to understand action information from reference videos through in-context learning, and reports state-of-the-art performance across three video diffusion models on five metrics, evaluated on randomly selected categories from non-training datasets.

Open-domain video generation models are constrained by the scale of their training video datasets, so some less common actions still cannot be generated. Some researchers have explored video editing methods that achieve action generation by editing the spatial information of a video showing the same action. However, this approach mechanically reproduces identical actions without understanding them, which does not suit open-domain scenarios. In this paper, we propose AICL, which empowers the generative model to understand action information in reference videos, similar to how humans do, through in-context learning. Extensive experiments demonstrate that AICL effectively captures the action and achieves state-of-the-art generation performance across three typical video diffusion models on five metrics when using randomly selected categories from non-training datasets.

Code Implementations
1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
