CV · Dec 18, 2025

ARMFlow: AutoRegressive MeanFlow for Online 3D Human Reaction Generation

arXiv:2512.16234v1 · h-index: 13
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of real-time, high-fidelity human reaction generation for applications such as virtual reality and gaming, though it appears incremental because it builds on existing MeanFlow methods.

The paper tackles the problem of generating 3D human reactions in real-time online scenarios, proposing ARMFlow to achieve high motion fidelity and low latency, with results showing over 40% improvement in FID compared to existing online methods.

3D human reaction generation faces three main challenges: (1) high motion fidelity, (2) real-time inference, and (3) autoregressive adaptability for online scenarios. Existing methods fail to meet all three simultaneously. We propose ARMFlow, a MeanFlow-based autoregressive framework that models temporal dependencies between actor and reactor motions. It consists of a causal context encoder and an MLP-based velocity predictor. We introduce Bootstrap Contextual Encoding (BSCE) in training, which encodes the generated history instead of the ground-truth history, to alleviate error accumulation in autoregressive generation. We further introduce the offline variant ReMFlow, which achieves state-of-the-art performance with the fastest inference among offline methods. ARMFlow addresses key limitations of online settings by: (1) enhancing semantic alignment via a global contextual encoder; (2) achieving high accuracy and low latency with single-step inference; and (3) reducing accumulated errors through BSCE. Our single-step online generation surpasses existing online methods on InterHuman and InterX by over 40% in FID, while matching offline state-of-the-art performance despite using only partial sequence conditions.
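The online loop described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: `encode_context`, `mean_velocity`, `one_step_sample`, and `rollout` are hypothetical names, the encoder and velocity predictor are hand-written stand-ins rather than learned networks, and motion frames are plain lists of floats. The only parts taken from the MeanFlow formulation are the single-step sampling rule z_0 = z_1 - u(z_1, r=0, t=1) and the fact that each step conditions on previously generated reactor frames.

```python
import random


def encode_context(actor_frames, reactor_history):
    """Hypothetical stand-in for a causal context encoder.

    It only sees actor frames up to the current step and reactor frames
    generated so far, and summarises them by a per-dimension average.
    """
    frames = actor_frames + reactor_history
    dim = len(actor_frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def mean_velocity(z, r, t, context):
    """Hypothetical stand-in for the velocity predictor u(z, r, t | context).

    Toy choice (an assumption): the average velocity that moves z
    straight onto the context vector. A real model would be a trained MLP.
    """
    return [zi - ci for zi, ci in zip(z, context)]


def one_step_sample(context, rng):
    """MeanFlow-style single-step sampling: z_0 = z_1 - (1 - 0) * u(z_1, 0, 1)."""
    z1 = [rng.gauss(0.0, 1.0) for _ in context]  # noise sample at t = 1
    u = mean_velocity(z1, 0.0, 1.0, context)
    return [zi - ui for zi, ui in zip(z1, u)]


def rollout(actor_motion, seed=0):
    """Generate one reactor frame per incoming actor frame, autoregressively."""
    rng = random.Random(seed)
    reactor_history = []
    for step in range(len(actor_motion)):
        # The context is built from generated reactor frames, never ground truth.
        context = encode_context(actor_motion[: step + 1], reactor_history)
        reactor_history.append(one_step_sample(context, rng))
    return reactor_history


if __name__ == "__main__":
    actor = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    for frame in rollout(actor):
        print(frame)
```

In this sketch the history passed to the encoder is always the model's own output, which is the condition BSCE is described as reproducing during training so that the encoder sees the same kind of imperfect history at training time as at inference time.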

