CV · Apr 15

PianoFlow: Music-Aware Streaming Piano Motion Generation with Bimanual Coordination

Xuan Wang, Kai Ruan, Jiayi Han, Kaiyue Zhou, Gaoang Wang

arXiv:2604.12856 · 66.8 · h-index: 6

AI Analysis

This work addresses the need for realistic, real-time piano motion generation for virtual characters and music visualization, improving both inference speed (over 9× faster than prior methods) and cross-hand coordination quality.

PianoFlow introduces a flow-matching framework for audio-driven bimanual piano motion generation that uses MIDI as a privileged modality during training for better musical understanding, an asymmetric role-gated interaction module for dynamic cross-hand coordination, and an autoregressive flow continuation scheme for real-time streaming of long sequences. It achieves superior performance on PianoMotion10M and accelerates inference by over 9× compared to prior methods.
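Here is a minimal NumPy sketch of what "role-aware attention with temporal gating" between two hands could look like, under our own assumptions. Each hand's features (shape T×D) query the other hand's features through cross-attention. The left-hand and right-hand roles use separate, untrained weights, which makes the interaction asymmetric. A per-frame sigmoid gate scales how much of the other hand's message is added back residually. The class name `RoleGatedCrossHand`, the gate formula, and the residual update are hypothetical illustrations, not the paper's module.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RoleGatedCrossHand:
    """Hypothetical cross-hand block: each hand attends to the other with its
    own role-specific weights, and a per-frame gate controls the update."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(dim)
        # Separate (untrained, random) parameters per role -> asymmetric interaction.
        self.params = {
            role: {
                "Wq": rng.normal(0.0, s, (dim, dim)),
                "Wk": rng.normal(0.0, s, (dim, dim)),
                "Wv": rng.normal(0.0, s, (dim, dim)),
                "wg": rng.normal(0.0, s, 2 * dim),
                "bg": 0.0,
            }
            for role in ("left", "right")
        }

    def _attend(self, role, own, other):
        p = self.params[role]
        q, k, v = own @ p["Wq"], other @ p["Wk"], other @ p["Wv"]
        # (T, T): each frame of `own` attends over all frames of `other`.
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        msg = weights @ v  # (T, D): message from the other hand
        # Temporal gate: one scalar in (0, 1) per frame, from own state and message.
        gate = sigmoid(np.concatenate([own, msg], axis=-1) @ p["wg"] + p["bg"])
        return own + gate[:, None] * msg, gate

    def __call__(self, left, right):
        new_left, gate_left = self._attend("left", left, right)
        new_right, gate_right = self._attend("right", right, left)
        return new_left, new_right, gate_left, gate_right


# Usage with random features for T = 8 frames and D = 16 channels per hand.
T, D = 8, 16
rng = np.random.default_rng(1)
left, right = rng.normal(size=(T, D)), rng.normal(size=(T, D))
block = RoleGatedCrossHand(D)
new_left, new_right, gate_left, gate_right = block(left, right)
print(new_left.shape, gate_left.shape)  # (8, 16) (8,)
```

Giving each role its own projections means the left hand can attend to the right hand differently than the reverse. The gate lets the coupling strength vary frame by frame instead of staying fixed. In a trained model these weights would be learned, and the actual gating design may differ.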

Audio-driven bimanual piano motion generation requires precise modeling of complex musical structures and dynamic cross-hand coordination. However, existing methods often rely on acoustic-only representations lacking symbolic priors, employ inflexible interaction mechanisms, and are limited to computationally expensive short-sequence generation. To address these limitations, we propose PianoFlow, a flow-matching framework for precise and coordinated bimanual piano motion synthesis. Our approach strategically leverages MIDI as a privileged modality during training, distilling these structured musical priors to achieve deep semantic understanding while maintaining audio-only inference. Furthermore, we introduce an asymmetric role-gated interaction module to explicitly capture dynamic cross-hand coordination through role-aware attention and temporal gating. To enable real-time streaming generation for arbitrarily long sequences, we design an autoregressive flow continuation scheme that ensures seamless cross-chunk temporal coherence. Extensive experiments on the PianoMotion10M dataset demonstrate that PianoFlow achieves superior quantitative and qualitative performance, while accelerating inference by over 9× compared to previous methods.
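Chunked streaming with a flow-matching sampler can be illustrated generically. The NumPy sketch below makes three assumptions: a linear (rectified-flow style) path x_t = (1 - t)·x0 + t·x1 from Gaussian noise x0 to motion x1, Euler integration of dx/dt = v, and continuation by clamping the first `overlap` frames of each new chunk to a path that ends on the last frames of the previous chunk. That clamping rule and the names `oracle_velocity`, `sample_chunk`, and `stream` are our own, not the paper's scheme. `oracle_velocity` is a hypothetical stand-in for a trained audio-conditioned network. It returns the exact velocity toward a known target, so the output reproduces the target by construction and only the control flow is being demonstrated.

```python
import numpy as np


def oracle_velocity(x, t, target):
    # Hypothetical stand-in for a trained network v_theta(x_t, t, audio).
    # For the linear path x_t = (1 - t) * x0 + t * x1, the exact velocity
    # toward x1 given the current state is (x1 - x_t) / (1 - t).
    return (target - x) / (1.0 - t)


def sample_chunk(target, context, num_steps, rng):
    """Euler-integrate dx/dt = v from noise (t = 0) to motion (t = 1).

    If `context` is given, the first len(context) frames are clamped to the
    straight-line path from their noise to the previous chunk's frames, so
    the chunk is generated as a continuation of what was already emitted.
    """
    x0 = rng.standard_normal(target.shape)
    x = x0.copy()
    k = 0 if context is None else len(context)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t < 1 always, so the oracle never divides by zero
        if k:
            x[:k] = (1.0 - t) * x0[:k] + t * context
        x = x + dt * oracle_velocity(x, t, target)
    if k:
        x[:k] = context  # at t = 1 the clamped path ends exactly on the context
    return x


def stream(motion, chunk_len, overlap, num_steps=4, seed=0):
    """Generate a long sequence chunk by chunk, yielding only new frames."""
    if chunk_len <= overlap:
        raise ValueError("chunk_len must exceed overlap")
    rng = np.random.default_rng(seed)
    prev, start = None, 0
    while start < len(motion):
        ctx_start = max(start - overlap, 0)
        end = min(ctx_start + chunk_len, len(motion))
        k = start - ctx_start  # number of context frames reused from prev
        context = prev[len(prev) - k:] if (prev is not None and k > 0) else None
        chunk = sample_chunk(motion[ctx_start:end], context, num_steps, rng)
        yield chunk[k:]
        prev, start = chunk, end


# Usage: a toy (T = 50, D = 3) trajectory streamed in chunks of 16 with 4 overlap frames.
motion = np.sin(np.linspace(0, 6, 50))[:, None] * np.ones((1, 3))
out = np.concatenate(list(stream(motion, chunk_len=16, overlap=4)))
print(out.shape)  # (50, 3)
```

Because the clamped frames end exactly on the previous chunk's output, each chunk starts from frames that were already emitted, and only the new frames are yielded. Flow-matching samplers can often use a small number of ODE steps (here 4), which keeps each chunk cheap. With a learned velocity field, the step count trades speed against quality.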

