SD CV AS · Apr 14, 2025

Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis

Zihao Liu, Mingwen Ou, Zunnan Xu, Jiaqi Huang, Haonan Han, Ronghui Li, Xiu Li

Tsinghua

arXiv:2504.09885v2 · 12.95 citations · h-index: 13 · MM

Originality: Incremental advance

AI Analysis

This work addresses the challenge of generating realistic piano-playing hand motion for applications in animation and music education. It represents a domain-specific incremental improvement.

The paper tackles the problem of synthesizing coordinated bimanual piano hand motions from audio by proposing a dual-stream diffusion model that independently models each hand's motion while enhancing coordination, achieving state-of-the-art performance across multiple metrics.

Automating the synthesis of coordinated bimanual piano performances poses significant challenges, particularly in capturing the intricate choreography between the hands while preserving their distinct kinematic signatures. In this paper, we propose a dual-stream neural framework designed to generate synchronized hand gestures for piano playing from audio input, addressing the critical challenge of modeling both hand independence and coordination. Our framework introduces two key innovations: (i) a decoupled diffusion-based generation framework that independently models each hand's motion via dual-noise initialization, sampling distinct latent noise for each hand while leveraging a shared positional condition, and (ii) a Hand-Coordinated Asymmetric Attention (HCAA) mechanism that suppresses symmetric (common-mode) noise to highlight asymmetric, hand-specific features while adaptively enhancing inter-hand coordination during denoising. Comprehensive evaluations demonstrate that our framework outperforms existing state-of-the-art methods across multiple metrics. Our project is available at https://monkek123king.github.io/S2C_page/.
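The two ideas in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation: the function names, tensor shapes, the `cond` array, and the `lam` weight are all hypothetical. For the attention step it swaps in differential attention (two softmax maps subtracted so that attention they share cancels, as in the Differential Transformer), because the phrase "suppresses symmetric (common-mode) noise" suggests that kind of cancellation; the abstract does not say HCAA works this way. The shared positional condition is stood in for by a single array added to both streams.

```python
import numpy as np


def sample_dual_noise(num_frames, pose_dim, seed=0):
    """Dual-noise initialization (sketch): independent Gaussian noise per hand."""
    rng = np.random.default_rng(seed)
    # Two separate draws, so the left and right streams start from distinct latents.
    left = rng.standard_normal((num_frames, pose_dim))
    right = rng.standard_normal((num_frames, pose_dim))
    return left, right


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def differential_cross_attention(h_self, h_other, w_q1, w_k1, w_q2, w_k2, w_v, lam=0.5):
    """Cross-hand differential attention (swapped-in technique, not HCAA itself).

    Queries come from this hand's frames, keys and values from the other hand's.
    Two attention maps are computed and subtracted, so attention weight that both
    maps assign identically (the common mode) cancels out.
    """
    d = w_q1.shape[1]  # projection width used for score scaling
    a1 = softmax((h_self @ w_q1) @ (h_other @ w_k1).T / np.sqrt(d))
    a2 = softmax((h_self @ w_q2) @ (h_other @ w_k2).T / np.sqrt(d))
    attn = a1 - lam * a2                    # (T_self, T_other) differential map
    return h_self + attn @ (h_other @ w_v)  # residual update of this hand's stream


if __name__ == "__main__":
    T, D, d = 8, 6, 4
    rng = np.random.default_rng(1)
    left, right = sample_dual_noise(T, D, seed=0)
    cond = rng.standard_normal((T, D))  # hypothetical shared condition for both hands
    left, right = left + cond, right + cond
    w_q1, w_k1, w_q2, w_k2 = (rng.standard_normal((D, d)) for _ in range(4))
    w_v = rng.standard_normal((D, D))
    # Each stream queries the other, so information flows in both directions.
    new_left = differential_cross_attention(left, right, w_q1, w_k1, w_q2, w_k2, w_v)
    new_right = differential_cross_attention(right, left, w_q1, w_k1, w_q2, w_k2, w_v)
    print(new_left.shape, new_right.shape)
```

Because every row of `a1` and `a2` sums to 1, every row of the differential map sums to `1 - lam`. With identical projections and `lam=1.0` the map is exactly zero and the stream passes through unchanged, which is the limiting case of full common-mode cancellation in this sketch.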

