Tipiano: Cascaded Piano Hand Motion Synthesis via Fingertip Priors
For piano animation and human motion synthesis, this work provides a practical framework that balances precision and naturalness, though the anticipatory-motion gap identified by expert pianists suggests that its improvement over existing methods is incremental.
The paper tackles realistic piano hand motion synthesis. It achieves F1 = 0.910, substantially outperforming diffusion baselines (F1 = 0.121), and a user study (N = 41) confirms quality approaching that of motion capture.
Synthesizing realistic piano hand motions requires both precision and naturalness. Physics-based methods achieve precision but produce stiff motions, while data-driven models learn natural dynamics but struggle with positional accuracy. Piano motion exhibits a natural hierarchy: fingertip positions are nearly deterministic given the piano geometry and the fingering, while the wrist and intermediate joints offer stylistic freedom. We present Tipiano, a four-stage framework that exploits this hierarchy: (1) statistics-based fingertip positioning, (2) FiLM-conditioned trajectory refinement, (3) wrist estimation, and (4) STGCN-based pose synthesis. We contribute expert-annotated fingerings for the FürElise dataset (153 pieces, ~10 hours). Experiments demonstrate F1 = 0.910, substantially outperforming diffusion baselines (F1 = 0.121), and a user study (N = 41) confirms quality approaching that of motion capture. Expert evaluation by professional pianists (N = 5) identified anticipatory motion as the key remaining gap, which gives concrete directions for future improvement.
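Stage (1) rests on the claim that fingertip targets are nearly determined by key geometry and fingering. The sketch below illustrates that idea under our own assumptions and is not the paper's implementation. Fingertip targets are looked up from per-(pitch, finger) averages of observed fingertip positions, and the code falls back to the key's geometric center when no statistics exist. The names `key_center_x`, `fit_fingertip_table` and `fingertip_target`, the 23.5 mm white-key width, and the simplification that black keys sit exactly on white-key boundaries are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Vec3 = Tuple[float, float, float]

# Assumed (hypothetical) geometry: 88-key piano from A0 (MIDI 21) to C8 (MIDI 108),
# white keys ~23.5 mm wide, x measured in metres from the left edge of A0.
LOWEST_PITCH, HIGHEST_PITCH = 21, 108
WHITE_KEY_WIDTH = 0.0235
WHITE_PITCH_CLASSES = {0, 2, 4, 5, 7, 9, 11}  # C D E F G A B


def key_center_x(pitch: int) -> float:
    """Approximate lateral key center; black keys are placed on white-key boundaries."""
    if not LOWEST_PITCH <= pitch <= HIGHEST_PITCH:
        raise ValueError(f"pitch {pitch} is outside the 88-key range")
    whites_below = sum(1 for p in range(LOWEST_PITCH, pitch) if p % 12 in WHITE_PITCH_CLASSES)
    if pitch % 12 in WHITE_PITCH_CLASSES:
        return (whites_below + 0.5) * WHITE_KEY_WIDTH
    return whites_below * WHITE_KEY_WIDTH


def fit_fingertip_table(samples: Iterable[Tuple[int, int, Vec3]]) -> Dict[Tuple[int, int], Vec3]:
    """Average the observed fingertip positions for each (pitch, finger) pair."""
    groups = defaultdict(list)
    for pitch, finger, position in samples:
        groups[(pitch, finger)].append(position)
    return {key: tuple(mean(axis) for axis in zip(*positions))
            for key, positions in groups.items()}


def fingertip_target(table: Dict[Tuple[int, int], Vec3], pitch: int, finger: int) -> Vec3:
    """Use the statistical target if available, else the key center at surface height."""
    if (pitch, finger) in table:
        return table[(pitch, finger)]
    return (key_center_x(pitch), 0.0, 0.0)
```

For example, two observations of finger 1 on middle C (MIDI 60) at (1.0, 0.0, 0.0) and (3.0, 2.0, 0.0) average to a target of (2.0, 1.0, 0.0). An unseen (pitch, finger) pair falls back to pure geometry. A learned refinement stage would then smooth and correct these per-note targets over time.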
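Stage (2) uses FiLM (feature-wise linear modulation). FiLM scales and shifts each hidden feature h by a per-channel gain γ(c) and offset β(c), both predicted from a conditioning vector c: FiLM(h, c) = γ(c) ⊙ h + β(c). The abstract does not say what the conditioning contains or where FiLM sits in the refinement network. The sketch below therefore assumes one conditioning vector per frame (for example, an encoding of the fingertip prior and note context) and uses plain linear predictors for γ and β. The classes `Linear` and `FiLM` are hypothetical, dependency-free stand-ins for learned layers.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Linear:
    """Affine map y = W x + b; weight has shape (out, in)."""
    weight: List[List[float]]
    bias: List[float]

    def __call__(self, x: Sequence[float]) -> List[float]:
        if any(len(row) != len(x) for row in self.weight):
            raise ValueError("input size does not match the weight columns")
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


@dataclass
class FiLM:
    """Feature-wise linear modulation: gamma(cond) * features + beta(cond)."""
    to_gamma: Linear
    to_beta: Linear

    def __call__(self, features: Sequence[float], cond: Sequence[float]) -> List[float]:
        gamma = self.to_gamma(cond)  # per-channel scale predicted from the condition
        beta = self.to_beta(cond)    # per-channel shift predicted from the condition
        if not len(gamma) == len(beta) == len(features):
            raise ValueError("gamma/beta size must match the feature size")
        return [g * h + b for g, h, b in zip(gamma, features, beta)]
```

With γ = (2.0, 0.5) and β = (1.0, 1.0), the features (1.0, 2.0) become (3.0, 2.0). The appeal of FiLM here is that the condition steers every channel of the trajectory features cheaply, without being concatenated into them. A common choice is to initialise γ near 1 and β near 0 so the layer starts close to the identity.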
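Stage (4) uses an STGCN, which treats the hand skeleton as a graph. Each joint's features are mixed with its neighbours' through a normalised adjacency matrix Â = D^{-1/2}(A + I)D^{-1/2}, followed by a learned projection W: H' = Â H W. The sketch below shows only this spatial graph convolution for a single frame, in its simplest single-partition form. Full ST-GCN blocks also partition neighbours into subsets, convolve along time, and apply nonlinearities, all of which are omitted here. The 21-joint hand layout (a wrist plus four joints per finger) is a common convention that we assume, not necessarily the paper's skeleton, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def hand_edges(num_fingers: int = 5, joints_per_finger: int = 4) -> List[Tuple[int, int]]:
    """Bones of an assumed 21-joint hand: joint 0 is the wrist, each finger is a chain."""
    edges = []
    for f in range(num_fingers):
        base = 1 + f * joints_per_finger
        edges.append((0, base))  # wrist to finger root
        for k in range(joints_per_finger - 1):
            edges.append((base + k, base + k + 1))  # along the finger toward the tip
    return edges


def normalized_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degrees include the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain dense matrix product of (n x k) and (k x m) lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def spatial_graph_conv(features: Matrix, adj_norm: Matrix, weight: Matrix) -> Matrix:
    """One spatial graph convolution: (J x C_in) joint features -> (J x C_out)."""
    return matmul(matmul(adj_norm, features), weight)
```

On the 21-joint hand, `normalized_adjacency(21, hand_edges())` yields a symmetric 21 × 21 matrix. Stacking such layers lets information from the well-constrained fingertips propagate toward the wrist and intermediate joints, which matches the hierarchy the paper exploits.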