arxiv:2410.05791

FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance

Published on Oct 8
· Submitted by akhaliq on Oct 10

Abstract

Piano playing requires agile, precise, and coordinated hand control that stretches the limits of dexterity. Hand motion models sophisticated enough to accurately recreate piano playing have a wide range of applications in character animation, embodied AI, biomechanics, and VR/AR. In this paper, we construct a first-of-its-kind large-scale dataset that contains approximately 10 hours of 3D hand motion and audio from 15 elite-level pianists playing 153 pieces of classical music. To capture natural performances, we design a markerless setup in which motions are reconstructed from multi-view videos using state-of-the-art pose estimation models. The motion data is further refined via inverse kinematics using high-resolution MIDI key-pressing data obtained from sensors in a specialized Yamaha Disklavier piano. Leveraging the collected dataset, we develop a pipeline that can synthesize physically plausible hand motions for musical scores outside of the dataset. Our approach employs a combination of imitation learning and reinforcement learning to obtain policies for physics-based bimanual control involving the interaction between hands and piano keys. To address the sampling-efficiency problem posed by the large motion dataset, we use a diffusion model to generate natural reference motions, which provide high-level trajectory and fingering (finger order and placement) information. Because the generated reference motion alone is not accurate enough for piano performance modeling, we further augment the data by using musical similarity to retrieve similar motions from the captured dataset, boosting the precision of the RL policy. With the proposed method, our model generates natural, dexterous motions that generalize to music outside the training dataset.
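
To make the retrieval step described above concrete, here is a minimal sketch of one plausible way to rank captured motion segments by musical similarity to a query passage, approximated by comparing MIDI pitch sequences with a normalized edit distance. The data structures, function names, and similarity metric (`CapturedSegment`, `retrieve_similar`, Levenshtein distance over pitches) are illustrative assumptions, not the similarity measure used in the paper, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CapturedSegment:
    """A short excerpt from the captured dataset (hypothetical structure)."""
    pitches: List[int]   # MIDI pitch sequence of the excerpt
    motion_id: str       # handle to the associated 3D hand-motion clip


def edit_distance(a: List[int], b: List[int]) -> int:
    """Standard Levenshtein distance between two pitch sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]


def retrieve_similar(query: List[int],
                     dataset: List[CapturedSegment],
                     k: int = 5) -> List[Tuple[float, CapturedSegment]]:
    """Return the k dataset segments whose pitch sequences are closest to the query."""
    scored = []
    for seg in dataset:
        denom = max(len(query), len(seg.pitches), 1)
        similarity = 1.0 - edit_distance(query, seg.pitches) / denom
        scored.append((similarity, seg))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Example: find dataset excerpts musically close to a short query passage.
# query = [60, 62, 64, 65, 67]            # C-D-E-F-G
# top = retrieve_similar(query, dataset, k=3)
```

In the paper, the retrieved segments serve as additional reference motions that sharpen the RL policy; the edit-distance scoring above is only a placeholder for whatever musical-similarity measure the authors actually use.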

Community

Nicely done!

Similar work was done commercially around 4 years ago as part of the “Concert Creator” software, where ML models were trained on approximately 20 hours of motion-capture data from a professional pianist.

Results can be found here (model results on a test set, including examples of complex two-hand interactions):

https://youtu.be/p8xRRV8Usg0?si=2oJuNUnpQucaFNID

You can also search for “Concert Creator AI” on YouTube.


