Optical Music Recognition Transformer
An image-to-text model for optical music recognition (OMR). The model is trained to predict simple notes in the LilyPond format from an input image. The training data consists of artificially generated, handwritten, and whiteboard images. The model is based on Donut.
Demo
Prediction: c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8
Prediction: d'8 g'8 c''8 a'8 d'2 c'8 f''8 d'4 c''4 e'8 r8 g'8 b'8 e'8 g'8 d'2
Prediction: g'4 c'4 r8 f''8 e'8 d'8 r8 c'4 c'2 a'2 b'4 r4 a'8 r8 r4
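The predictions above are whitespace-separated LilyPond tokens: a pitch letter (or `r` for a rest), optional octave marks, and a duration number. The following is a minimal sketch of how such output could be parsed for downstream use. It assumes only the simple token shape seen in the demo (no accidentals, dots, or chords) and LilyPond's absolute octave convention, where `c'` is middle C. The names `parse_token` and `parse_prediction` are hypothetical helpers, not part of the model or its repository.

```python
import re
from fractions import Fraction

# Assumed token shape: pitch letter a-g or "r" (rest), optional octave marks
# (' raises, , lowers), then a duration denominator such as 1, 2, 4 or 8.
TOKEN_RE = re.compile(r"([a-gr])('*|,*)(\d+)")


def parse_token(token):
    """Parse one simple LilyPond token into (pitch, octave, duration).

    Rests are returned as (None, None, duration). The duration is a fraction
    of a whole note, so "4" becomes Fraction(1, 4).
    """
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"unsupported token: {token!r}")
    name, marks, denominator = match.groups()
    duration = Fraction(1, int(denominator))
    if name == "r":
        if marks:
            raise ValueError(f"rest with octave marks: {token!r}")
        return (None, None, duration)
    # In LilyPond absolute mode an unmarked pitch lies in the octave below
    # middle C (scientific octave 3); each ' adds one octave, each , removes one.
    octave = 3 + marks.count("'") - marks.count(",")
    return (name, octave, duration)


def parse_prediction(text):
    """Parse a whole prediction string into a list of (pitch, octave, duration)."""
    return [parse_token(token) for token in text.split()]
```

For example, `parse_prediction("c'2 r4")` yields a half-note middle C followed by a quarter rest. Tokens outside the assumed shape raise `ValueError` rather than being silently skipped, which makes malformed model output easy to detect.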
Repo: https://github.com/UHHRobotics22-23/robot_project/tree/main/marimbabot_vision