Upload 2 files
- .gitattributes +1 -0
- README.md +119 -3
- overview.png +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
overview.png filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
@@ -1,3 +1,119 @@
|
|
1 |
-
|
2 |
-
|
3 |
-
|
1 |
+
# CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
|
2 |
+
|
3 |
+
<p align="center">
|
4 |
+
<img src="overview.png" alt="CLaMP 3 Overview" width="50%">
|
5 |
+
</p>
|
6 |
+
|
7 |
+
|
8 |
+
## Overview
|
9 |
+
CLaMP 3 is a unified framework for cross-modal and cross-lingual music information retrieval (MIR). By using contrastive learning, it aligns sheet music, audio, performance signals, and multilingual text into a shared representation space, enabling retrieval across unaligned musical modalities. Key features include:
|
10 |
+
|
11 |
+
- **Multimodal Support:**
|
12 |
+
1. **Sheet Music:** Uses interleaved ABC notation.
|
13 |
+
2. **Performance Signals:** Processes MIDI Text Format (MTF) data.
|
14 |
+
3. **Audio Recordings:** Works with audio features extracted by MERT.
|
15 |
+
|
16 |
+
- **Multilingual Capabilities:** Supports the 100 languages covered by [XLM-R](https://arxiv.org/abs/1911.02116) and generalizes effectively to languages beyond the 27 in its training data.
|
17 |
+
|
18 |
+
- **Dataset and Benchmark:**
|
19 |
+
- Trained on **M4-RAG**, a large-scale dataset of 2.31M high-quality music-text pairs across 27 languages and 194 countries.
|
20 |
+
- Introduces **WikiMT-X**, a benchmark containing 1,000 triplets of sheet music, audio, and text.
|
21 |
+
|
22 |
+
CLaMP 3 achieves state-of-the-art performance across multiple MIR tasks, advancing research in multimodal and multilingual music systems.
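
To make the idea of contrastive alignment concrete, here is a minimal sketch in plain Python. This README does not state the exact objective CLaMP 3 uses, so the sketch assumes a common symmetric InfoNCE-style loss over paired music and text embeddings. The function names, the temperature value of 0.07, and the toy vectors are illustrative assumptions, not part of the CLaMP 3 code.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def symmetric_contrastive_loss(music_embs, text_embs, temperature=0.07):
    """Assumed InfoNCE-style loss: pair i in music_embs matches pair i in text_embs.

    The temperature is a hypothetical default, not a value taken from CLaMP 3.
    """
    music = [l2_normalize(m) for m in music_embs]
    text = [l2_normalize(t) for t in text_embs]

    # Similarity of every music item to every text item, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(m, t)) / temperature for t in text]
        for m in music
    ]

    def cross_entropy(rows):
        # For row i the correct "class" is column i (the matching pair).
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)  # subtract the max for numerical stability
            log_sum = peak + math.log(sum(math.exp(v - peak) for v in row))
            total += log_sum - row[i]
        return total / len(rows)

    # Average the music-to-text and text-to-music directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))
```

With this loss, correctly paired embeddings score lower than mismatched ones, which is what pulls matching music and text together in the shared space.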
|
23 |
+
|
24 |
+
### Links
|
25 |
+
- CLaMP 3 Demo Page (Coming Soon...)
|
26 |
+
- CLaMP 3 Paper (Coming Soon...)
|
27 |
+
- [CLaMP 3 Code](https://github.com/sanderwood/clamp3)
|
28 |
+
- [CLaMP 3 Model Weights](https://huggingface.co/sander-wood/clamp3/tree/main)
|
29 |
+
- [M4-RAG Pre-training Dataset](https://huggingface.co/datasets/sander-wood/m4-rag)
|
30 |
+
- [WikiMT-X Evaluation Benchmark](https://huggingface.co/datasets/sander-wood/wikimt-x)
|
31 |
+
|
32 |
+
> **Note** Ensure the model weights for CLaMP 3 are placed under the `code/` folder for proper loading. Also, verify that the configuration hyperparameters are correctly set.
|
33 |
+
|
34 |
+
## Repository Structure
|
35 |
+
- [code/](https://github.com/sanderwood/clamp3/tree/main/code): Contains scripts for training CLaMP 3 and extracting features from music and text data. You can modify hyperparameters and file paths in the configuration files for custom training.
|
36 |
+
|
37 |
+
- [classification/](https://github.com/sanderwood/clamp3/tree/main/classification): Includes scripts for classification tasks using extracted features, such as training linear classification models and making predictions.
|
38 |
+
|
39 |
+
- [preprocessing/](https://github.com/sanderwood/clamp3/tree/main/preprocessing): Contains scripts for converting your data into compatible formats (interleaved ABC notation, MTF, or MERT-extracted audio features). CLaMP 3 can only work with data in one of these formats.
|
40 |
+
|
41 |
+
- [retrieval/](https://github.com/sanderwood/clamp3/tree/main/retrieval): Provides scripts for evaluating model performance, conducting semantic searches, and calculating similarity metrics based on extracted feature vectors.
|
42 |
+
|
43 |
+
> **Note** For detailed instructions on how to use the scripts in each folder, please refer to the individual README files within those directories. This main README provides only a high-level overview of the repository.
|
44 |
+
|
45 |
+
## Getting Started
|
46 |
+
|
47 |
+
### Environment Setup
|
48 |
+
To set up the environment for CLaMP 3, run the following commands:
|
49 |
+
|
50 |
+
```bash
|
51 |
+
conda env create -f environment.yml
|
52 |
+
conda activate clamp3
|
53 |
+
```
|
54 |
+
|
55 |
+
### Data Preparation
|
56 |
+
1. **Convert Files**: Navigate to the `preprocessing/` folder and convert your music files into a format compatible with CLaMP 3 (interleaved ABC notation, MTF, or MERT-extracted audio features). Whether you are training or performing inference, **you must use these preprocessing scripts to ensure the data is in the correct format**.
|
57 |
+
1. **Interleaved ABC Notation**:
|
58 |
+
- Convert MusicXML files to ABC using [batch_xml2abc.py](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_xml2abc.py).
|
59 |
+
- Process the ABC files into interleaved notation using [batch_interleaved_abc.py](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_interleaved_abc.py).
|
60 |
+
2. **MTF**:
|
61 |
+
- Convert MIDI files to MTF format using [batch_midi2mtf.py](https://github.com/sanderwood/clamp3/blob/main/preprocessing/midi/batch_midi2mtf.py).
|
62 |
+
3. **MERT-extracted Audio Features**:
|
63 |
+
- Extract audio features using MERT by running the script [extract_mert.py](https://github.com/sanderwood/clamp3/tree/main/preprocessing/audio/extract_mert.py). These features will be saved as `.npy` files and are ready for use in CLaMP 3.
|
64 |
+
|
65 |
+
2. **Prepare Text Metadata (Optional)**: If you plan to train the model, you will need to prepare corresponding metadata for each music file. The metadata should be in JSON format, containing details like title, artist, region, language, and description.
|
66 |
+
|
67 |
+
Example:
|
68 |
+
```json
|
69 |
+
{
|
70 |
+
"filepaths": ["audio/--/---aL9TdeI4.npy"],
|
71 |
+
"id": "---aL9TdeI4",
|
72 |
+
"title": "Mairi's Wedding",
|
73 |
+
"artists": ["Noel McLoughlin"],
|
74 |
+
"region": "United Kingdom of Great Britain and Northern Ireland",
|
75 |
+
"language": "English",
|
76 |
+
"genres": ["Folk", "Traditional"],
|
77 |
+
"tags": ["Scottish", "Wedding", "Traditional", "Folk", "Celtic"],
|
78 |
+
"background": "Mairi's Wedding is a Scottish folk song...",
|
79 |
+
"analysis": "The song has a lively and upbeat Scottish folk rhythm...",
|
80 |
+
"description": "A traditional folk song with a joyful celebration...",
|
81 |
+
"scene": "The setting is a picturesque Scottish village on a sunny morning...",
|
82 |
+
"translations": { "language": "Vietnamese", "background": "Bài hát \"Đám Cưới Mairi\"..." }
|
83 |
+
}
|
84 |
+
```
|
85 |
+
|
86 |
+
Once your JSON files are ready, merge them into a single `.jsonl` file and structure the directories as shown:
|
87 |
+
|
88 |
+
```
|
89 |
+
/your-data-folder/
|
90 |
+
├── abc/
|
91 |
+
├── audio/
|
92 |
+
├── mtf/
|
93 |
+
├── merged_output.jsonl
|
94 |
+
```
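
The merge step above can be sketched with the Python standard library. This is an illustrative helper, not a script from the repository: the function name `merge_json_to_jsonl` and the assumption that every metadata file sits in one folder with a `.json` extension are hypothetical.

```python
import json
from pathlib import Path


def merge_json_to_jsonl(json_dir, output_path):
    """Write each JSON metadata file in json_dir as one line of a .jsonl file.

    Hypothetical helper: assumes one JSON object per file, all in a single folder.
    Returns the number of records written.
    """
    json_files = sorted(Path(json_dir).glob("*.json"))
    with open(output_path, "w", encoding="utf-8") as out:
        for json_file in json_files:
            with open(json_file, encoding="utf-8") as f:
                record = json.load(f)
            # ensure_ascii=False keeps non-English text (e.g. translations) readable.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(json_files)
```

Calling `merge_json_to_jsonl("your-json-folder", "your-data-folder/merged_output.jsonl")` would then produce the `merged_output.jsonl` file shown in the layout above; both paths are placeholders.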
|
95 |
+
|
96 |
+
### Training and Feature Extraction
|
97 |
+
2. **Training Models**: If you want to train CLaMP 3, check the training scripts in the [code/](https://github.com/sanderwood/clamp3/tree/main/code) folder and modify the [config.py](https://github.com/sanderwood/clamp3/blob/main/code/config.py) file to set your hyperparameters and data paths.
|
98 |
+
|
99 |
+
3. **Extracting Features**: After training (or if you have pre-trained weights), extract features from **preprocessed** data using [extract_clamp3.py](https://github.com/sanderwood/clamp3/blob/main/code/extract_clamp3.py). The script automatically detects the modality based on the file extension (e.g., `.txt`, `.abc`, `.mtf`, `.npy`). Make sure your data has already been converted into CLaMP 3–compatible formats by following the scripts in the [preprocessing/](https://github.com/sanderwood/clamp3/tree/main/preprocessing) folder.
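
Extension-based modality detection can be pictured as a simple lookup. The sketch below is not the logic of `extract_clamp3.py`, which this README does not show; the function name, the modality labels, and the error handling are assumptions. The mapping itself follows the formats described above (text as `.txt`, interleaved ABC as `.abc`, MTF as `.mtf`, MERT features as `.npy`).

```python
from pathlib import Path

# Assumed mapping from file extension to modality; labels are illustrative.
MODALITY_BY_EXTENSION = {
    ".txt": "text",
    ".abc": "sheet_music",
    ".mtf": "performance_signal",
    ".npy": "audio",
}


def detect_modality(path):
    """Return the assumed modality label for a preprocessed file."""
    suffix = Path(path).suffix.lower()
    if suffix not in MODALITY_BY_EXTENSION:
        raise ValueError(f"Unsupported file extension: {suffix!r}")
    return MODALITY_BY_EXTENSION[suffix]
```

A check like this is also a cheap way to catch files that were never run through the preprocessing scripts, such as a raw `.mid` or `.wav` file.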
|
100 |
+
|
101 |
+
### Classification and Retrieval
|
102 |
+
4. **Classification**: To perform classification on the extracted features, navigate to the [classification/](https://github.com/sanderwood/clamp3/tree/main/classification) directory. You’ll find scripts for training and making predictions using linear classification models.
|
103 |
+
|
104 |
+
5. **Semantic Search**: To conduct semantic searches using the extracted features, refer to the scripts in the [retrieval/](https://github.com/sanderwood/clamp3/tree/main/retrieval) folder.
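
As a rough illustration of semantic search over extracted features, the sketch below ranks candidates by cosine similarity to a query vector. It is a minimal plain-Python example under stated assumptions: the repository's retrieval scripts may use a different metric or interface, and the function names and the dictionary of candidate vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates, top_k=3):
    """Return the top_k (name, score) pairs, most similar first.

    `candidates` maps a hypothetical item name to its feature vector.
    """
    scored = [
        (name, cosine_similarity(query, vec)) for name, vec in candidates.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Because all modalities share one representation space, the query could be a text feature and the candidates music features, or the other way around.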
|
105 |
+
|
106 |
+
## Citation
|
107 |
+
Coming Soon...
|
108 |
+
<!-- If you use CLaMP 3, M4-RAG, or WikiMT-X in your research, please cite the following paper:
|
109 |
+
|
110 |
+
bibtex
|
111 |
+
@misc{wu2024clamp2multimodalmusic,
|
112 |
+
title={CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models},
|
113 |
+
author={Shangda Wu and Yashan Wang and Ruibin Yuan and Zhancheng Guo and Xu Tan and Ge Zhang and Monan Zhou and Jing Chen and Xuefeng Mu and Yuejie Gao and Yuanliang Dong and Jiafeng Liu and Xiaobing Li and Feng Yu and Maosong Sun},
|
114 |
+
year={2024},
|
115 |
+
eprint={2410.13267},
|
116 |
+
archivePrefix={arXiv},
|
117 |
+
primaryClass={cs.SD},
|
118 |
+
url={https://arxiv.org/abs/2410.13267},
|
119 |
+
} -->
|
overview.png
ADDED