sander-wood committed on
Commit 8c76b80 · verified · 1 Parent(s): 791815a

Upload 2 files

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +119 -3
  3. overview.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ overview.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,119 @@
1
- ---
2
- license: mit
3
- ---
1
+ # CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
2
+
3
+ <p align="center">
4
+ <img src="overview.png" alt="CLaMP 3 Overview" width="50%">
5
+ </p>
6
+
7
+
8
+ ## Overview
9
+ CLaMP 3 is a unified framework for cross-modal and cross-lingual music information retrieval (MIR). Using contrastive learning, it aligns sheet music, audio, performance signals, and multilingual text in a shared representation space, enabling retrieval across unaligned musical modalities. Key features include:
10
+
11
+ - **Multimodal Support:**
12
+ 1. **Sheet Music:** Uses Interleaved ABC notation.
13
+ 2. **Performance Signals:** Processes MIDI Text Format (MTF) data.
14
+ 3. **Audio Recordings:** Works with audio features extracted by MERT.
15
+
16
+ - **Multilingual Capabilities:** Supports 100 languages ([XLM-R](https://arxiv.org/abs/1911.02116)) and generalizes effectively beyond its 27-language training data.
17
+
18
+ - **Dataset and Benchmark:**
19
+ - Trained on **M4-RAG**, a large-scale dataset of 2.31M high-quality music-text pairs across 27 languages and 194 countries.
20
+ - Introduces **WikiMT-X**, a benchmark containing 1,000 triplets of sheet music, audio, and text.
21
+
22
+ CLaMP 3 achieves state-of-the-art performance across multiple MIR tasks, advancing research in multimodal and multilingual music systems.
23
+
24
+ ### Links
25
+ - CLaMP 3 Demo Page (Coming Soon...)
26
+ - CLaMP 3 Paper (Coming Soon...)
27
+ - [CLaMP 3 Code](https://github.com/sanderwood/clamp3)
28
+ - [CLaMP 3 Model Weights](https://huggingface.co/sander-wood/clamp3/tree/main)
29
+ - [M4-RAG Pre-training Dataset](https://huggingface.co/datasets/sander-wood/m4-rag)
30
+ - [WikiMT-X Evaluation Benchmark](https://huggingface.co/datasets/sander-wood/wikimt-x)
31
+
32
+ > **Note** Ensure the model weights for CLaMP 3 are placed under the `code/` folder for proper loading. Also, verify that the configuration hyperparameters are correctly set.
33
+
34
+ ## Repository Structure
35
+ - [code/](https://github.com/sanderwood/clamp3/tree/main/code): Contains scripts for training CLaMP 3 and extracting features from music and text data. You can modify hyperparameters and file paths in the configuration files for custom training.
36
+
37
+ - [classification/](https://github.com/sanderwood/clamp3/tree/main/classification): Includes scripts for classification tasks using extracted features, such as training linear classification models and making predictions.
38
+
39
+ - [preprocessing/](https://github.com/sanderwood/clamp3/tree/main/preprocessing): Contains scripts for converting your data into compatible formats (interleaved ABC notation, MTF, or MERT-extracted audio features). CLaMP 3 requires data in one of these formats.
40
+
41
+ - [retrieval/](https://github.com/sanderwood/clamp3/tree/main/retrieval): Provides scripts for evaluating model performance, conducting semantic searches, and calculating similarity metrics based on extracted feature vectors.
42
+
43
+ > **Note** For detailed instructions on how to use the scripts in each folder, please refer to the individual README files within those directories. This main README provides only a high-level overview of the repository.
44
+
45
+ ## Getting Started
46
+
47
+ ### Environment Setup
48
+ To set up the environment for CLaMP 3, run the following commands:
49
+
50
+ ```bash
51
+ conda env create -f environment.yml
52
+ conda activate clamp3
53
+ ```
54
+
55
+ ### Data Preparation
56
+ 1. **Convert Files**: Navigate to the `preprocessing/` folder and convert your music files into a format CLaMP 3 can read (interleaved ABC notation, MTF, or MERT-extracted audio features). Whether you are training or performing inference, **you must use these preprocessing scripts to ensure the data is in the correct format**.
57
+ 1. **Interleaved ABC Notation**:
58
+ - Convert MusicXML files to ABC using [batch_xml2abc.py](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_xml2abc.py).
59
+ - Process the ABC files into interleaved notation using [batch_interleaved_abc.py](https://github.com/sanderwood/clamp3/blob/main/preprocessing/abc/batch_interleaved_abc.py).
60
+ 2. **MTF**:
61
+ - Convert MIDI files to MTF format using [batch_midi2mtf.py](https://github.com/sanderwood/clamp3/blob/main/preprocessing/midi/batch_midi2mtf.py).
62
+ 3. **MERT-extracted Audio Features**:
63
+ - Extract audio features with MERT by running the script [extract_mert.py](https://github.com/sanderwood/clamp3/tree/main/preprocessing/audio/extract_mert.py). The features are saved as `.npy` files and are ready for use in CLaMP 3.
64
+
65
+ 2. **Prepare Text Metadata (Optional)**: If you plan to train the model, you will need to prepare corresponding metadata for each music file. The metadata should be in JSON format, containing details like title, artist, region, language, and description.
66
+
67
+ Example:
68
+ ```json
69
+ {
70
+ "filepaths": ["audio/--/---aL9TdeI4.npy"],
71
+ "id": "---aL9TdeI4",
72
+ "title": "Mairi's Wedding",
73
+ "artists": ["Noel McLoughlin"],
74
+ "region": "United Kingdom of Great Britain and Northern Ireland",
75
+ "language": "English",
76
+ "genres": ["Folk", "Traditional"],
77
+ "tags": ["Scottish", "Wedding", "Traditional", "Folk", "Celtic"],
78
+ "background": "Mairi's Wedding is a Scottish folk song...",
79
+ "analysis": "The song has a lively and upbeat Scottish folk rhythm...",
80
+ "description": "A traditional folk song with a joyful celebration...",
81
+ "scene": "The setting is a picturesque Scottish village on a sunny morning...",
82
+ "translations": { "language": "Vietnamese", "background": "Bài hát \"Đám Cưới Mairi\"..." }
83
+ }
84
+ ```
85
+
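Before merging, it can help to confirm that every file a metadata record points to actually exists. The following is a minimal sketch, not part of the CLaMP 3 repository: the helper name `missing_filepaths` is hypothetical, and it assumes that the entries in `filepaths` are relative to the root of your data folder.

```python
from pathlib import Path


def missing_filepaths(record, data_root):
    """Return the entries of record["filepaths"] that do not exist under data_root.

    Assumption: paths in "filepaths" are relative to data_root.
    """
    root = Path(data_root)
    return [p for p in record.get("filepaths", []) if not (root / p).is_file()]
```

An empty list means every referenced file was found; any returned path should be fixed or regenerated with the preprocessing scripts before training.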
86
+ Once your JSON files are ready, merge them into a single `.jsonl` file and structure the directories as shown:
87
+
88
+ ```
89
+ /your-data-folder/
90
+ ├── abc/
91
+ ├── audio/
92
+ ├── mtf/
93
+ └── merged_output.jsonl
94
+ ```
95
+
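The merge step described above can be sketched as follows. This is an illustration under stated assumptions rather than a script from the repository: the function name `merge_json_to_jsonl` is hypothetical, and it assumes one metadata record per `.json` file, all located in a single directory.

```python
import json
from pathlib import Path


def merge_json_to_jsonl(json_dir, output_path):
    """Write each *.json record found in json_dir as one line of output_path.

    Returns the number of records written. Files are processed in sorted
    order so that the output is reproducible.
    """
    json_files = sorted(Path(json_dir).glob("*.json"))
    with open(output_path, "w", encoding="utf-8") as out:
        for json_file in json_files:
            with open(json_file, encoding="utf-8") as f:
                record = json.load(f)
            # ensure_ascii=False keeps non-English text (e.g. translations) readable.
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(json_files)
```

For example, `merge_json_to_jsonl("metadata", "your-data-folder/merged_output.jsonl")` would collect every record in a hypothetical `metadata/` directory into the single file shown in the directory layout.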
96
+ ### Training and Feature Extraction
97
+ 2. **Training Models**: If you want to train CLaMP 3, check the training scripts in the [code/](https://github.com/sanderwood/clamp3/tree/main/code) folder and modify the [config.py](https://github.com/sanderwood/clamp3/blob/main/code/config.py) file to set your hyperparameters and data paths.
98
+
99
+ 3. **Extracting Features**: After training (or if you have pre-trained weights), extract features from **preprocessed** data using [extract_clamp3.py](https://github.com/sanderwood/clamp3/blob/main/code/extract_clamp3.py). The script automatically detects the modality based on the file extension (e.g., `.txt`, `.abc`, `.mtf`, `.npy`). Make sure your data has already been converted into CLaMP 3–compatible formats by following the scripts in the [preprocessing/](https://github.com/sanderwood/clamp3/tree/main/preprocessing) folder.
100
+
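To illustrate what extension-based modality detection looks like, here is a minimal sketch. It is not the logic of `extract_clamp3.py`: the function name `detect_modality` and the modality labels are hypothetical, and only the four extensions listed above are taken from this README.

```python
from pathlib import Path

# Extensions come from the README; the labels on the right are illustrative
# names chosen for this sketch, not identifiers from extract_clamp3.py.
EXTENSION_TO_MODALITY = {
    ".txt": "text",
    ".abc": "sheet_music",          # interleaved ABC notation
    ".mtf": "performance_signal",   # MIDI Text Format
    ".npy": "audio_features",       # MERT-extracted features
}


def detect_modality(path):
    """Map a file path to a modality label based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_TO_MODALITY:
        raise ValueError(f"Unsupported file extension: {suffix!r} ({path})")
    return EXTENSION_TO_MODALITY[suffix]
```

Raising an error for unknown extensions, rather than guessing, makes it obvious when a file has skipped the preprocessing step.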
101
+ ### Classification and Retrieval
102
+ 4. **Classification**: To perform classification on the extracted features, navigate to the [classification/](https://github.com/sanderwood/clamp3/tree/main/classification) directory. You’ll find scripts for training and making predictions using linear classification models.
103
+
104
+ 5. **Semantic Search**: To conduct semantic searches using the extracted features, refer to the scripts in the [retrieval/](https://github.com/sanderwood/clamp3/tree/main/retrieval) folder.
105
+
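As a rough picture of how semantic search over extracted features works, the sketch below ranks candidate feature vectors by similarity to a query vector. Cosine similarity is an assumption made for this example; the scripts in `retrieval/` may compute similarity differently. The names `cosine_similarity` and `rank_by_similarity` are hypothetical, and vectors are plain Python lists to keep the example dependency-free.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates, top_k=3):
    """Return the top_k (name, score) pairs, most similar first.

    candidates maps an item name to its feature vector.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

In practice the query would be a text feature and the candidates music features (or the reverse), all extracted with the same model so that they live in the shared representation space.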
106
+ ## Citation
107
+ Coming Soon...
108
+ <!-- If you use CLaMP 3, M4-RAG, or WikiMT-X in your research, please cite the following paper:
109
+
110
+ bibtex
111
+ @misc{wu2024clamp2multimodalmusic,
112
+ title={CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models},
113
+ author={Shangda Wu and Yashan Wang and Ruibin Yuan and Zhancheng Guo and Xu Tan and Ge Zhang and Monan Zhou and Jing Chen and Xuefeng Mu and Yuejie Gao and Yuanliang Dong and Jiafeng Liu and Xiaobing Li and Feng Yu and Maosong Sun},
114
+ year={2024},
115
+ eprint={2410.13267},
116
+ archivePrefix={arXiv},
117
+ primaryClass={cs.SD},
118
+ url={https://arxiv.org/abs/2410.13267},
119
+ } -->
overview.png ADDED

Git LFS Details

  • SHA256: 2f44abe8ab7c9cdc2ef822e454f993bfaf8336afaf6335c98ac2be5b598f147c
  • Pointer size: 131 Bytes
  • Size of remote file: 611 kB