---
language: en
tags:
- audio
- music-generation
- sample-generation
- piano
- fine-tuning
- stable-audio
datasets:
- custom
model_name: Royal Cities Vocal Textures (SAO Finetune)
base_model: stabilityai/stable-audio-open-1.0
license: other
license_name: stabilityai-community-license
license_link: https://stability.ai/license
---


<center><img src="https://i.imgur.com/MJvcnnn.jpeg" alt="Header Logo" width="100%"></center>

<center>
<h2 style="font-size: 30px;"><u>Royal Cities Vocal Textures (SAO Finetune)</u></h2>
</center>
<center>
<h2 style="font-size: 19px;">Introduction</h2>
</center>
This finetuned Stable Audio Open model specializes in vocal / operatic chord progressions to support granular music production workflows. Capable of creating an infinite variety of chord progressions, all output is BPM-synced and key-locked to any note within the 12-tone chromatic scale, in both major and minor keys. The model was trained on a custom dataset crafted in FL Studio and features three distinct voicings:

- **Male Vocals**
- **Female Vocals**
- **Ensemble Vocals** (a combination of Male and Female)


<center>
<h2 style="font-size: 19px;">Model Features</h2>
</center>

- **Multiple Types of Stem Generation:** Outputs three types of voicings, with a focus on chord progressions only.
- **Tonal Versatility:** Generates stems in any key across the 12-tone chromatic scale, in both major and minor.
- **Simplified Scale Notation:** Scales are written using <b><i>sharps only</i></b> in the following format (a small helper for converting flat spellings follows the list):

<pre>
<b>Minor Scales</b>
A minor, A# minor, B minor, C minor, C# minor, D minor, D# minor,
E minor, F minor, F# minor, G minor, G# minor

<b>Major Scales</b>
A major, A# major, B major, C major, C# major, D major, D# major,
E major, F major, F# major, G major, G# major
</pre>
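
Because the model expects sharps-only key names, any flat spelling should be converted to its enharmonic sharp equivalent before prompting. A minimal illustrative helper (not part of the model or its tooling):

```python
# Map common flat spellings to the sharps-only notation the model expects.
FLAT_TO_SHARP = {"Bb": "A#", "Db": "C#", "Eb": "D#", "Gb": "F#", "Ab": "G#"}

def normalize_key(key: str) -> str:
    """'Eb minor' -> 'D# minor'; sharps-only names pass through unchanged."""
    note, _, mode = key.partition(" ")
    return f"{FLAT_TO_SHARP.get(note, note)} {mode}"

print(normalize_key("Eb minor"))  # D# minor
print(normalize_key("B major"))   # B major
```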

For more details on the VSTs and gear used in sample creation, refer to the Gear Used section below.

<center>
<h2 style="font-size: 19px;">Training Methodology</h2>
</center>

This model was designed in conjunction with a future public dataset, with a focus on generating only vocal chord progressions across the 12-tone chromatic scale in Female, Male, or Ensemble voicings.

Vocals and choirs have very long attacks, so the model was trained with metadata that simply classifies each sample as a "Chord Progression" along with the necessary key, BPM, and bar information. More details can be found in the dataset section.

<center>
<h2 style="font-size: 24px;"><u>Usage Guide</u></h2>
</center>

<center>
<h2 style="font-size: 19px;">Supported GitHub Interfaces</h2>
<p style="font-size: 16px;">
This model works in both the <a href="https://github.com/RoyalCities/RC-stable-audio-tools">RC Stable Audio Gradio</a> and the <a href="https://github.com/Stability-AI/stable-audio-tools">original Stable Audio GitHub</a>.
</p>
</center>


<b>
<p align="center" style="font-size: 18px;">
To use the model, place both the .ckpt file and the config .json inside their own sub-folder within the "models" folder and launch the Gradio interface.
</p>
</b>

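The checkpoint can also be loaded programmatically instead of through the Gradio UI. The snippet below is a minimal, unofficial sketch assuming the stable-audio-tools Python API (create_model_from_config, load_ckpt_state_dict, generate_diffusion_cond) and hypothetical file names and paths; adapt it to the version of the toolkit you have installed.

```python
import json

import torch
import torchaudio
from einops import rearrange
from stable_audio_tools.inference.generation import generate_diffusion_cond
from stable_audio_tools.models.factory import create_model_from_config
from stable_audio_tools.models.utils import load_ckpt_state_dict

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical sub-folder layout inside the interface's "models" folder.
config_path = "models/vocal_textures/model_config.json"
ckpt_path = "models/vocal_textures/model.ckpt"

with open(config_path) as f:
    model_config = json.load(f)

model = create_model_from_config(model_config)
model.load_state_dict(load_ckpt_state_dict(ckpt_path))
model = model.to(device)

sample_rate = model_config["sample_rate"]
sample_size = model_config["sample_size"]

# Prompt follows the structure documented in the Prompt Structure section below.
conditioning = [{
    "prompt": "Male Vocal Texture, chord progression, D# minor, 128BPM, 8 bars,",
    "seconds_start": 0,
    "seconds_total": 30,
}]

output = generate_diffusion_cond(
    model,
    steps=100,
    cfg_scale=7,
    conditioning=conditioning,
    sample_size=sample_size,
    device=device,
)

# Collapse the batch dimension, peak-normalize, and save as 16-bit WAV.
output = rearrange(output, "b d n -> d (b n)")
output = output.to(torch.float32).div(torch.max(torch.abs(output))).clamp(-1, 1)
torchaudio.save("vocal_texture.wav", (output * 32767).to(torch.int16).cpu(), sample_rate)
```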
<center>
<h2 style="font-size: 24px;"><u>VST Support</u></h2>
</center>

<b>
<p align="center" style="font-size: 18px;">
This model has direct VST compatibility in the <a href="https://audialab.com/products/deep-sampler-2/" style="font-size: 20px;">Audialab Engine</a>.
</p>
</b>

<center>
<h2 style="font-size: 24px;"><u>Prompt Structure</u></h2>
</center>

To ensure the best results, use the following format for your prompts:
<pre><b>
[Vocal Type], Chord Progression, [Key], [BPM], [Bar Count]
</b></pre>

Vocal Type Prompts
```python
["Male Vocal Texture"]
["Female Vocal Texture"]
["Ensemble Vocal Texture"]
```
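
Putting the template together, here is a small illustrative helper (not part of the model) that assembles a prompt string; supported BPM and bar values are listed in the BPMs/Bars section below.

```python
def build_prompt(vocal_type: str, key: str, bpm: int, bars: int) -> str:
    """Assemble a prompt following the documented template:
    [Vocal Type], Chord Progression, [Key], [BPM], [Bar Count]
    """
    return f"{vocal_type}, chord progression, {key}, {bpm}BPM, {bars} bars,"

# Reproduces the first example prompt below.
print(build_prompt("Male Vocal Texture", "D# minor", 128, 8))
# Male Vocal Texture, chord progression, D# minor, 128BPM, 8 bars,
```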
<center>
<p style="font-size: 16px;"> <b>Examples</b>
</p>
</center>

##### Male vocal prompt with model output.
"Male Vocal Texture, chord progression, D# minor, 128BPM, 8 bars,"
<audio controls src="https://huggingface.co/RoyalCities/Vocal_Textures_Main/resolve/main/example_1.mp3"></audio>

##### Female vocal prompt with model output.
"Female Vocal Texture, chord progression, B major, 120BPM, 8 bars,"
<audio controls src="https://huggingface.co/RoyalCities/Vocal_Textures_Main/resolve/main/example_2.mp3"></audio>


##### Ensemble vocal prompt with model output.
"Ensemble Vocal Texture, chord progression, C major, 140BPM, 8 bars,"
<audio controls src="https://huggingface.co/RoyalCities/Vocal_Textures_Main/resolve/main/example_3.mp3"></audio>

<center>
<p style="font-size: 16px;"> <b>BPMs/Bars</b>
</p>
</center>

The training BPMs range from 100BPM up to 150BPM. The main denominations are **100BPM, 110BPM, 120BPM, 128BPM, 130BPM, 140BPM, 150BPM**.

There are 2 bar settings: **4 bars** and **8 bars**.
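
For batch workflows, every documented prompt combination can be enumerated directly from the values above; a short illustrative sketch:

```python
from itertools import product

vocal_types = ["Male Vocal Texture", "Female Vocal Texture", "Ensemble Vocal Texture"]
notes = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
keys = [f"{note} {mode}" for note in notes for mode in ("minor", "major")]
bpms = [100, 110, 120, 128, 130, 140, 150]
bar_counts = [4, 8]

prompts = [f"{v}, chord progression, {k}, {b}BPM, {n} bars,"
           for v, k, b, n in product(vocal_types, keys, bpms, bar_counts)]
print(len(prompts))  # 3 voicings x 24 keys x 7 BPMs x 2 bar counts = 1008
```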

<center>
<h2 style="font-size: 24px;"><u>Dataset Breakdown</u></h2>
</center>

<center>
<h2 style="font-size: 19px;">Overview</h2>
</center>

- **Total .wav files**: 5040
- **Total duration**: 980.64 minutes
- **Average Length**: 11.67 seconds
- **Total Size**: 14.50 GB
- **Sample Rate**: 44100 Hz

This dataset was meticulously designed to maintain a perfect balance between major and minor chord progressions, ensuring high accuracy in key generation.

By including an equal number of samples for all 12 tones in both major and minor scales (across 4 and 8 bars), the model achieves precise results. It reliably produces chord progressions in either a major scale or its relative minor, effectively handling the nuances where major and minor scales share the same notes.

For example, in music theory, the F major scale and the D minor scale share the same notes, but the tonal center of the chord progression drives the classification. After training, the model accurately generates the correct scale based solely on the prompt.

##### "Male Vocal Texture, chord progression, D minor, 150BPM, 8 bars,"
<audio controls src="https://huggingface.co/RoyalCities/Vocal_Textures_Main/resolve/main/example_4a.mp3"></audio>

##### "Male Vocal Texture, chord progression, F major, 150BPM, 8 bars,"
<audio controls src="https://huggingface.co/RoyalCities/Vocal_Textures_Main/resolve/main/example_4b.mp3"></audio>


<center>
<h2 style="font-size: 19px;">Dataset Details</h2>
</center>

<table align="center" style="width: 80%; border-collapse: collapse;">
<thead>
<tr>
<th style="border: 1px solid black; padding: 8px;">Vocal Type</th>
<th style="border: 1px solid black; padding: 8px;">Major Progression (4 bars)</th>
<th style="border: 1px solid black; padding: 8px;">Major Progression (8 bars)</th>
<th style="border: 1px solid black; padding: 8px;">Minor Progression (4 bars)</th>
<th style="border: 1px solid black; padding: 8px;">Minor Progression (8 bars)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid black; padding: 8px; text-align: center;"><b>Male</b></td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
</tr>
<tr>
<td style="border: 1px solid black; padding: 8px; text-align: center;"><b>Female</b></td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
</tr>
<tr>
<td style="border: 1px solid black; padding: 8px; text-align: center;"><b>Ensemble</b></td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
<td style="border: 1px solid black; padding: 8px; text-align: center;">420</td>
</tr>
</tbody>
</table>
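
As a quick sanity check, the per-cell counts in the table account for the full dataset; an illustrative bit of arithmetic:

```python
voicings = 3           # Male, Female, Ensemble
cells_per_voicing = 4  # major/minor progressions at 4 and 8 bars
files_per_cell = 420

total_files = voicings * cells_per_voicing * files_per_cell
print(total_files)  # 5040, matching the reported total .wav count

avg_length_seconds = 980.64 * 60 / total_files
print(round(avg_length_seconds, 2))  # 11.67, matching the reported average length
```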

<center>
<p style="font-size: 16px;"> <b>The dataset will be released in the future for researchers and music producers who want to get started with finetuning.</b>
</p>
</center>

<center>
<h2 style="font-size: 19px;">Technical Specifications</h2>
</center>

- **Platform**: Runpod
- **Monitoring Tool**: Weights and Biases
- **Epochs**: 23
- **Steps**: 1800
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW
- **Scheduler**: InverseLR
- **Batch Size**: 32
- **Hardware**: 2x NVIDIA A40 GPUs

See the config file for further details; a rough sketch of how these settings map onto a standard PyTorch setup follows below.
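
The following is an illustrative sketch only, not the actual training code: the run used stable-audio-tools' training stack and its InverseLR scheduler, and the inv_gamma/power constants below are placeholders.

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the diffusion model's parameters

# Reported hyperparameters: AdamW at a learning rate of 5e-5.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Approximates an inverse learning-rate decay; the real run used the toolkit's
# InverseLR scheduler, and these constants are placeholders, not the trained values.
inv_gamma, power = 1_000_000, 0.5
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: (1 + step / inv_gamma) ** -power
)
```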

<center>
<h2 style="font-size: 24px;"><u>Limitations and Biases</u></h2>
</center>

The model has high accuracy when it comes to staying in key due to the balance of the dataset. The metadata, however, was designed so that the model mainly generates chord progressions only, as opposed to the piano model, which also generated melodies. This is because vocal choir samples often have very long attacks, so they tend to sit at the back of a mix or fill out the frequency space.

I have noticed some light artifacting in the outputs, in particular in the ensemble progressions. It almost sounds like the model is trying to add other instrumentation. I think this may be due to the base model primarily being trained on music stems rather than vocals, but I cannot say for certain; this will need to be corrected in a future model or when there is far more vocal data.

For best results, it may help to add some light reverb or other post-processing before using the outputs in a song.

<center>
<h2 style="font-size: 24px;"><u>Gear Used</u></h2>
</center>

- **DAW:** FL Studio (Image-Line)
- **Vocals:** Multiple Choir Libraries
- **EQ:** PRO-Q3 (FabFilter)
- **Additional Gear:** Fruity Reeverb 2, Shaperbox 3 (Compressor), Soothe2

<center>
<h2 style="font-size: 24px;"><u>License</u></h2>
</center>


This model is licensed under the Stability AI Community License. It is available for non-commercial use or limited commercial use by entities with annual revenues below USD $1M. For revenues exceeding USD $1M, please refer to the [LICENSE](./LICENSE.md) for detailed terms.