soujanyaporia committed 9ac44fa (1 parent: 1473e5a): Create README.md


Browse files
Files changed (1) hide show
  1. README.md +65 -0
README.md ADDED
@@ -0,0 +1,65 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
---
license: cc-by-nc-sa-4.0
datasets:
- declare-lab/TangoPromptBank
language:
- en
tags:
- music
---
# TANGO: Text to Audio using iNstruction-Guided diffusiOn

**TANGO** is a latent diffusion model for text-to-audio generation. It can generate realistic audio, including human sounds, animal sounds, natural and artificial sounds, and sound effects, from textual prompts. We use the frozen instruction-tuned LLM Flan-T5 as the text encoder and train a UNet-based diffusion model for audio generation. We outperform current state-of-the-art models for audio generation on both objective and subjective metrics. We release our model, training and inference code, and pre-trained checkpoints for the research community.

📣 We are releasing **Tango-Full-FT-Audio-Music-Caps**, which was first pre-trained on **TangoPromptBank** and later fine-tuned on AudioCaps and MusicCaps.

## Code

Our code is released here: [https://github.com/declare-lab/tango](https://github.com/declare-lab/tango)

We have uploaded several **TANGO**-generated samples here: [https://tango-web.github.io/](https://tango-web.github.io/)

Please follow the instructions in the repository for installation, usage, and experiments.

## Quickstart Guide

Download the **TANGO** model and generate audio from a text prompt:

```python
import IPython
import soundfile as sf
from tango import Tango

# Downloads the checkpoint on first use; later runs load it from the local cache
tango = Tango("declare-lab/tango-full-ft-audio-music-caps")

prompt = "An audience cheering and clapping"
audio = tango.generate(prompt)
# Save the 16 kHz waveform to disk, then play it inline in a notebook
sf.write(f"{prompt}.wav", audio, samplerate=16000)
IPython.display.Audio(data=audio, rate=16000)
```
[An audience cheering and clapping.webm](https://user-images.githubusercontent.com/13917097/233851915-e702524d-cd35-43f7-93e0-86ea579231a7.webm)
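
The quickstart uses the prompt text itself as the file name. That works for short prompts, but longer captions with punctuation (like the batch prompts further down) can produce awkward or invalid paths on some systems. A minimal sketch of a sanitizing helper follows; `prompt_to_filename` is hypothetical and not part of the `tango` package.

```python
import re


def prompt_to_filename(prompt: str, max_len: int = 80, ext: str = ".wav") -> str:
    """Build a conservative, filesystem-safe file name from a text prompt.

    Hypothetical helper; not part of the tango package.
    """
    # Collapse every run of characters other than ASCII letters, digits, '_' or '-' into one '_'
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", prompt).strip("_")
    # Cap the length and fall back to a generic name if nothing usable is left
    stem = stem[:max_len].rstrip("_") or "audio"
    return stem + ext


# Usage in the quickstart:
# sf.write(prompt_to_filename(prompt), audio, samplerate=16000)
```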
41
+
42
+ The model will be automatically downloaded and saved in cache. Subsequent runs will load the model directly from cache.
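
To locate or clear the downloaded checkpoint, it helps to know where the cache lives. The sketch below assumes the download goes through the standard Hugging Face Hub cache (this README does not say so explicitly) and mirrors, in simplified form, the documented `huggingface_hub` defaults: `HF_HUB_CACHE` if set, otherwise `$HF_HOME/hub`, where `HF_HOME` defaults to `$XDG_CACHE_HOME/huggingface` or `~/.cache/huggingface`. `expected_hub_cache_dir` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def expected_hub_cache_dir(repo_id: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Best-guess local folder for a model repo in the Hugging Face Hub cache.

    Hypothetical, simplified helper (ASSUMPTION: Tango uses the Hub cache).
    Ignores legacy variables and any custom cache_dir argument.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        hub_cache = env["HF_HUB_CACHE"]
    else:
        xdg_cache = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
        hf_home = env.get("HF_HOME") or os.path.join(xdg_cache, "huggingface")
        hub_cache = os.path.join(hf_home, "hub")
    # Model repos are stored in folders named "models--<org>--<name>"
    folder = "models--" + repo_id.replace("/", "--")
    return os.path.join(os.path.expanduser(hub_cache), folder)


# Usage:
# print(expected_hub_cache_dir("declare-lab/tango-full-ft-audio-music-caps"))
```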

The `generate` function uses 100 sampling steps by default for the latent diffusion model. We recommend 200 steps for higher-quality audio, at the cost of a longer run time.

```python
prompt = "Rolling thunder with lightning strikes"
# More diffusion steps: better quality, slower generation
audio = tango.generate(prompt, steps=200)
IPython.display.Audio(data=audio, rate=16000)
```
[Rolling thunder with lightning strikes.webm](https://user-images.githubusercontent.com/13917097/233851929-90501e41-911d-453f-a00b-b215743365b4.webm)
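
To see the quality/run-time trade-off on your own hardware, you can time the same prompt at several step counts. `compare_steps` is a hypothetical helper that only assumes `generate(prompt, steps=n)` behaves like `tango.generate` in the examples above; judge the audio quality by ear.

```python
import time
from typing import Any, Callable, Dict, Iterable, Tuple


def compare_steps(generate: Callable[..., Any], prompt: str,
                  step_counts: Iterable[int] = (100, 200)) -> Dict[int, Tuple[Any, float]]:
    """Call generate(prompt, steps=n) for each n and return {n: (audio, seconds)}.

    Hypothetical helper for a quick speed comparison; not part of the tango package.
    """
    results: Dict[int, Tuple[Any, float]] = {}
    for n in step_counts:
        start = time.perf_counter()
        audio = generate(prompt, steps=n)
        results[n] = (audio, time.perf_counter() - start)  # wall-clock seconds
    return results


# Usage with the `tango` object from the quickstart:
# for n, (audio, seconds) in compare_steps(tango.generate, "Rolling thunder with lightning strikes").items():
#     print(f"{n} steps: {seconds:.1f} s")
```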

<!-- [MachineClicking](https://user-images.githubusercontent.com/25340239/233857834-bfda52b4-4fcc-48de-b47a-6a6ddcb3671b.mp4 "sample 1") -->

Use the `generate_for_batch` function to generate multiple audio samples for a batch of text prompts:

```python
prompts = [
    "This music is instrumental. The tempo is slow with an acoustic guitar harmony, loud electric guitar feedback and fiddle",
    "This pop ballad features a female voice singing the main melody. This is accompanied by an acoustic guitar playing chords",
    "A lady is singing together with a kid",
]
# samples=2 requests two generated clips per prompt
audios = tango.generate_for_batch(prompts, samples=2)
```
This will generate two samples for each of the three text prompts.
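
Before saving the batch results, check how `generate_for_batch` structures its return value. The sketch below assumes one list of `samples` waveforms per prompt when `samples > 1`, and a flat list with one waveform per prompt when `samples == 1`; verify this against the repository before relying on it. `pair_batch_outputs` is hypothetical and uses index-based file names so long prompts never end up in paths.

```python
from typing import Any, List, Sequence, Tuple


def pair_batch_outputs(prompts: Sequence[str], audios: Sequence[Any],
                       samples: int) -> List[Tuple[str, Any]]:
    """Pair each waveform from a batch call with a file name.

    Hypothetical helper. ASSUMPTION: `audios` is a flat list (one waveform per
    prompt) when samples == 1, and a list of per-prompt lists otherwise.
    """
    if samples < 1:
        raise ValueError("samples must be at least 1")
    if samples == 1:
        groups = [[wave] for wave in audios]
    else:
        groups = [list(group) for group in audios]
    if len(groups) != len(prompts):
        raise ValueError(f"expected {len(prompts)} prompt groups, got {len(groups)}")

    pairs = []
    for i, group in enumerate(groups):
        if len(group) != samples:
            raise ValueError(f"prompt {i}: expected {samples} samples, got {len(group)}")
        for j, wave in enumerate(group):
            # e.g. "prompt_00_sample_1.wav" for the second clip of the first prompt
            pairs.append((f"prompt_{i:02d}_sample_{j}.wav", wave))
    return pairs


# Usage with the batch example above (requires soundfile):
# import soundfile as sf
# for name, wave in pair_batch_outputs(prompts, audios, samples=2):
#     sf.write(name, wave, samplerate=16000)
```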