add gradio app
Files changed:
- README.md +1 -1
- app.py +38 -0
- notebooks/test-model.ipynb +0 -0
- requirements.txt +5 -8
README.md
CHANGED
@@ -6,7 +6,7 @@
 
 ![mel spectrogram](mel.png)
 
-Audio can be represented as images by transforming to a [mel spectrogram](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), such as the one shown above. The class `Mel` in `mel.py` can convert a slice of audio into a mel spectrogram of `x_res` x `y_res` and vice versa.
+Audio can be represented as images by transforming to a [mel spectrogram](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), such as the one shown above. The class `Mel` in `mel.py` can convert a slice of audio into a mel spectrogram of `x_res` x `y_res` and vice versa. The higher the resolution, the less audio information will be lost. You can see how this works in the `test-mel.ipynb` notebook.
 
 A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio. See the `test-model.ipynb` notebook for an example.
 
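As a quick illustration of the round trip the README describes, here is a minimal sketch using the `Mel` helper. Only `Mel(x_res=..., y_res=...)`, `image_to_audio` and `get_sample_rate` appear in this commit (in `app.py` below); the forward-direction method names are assumptions for the sake of the example.

```python
# Sketch only: audio -> mel spectrogram image -> audio round trip.
from src.mel import Mel

mel = Mel(x_res=256, y_res=256)  # spectrogram image resolution; higher = less audio information lost

# Forward direction: `load_audio` and `audio_slice_to_image` are assumed
# method names here; check mel.py for the actual API.
mel.load_audio("example.wav")
image = mel.audio_slice_to_image(0)  # greyscale PIL image for the first slice

# Backward direction, exactly as app.py uses it below.
audio = mel.image_to_audio(image)
sample_rate = mel.get_sample_rate()
```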
app.py
ADDED
@@ -0,0 +1,38 @@
+import argparse
+
+import gradio as gr
+from PIL import Image
+from diffusers import DDPMPipeline
+
+from src.mel import Mel
+
+mel = Mel(x_res=256, y_res=256)
+model_id = "teticio/audio-diffusion-256"
+ddpm = DDPMPipeline.from_pretrained(model_id)
+
+
+def generate_spectrogram_and_audio():
+    images = ddpm(output_type="numpy")["sample"]  # (batch, height, width, channels), floats in [0, 1]
+    images = (images * 255).round().astype("uint8").transpose(0, 3, 1, 2)  # 8-bit, channels first
+    image = Image.fromarray(images[0][0])  # first image, single greyscale channel
+    audio = mel.image_to_audio(image)  # invert the spectrogram back to a waveform
+    return image, (mel.get_sample_rate(), audio)
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--port", type=int)
+    parser.add_argument("--server", type=str)  # host name for Gradio, e.g. "0.0.0.0"
+    args = parser.parse_args()
+
+    demo = gr.Interface(
+        fn=generate_spectrogram_and_audio,
+        title="Audio Diffusion",
+        description="Generate audio using Huggingface diffusers",
+        inputs=[],
+        outputs=[
+            gr.Image(label="Mel spectrogram", image_mode="L"),
+            gr.Audio(label="Audio"),
+        ],
+    )
+    demo.launch(server_name=args.server or "0.0.0.0", server_port=args.port)
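Usage note: with the flags defined above, the same entrypoint runs outside Spaces, e.g. `python app.py --port 7860` (the port value is illustrative); omitting both flags makes Gradio bind to 0.0.0.0 on its default port. Loading the pipeline at module level means the model is downloaded once when the Space starts rather than on the first request.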
notebooks/test-model.ipynb
CHANGED
The diff for this file is too large to render.
requirements.txt
CHANGED
@@ -1,8 +1,5 @@
-
-
-numpy
-Pillow
-
-datasets==2.4.0
-diffusers==0.1.3
-tqdm==4.64.0
+# for Hugging Face spaces
+torch
+numpy
+Pillow
+diffusers
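A note on the slimmed-down list: `gradio` itself is not listed because Gradio Spaces provide it in the base image, and the remaining packages are left unpinned so the Space takes whatever the build image resolves. Locally, the usual `pip install -r requirements.txt` reproduces the environment, plus a `gradio` install if you run `app.py` outside Spaces.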