<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Audio Diffusion

## Overview

[Audio Diffusion](https://github.com/teticio/audio-diffusion) by Robert Dargavel Smith.

Audio Diffusion leverages the recent advances in image generation using diffusion models by converting audio samples to
and from mel spectrogram images.

The original codebase of this implementation can be found [here](https://github.com/teticio/audio-diffusion), including
training scripts and example notebooks.
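The conversion between waveforms and spectrogram images is handled by the `Mel` helper class (documented at the bottom of this page). The snippet below is a minimal sketch of that round trip, assuming the `Mel` interface from the original codebase (`load_audio`, `audio_slice_to_image`, `image_to_audio`) and using a placeholder file name:

```python
from diffusers import Mel

mel = Mel()  # defaults to 256x256 spectrogram slices at a 22050 Hz sample rate

# "my_clip.wav" is a placeholder; point this at any local audio file
mel.load_audio("my_clip.wav")
print(mel.get_number_of_slices())    # number of spectrogram slices the clip yields

image = mel.audio_slice_to_image(0)  # PIL image of the first slice's mel spectrogram
waveform = mel.image_to_audio(image) # approximate reconstruction of that slice
print(waveform.shape, mel.get_sample_rate())
```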
## Available Pipelines:

| Pipeline | Tasks | Colab |
|---|---|:---:|
| [pipeline_audio_diffusion.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py) | *Unconditional Audio Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/audio_diffusion_pipeline.ipynb) |
## Examples:

### Audio Diffusion

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```
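Beyond playing the result inline, you may want to write it to disk. A small follow-up sketch, assuming `scipy` is installed; it reuses the `output.audios[0, 0]` indexing from the variations example further down, where the first index selects the sample in the batch and the second the (mono) channel:

```python
from scipy.io import wavfile

# Write the first generated clip as a WAV file at the pipeline's sample rate
wavfile.write("generated.wav", pipe.mel.get_sample_rate(), output.audios[0, 0])
```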
### Latent Audio Diffusion

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/latent-audio-diffusion-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```
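Generation is stochastic, so repeated calls produce different clips. If you need reproducible outputs, a seeded `torch.Generator` can be passed to the call; this is a sketch that assumes the pipeline accepts a `generator` argument, as other diffusers pipelines do:

```python
# Seeding the generator should make the sampled noise, and hence the audio, repeatable
generator = torch.Generator(device=device).manual_seed(42)
output = pipe(generator=generator)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```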
### Audio Diffusion with DDIM (faster)

```python
import torch
from IPython.display import Audio, display
from diffusers import DiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256").to(device)

output = pipe()
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```
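With the DDIM scheduler the number of denoising steps can also be lowered explicitly to trade some quality for speed. The sketch below assumes the pipeline call accepts a `steps` argument, as in the original codebase:

```python
# Fewer denoising steps: faster generation, usually at some cost in audio quality
output = pipe(steps=50)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```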
### Variations, in-painting, out-painting, etc.

Starting the reverse diffusion halfway through (`start_step`) and masking the first and last second of the
original audio (`mask_start_secs`, `mask_end_secs`) produces a variation that keeps those edges of the previous
output intact:

```python
output = pipe(
    raw_audio=output.audios[0, 0],
    start_step=int(pipe.get_default_steps() / 2),
    mask_start_secs=1,
    mask_end_secs=1,
)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```
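The same mechanism can start from your own recording instead of a previously generated clip. This sketch assumes the pipeline call accepts an `audio_file` path and a `slice` index, as in the original codebase; the file name is a placeholder:

```python
# "my_clip.wav" is a placeholder; the first spectrogram slice of the file seeds the generation
output = pipe(
    audio_file="my_clip.wav",
    slice=0,
    start_step=int(pipe.get_default_steps() / 2),
    mask_start_secs=1,  # keep the first second of the original audio unchanged
)
display(output.images[0])
display(Audio(output.audios[0], rate=pipe.mel.get_sample_rate()))
```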
## AudioDiffusionPipeline

[[autodoc]] AudioDiffusionPipeline
	- all
	- __call__

## Mel

[[autodoc]] Mel