google/music-spectrogram-diffusion

SpectrogramDiffusionPipeline


kashif (HF Staff) committed on Mar 21, 2023

Commit 1d9603c · 1 Parent(s): e98947a

Update README.md

added model section

Files changed (1)

README.md +4 -0


@@ -16,6 +16,10 @@ An ideal music synthesizer should be both interactive and expressive, generating
 <img src="https://storage.googleapis.com/music-synthesis-with-spectrogram-diffusion/architecture.png" alt="Architecture diagram">
+## Model
+As depicted above, the model takes a MIDI file as input and tokenizes it into a sequence of 5-second windows. Each tokenized window, together with positional encodings, is passed through the Note Encoder, and its representation is concatenated with the representation of the previous window's generated spectrogram, obtained via the Context Encoder. For the initial 5-second window, this context is set to zero. The resulting context is then used as conditioning to sample the denoised spectrogram for that MIDI window. The sampled spectrogram is appended to the final output and also serves as the context for the next MIDI window. The process repeats until all MIDI windows have been processed. Finally, a MelGAN decoder converts the potentially long spectrogram to audio, which is the final result of this pipeline.
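The window-by-window loop described in the Model section can be sketched as follows. This is a minimal illustration and not the pipeline's actual code: `note_encoder`, `context_encoder`, `sample_spectrogram`, and all sizes are hypothetical toy stand-ins chosen only to make the control flow runnable, and the final MelGAN vocoder step is omitted.

```python
import numpy as np

# Hypothetical toy sizes, not the real model's dimensions.
N_MELS = 8             # mel bins per spectrogram frame
FRAMES_PER_WINDOW = 4  # frames generated per 5-second window
D_MODEL = 16           # width of the encoder representations


def note_encoder(tokens):
    """Toy stand-in: token values plus positions -> (len(tokens), D_MODEL)."""
    positions = np.arange(len(tokens))[:, None]
    return np.sin(tokens[:, None] + positions + np.arange(D_MODEL)[None, :])


def context_encoder(prev_spec):
    """Toy stand-in: previous spectrogram -> (FRAMES_PER_WINDOW, D_MODEL)."""
    return np.tile(prev_spec.mean(axis=1, keepdims=True), (1, D_MODEL))


def sample_spectrogram(conditioning, rng):
    """Toy stand-in for the diffusion sampler conditioned on the context."""
    noise = rng.standard_normal((FRAMES_PER_WINDOW, N_MELS))
    return 0.01 * noise + conditioning.mean()


def synthesize(midi_windows, seed=0):
    """Generate one spectrogram chunk per tokenized MIDI window, in order."""
    rng = np.random.default_rng(seed)
    # The first window has no previous spectrogram, so its context is zero.
    prev_spec = np.zeros((FRAMES_PER_WINDOW, N_MELS))
    outputs = []
    for tokens in midi_windows:
        notes = note_encoder(np.asarray(tokens))
        context = context_encoder(prev_spec)
        # Concatenate note and context representations as the conditioning.
        conditioning = np.concatenate([notes, context], axis=0)
        # The new chunk is both appended to the output and fed forward.
        prev_spec = sample_spectrogram(conditioning, rng)
        outputs.append(prev_spec)
    return np.concatenate(outputs, axis=0)


full_spec = synthesize([[1, 2, 3], [4, 5], [6]])
print(full_spec.shape)
```

The point of the sketch is the data flow: each window's output is carried into the next iteration, so windows must be generated sequentially, while the full spectrogram grows by one chunk per window.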
 ## Example usage
 ```python