Llambada-V0.1
This is version 0.1 of the model.
Introduction
This is the official model page of the Llambada paper.
Some demos of our model can be heard here: Llambada-demo
This model was trained on a dataset of about 4.4k hours of music using 2x A100 GPUs. Training cost roughly 720 USD over 5 days, covering the 2 stages: the semantic stage and the coarse stage.
We want to open AI to everyone, so all of the model's source code, the training scripts, and the hyperparameters will be released :)
Model structure
semantic model: This is the model for the semantic stage. It generates the intermediate representation that is later converted into audio.
coarse model: This is the model that generates the acoustic tokens, which carry the main information of the audio.
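To make the two-stage structure concrete, below is a minimal sketch of how the stages could fit together. All class and function names here are hypothetical placeholders for illustration only, not the actual API of the Llambada repo.

```python
# Hypothetical sketch of the two-stage pipeline described above.
# None of these names come from the Llambada codebase.
import torch


class SemanticStage(torch.nn.Module):
    """Stand-in for the semantic model: maps conditioning inputs
    to intermediate semantic tokens."""

    def forward(self, prompt_embedding: torch.Tensor) -> torch.Tensor:
        # Placeholder: a real model would autoregressively generate
        # a sequence of semantic token ids.
        return torch.zeros(prompt_embedding.shape[0], 256, dtype=torch.long)


class CoarseStage(torch.nn.Module):
    """Stand-in for the coarse model: maps semantic tokens to acoustic
    tokens, which carry the main information of the audio."""

    def forward(self, semantic_tokens: torch.Tensor) -> torch.Tensor:
        return torch.zeros(semantic_tokens.shape[0], 1024, dtype=torch.long)


def generate(prompt_embedding: torch.Tensor) -> torch.Tensor:
    """Run both stages in sequence: semantic tokens first, then acoustic
    tokens, which a codec decoder would finally turn into a waveform."""
    semantic = SemanticStage()(prompt_embedding)
    acoustic = CoarseStage()(semantic)
    return acoustic
```

For the real inference code, see the Usage section below.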
Usage
To use the model, please refer to our public implementation in the Llambada repo, where you will find usage instructions and the inference implementation.
Note
This is a public checkpoint, and we do not take responsibility if your use of it conflicts with regulations protecting the original creators. This release is for academic purposes only.
Please open an issue on our GitHub repo if you have any questions about the paper, model, or implementation.
Contact
If you have any further questions or new ideas for model features, you can contact us at songgen.ai and we will provide support.