Llambada-V0.1
This is version 0.1 of the model.
Introduction
This is the official model page of the Llambada paper.
Some demos of our model can be heard here: Llambada-demo
This model was trained on a dataset of about 4.4k hours of music using 2x A100 GPUs. Training cost roughly 720 USD over 5 days, covering the 2 stages: the semantic stage and the coarse stage.
We want to open AI to everyone, so all of the model's source code, the training scripts, and the hyperparameters will be released :)
Model structure
semantic model: This is the model for the semantic stage. It generates the intermediate representation that is later converted into audio.
coarse model: This is the model that generates the acoustic tokens, which carry the main information of the audio.
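To make the two-stage structure concrete, below is a minimal sketch of how the stages could fit together. All class and function names here are hypothetical placeholders for illustration only, not the actual API of the Llambada repo.

```python
# Hypothetical sketch of the two-stage pipeline described above.
# None of these names come from the Llambada codebase.
import torch


class SemanticStage(torch.nn.Module):
    """Stand-in for the semantic model: maps conditioning inputs
    to intermediate semantic tokens."""

    def forward(self, prompt_embedding: torch.Tensor) -> torch.Tensor:
        # Placeholder: a real model would autoregressively generate
        # a sequence of semantic token ids.
        return torch.zeros(prompt_embedding.shape[0], 256, dtype=torch.long)


class CoarseStage(torch.nn.Module):
    """Stand-in for the coarse model: maps semantic tokens to acoustic
    tokens, which carry the main information of the audio."""

    def forward(self, semantic_tokens: torch.Tensor) -> torch.Tensor:
        return torch.zeros(semantic_tokens.shape[0], 1024, dtype=torch.long)


def generate(prompt_embedding: torch.Tensor) -> torch.Tensor:
    """Run both stages in sequence: semantic tokens first, then acoustic
    tokens, which a codec decoder would finally turn into a waveform."""
    semantic = SemanticStage()(prompt_embedding)
    acoustic = CoarseStage()(semantic)
    return acoustic
```

For the real inference code, see the Usage section below.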
Usage
To use the model, please refer to our public implementation in the Llambada repo, where you will find usage instructions and the inference implementation.
Note
This is a public checkpoint, and we do not take responsibility if your use of it conflicts with regulations protecting the original creators. This release is for academic purposes only.
Please open an issue on our GitHub repo if you have any questions about the paper, model, or implementation.
Contact
If you have any further questions or new ideas for model features, you can contact us at songgen.ai and we will provide support.