|
--- |
|
license: cc-by-nc-sa-4.0 |
|
extra_gated_prompt: >- |
|
The Models are available for download for non-commercial purposes. Terms of
|
Access: The researcher has requested permission to use the models. In |
|
exchange for such permission, the researcher hereby agrees to the following |
|
terms and conditions: |
|
|
|
1. Researcher shall use the models only for non-commercial research and |
|
educational purposes. |
|
|
|
2. The authors make no representations or warranties regarding the models, |
|
including but not limited to warranties of non-infringement or fitness for a |
|
particular purpose. |
|
|
|
3. Researcher accepts full responsibility for his or her use of the models and |
|
shall defend and indemnify the authors of the models, including their |
|
employees, Trustees, officers and agents, against any and all claims arising |
|
from Researcher's use of the models, including but not limited to Researcher's |
|
use of any copies of copyrighted models files that he or she may create from |
|
the models. |
|
|
|
4. Researcher may provide research associates and colleagues with access to the
|
models provided that they first agree to be bound by these terms and |
|
conditions. |
|
|
|
5. The authors reserve the right to terminate Researcher's access to the |
|
models at any time. |
|
|
|
6. If Researcher is employed by a for-profit, commercial entity, Researcher's |
|
employer shall also be bound by these terms and conditions, and Researcher |
|
hereby represents that he or she is fully authorized to enter into this |
|
agreement on behalf of such employer. |
|
extra_gated_fields: |
|
Name: text |
|
Email: text |
|
Organization: text |
|
Address: text |
|
I accept the terms of access: checkbox |
|
datasets: |
|
- Wenetspeech4TTS/WenetSpeech4TTS |
|
language: |
|
- zh |
|
--- |
|
|
|
# ISCSLP2024 Conversational Voice Clone Challenge (CoVoC) Baseline Models
|
|
|
Two baseline models are provided for this competition.
|
## VALL-E: |
|
VALL-E is trained using [Amphion](https://github.com/open-mmlab/Amphion). |
|
|
|
First, the model is trained on the WenetSpeech4TTS dataset; the resulting weights are saved as `valle_base_model.bin`.
|
|
|
It is then fine-tuned on the HQ-Conversations dataset; the fine-tuned weights are saved as `valle_HQ-sft_model.bin`.
|
|
|
For the inference code, please refer to the [ISCSLP2024_CoVoC_baseline GitHub repository](https://github.com/xkx-hub/ISCSLP2024_CoVoC_baseline).
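Once access has been granted through the gate above, the checkpoints named in this section can be fetched with the Hugging Face CLI. This is a minimal sketch: `<repo_id>` is a placeholder for this model card's repository ID, which is not stated in the text.

```shell
# Log in with a Hugging Face token whose account has accepted the access terms.
huggingface-cli login

# Download the two VALL-E checkpoints named above into ./checkpoints.
# <repo_id> is a placeholder -- replace it with this model card's repository ID.
huggingface-cli download <repo_id> valle_base_model.bin valle_HQ-sft_model.bin \
    --local-dir ./checkpoints
```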
|
|
|
## fish-speech: |
|
fish-speech is an open-source speech model whose LLAMA and vits_decoder components are fine-tuned on the HQ-Conversations dataset.
|
|
|
The training follows the default configuration of fish-speech. |
|
|
|
For the training code, please refer to the [Fish Speech GitHub repository](https://github.com/fishaudio/fish-speech).