---
license: cc-by-nc-sa-4.0
extra_gated_prompt: >-
The models are available for download for non-commercial purposes. Terms of
Access: The researcher has requested permission to use the models. In
exchange for such permission, the researcher hereby agrees to the following
terms and conditions:
1. Researcher shall use the models only for non-commercial research and
educational purposes.
2. The authors make no representations or warranties regarding the models,
including but not limited to warranties of non-infringement or fitness for a
particular purpose.
3. Researcher accepts full responsibility for his or her use of the models and
shall defend and indemnify the authors of the models, including their
employees, Trustees, officers and agents, against any and all claims arising
from Researcher's use of the models, including but not limited to Researcher's
use of any copies of copyrighted models files that he or she may create from
the models.
4. Researcher may provide research associates and colleagues with access to the
models provided that they first agree to be bound by these terms and
conditions.
5. The authors reserve the right to terminate Researcher's access to the
models at any time.
6. If Researcher is employed by a for-profit, commercial entity, Researcher's
employer shall also be bound by these terms and conditions, and Researcher
hereby represents that he or she is fully authorized to enter into this
agreement on behalf of such employer.
extra_gated_fields:
Name: text
Email: text
Organization: text
Address: text
I accept the terms of access: checkbox
datasets:
- Wenetspeech4TTS/WenetSpeech4TTS
language:
- zh
---
# ISCSLP2024 Conversational Voice Clone Challenge (CoVoC) baseline models
This repository provides the two baseline models for the competition.
## VALL-E:
VALL-E is trained using [Amphion](https://github.com/open-mmlab/Amphion).
The model is first trained on the WenetSpeech4TTS dataset; the resulting weights are in `valle_base_model.bin`.
It is then fine-tuned on the HQ-Conversations dataset; the resulting weights are in `valle_HQ-sft_model.bin`.
For the inference code, see [ISCSLP2024_CoVoC_baseline Github](https://github.com/xkx-hub/ISCSLP2024_CoVoC_baseline).
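
As a rough sketch, one of the two VALL-E checkpoints named above could be fetched with `huggingface_hub` as shown below. The repository id is not stated in this README, so `repo_id` is a placeholder you must fill in, and the helper names (`checkpoint_filename`, `download_checkpoint`) are hypothetical. The sketch also assumes the `.bin` files sit at the root of the repository. Because access is gated, the download will likely fail unless you have accepted the terms and are logged in (for example via `huggingface-cli login`).

```python
# Map each training stage described above to its checkpoint file.
VALLE_CHECKPOINTS = {
    "base": "valle_base_model.bin",    # trained on WenetSpeech4TTS
    "sft": "valle_HQ-sft_model.bin",   # fine-tuned on HQ-Conversations
}


def checkpoint_filename(stage: str) -> str:
    """Return the checkpoint file name for a stage ("base" or "sft")."""
    try:
        return VALLE_CHECKPOINTS[stage]
    except KeyError:
        valid = ", ".join(sorted(VALLE_CHECKPOINTS))
        raise ValueError(f"unknown stage {stage!r}; expected one of: {valid}") from None


def download_checkpoint(repo_id: str, stage: str = "sft") -> str:
    """Download the checkpoint and return its local cache path.

    `repo_id` is a placeholder for this model repository's id, and the
    file is assumed to live at the repository root.
    """
    # Imported lazily so the mapping above is usable without the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=checkpoint_filename(stage))


# Example (requires network access and accepted gating terms):
# path = download_checkpoint("<user>/<this-model-repo>", stage="sft")
```

The returned path can then be passed to the inference scripts in the baseline repository linked above; how those scripts expect the path to be supplied is described there, not here.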
## fish-speech:
fish-speech is an open-source speech model. Its LLAMA and vits_decoder components are fine-tuned on the HQ-Conversations dataset.
The training follows the default configuration of fish-speech.
For the training code, see [Fish Speech Github](https://github.com/fishaudio/fish-speech).