|
--- |
|
license: cc-by-nc-sa-4.0 |
|
extra_gated_prompt: >- |
|
The Models are available for download for non-commercial purposes. Terms of
|
Access: The researcher has requested permission to use the models. In |
|
exchange for such permission, the researcher hereby agrees to the following |
|
terms and conditions: |
|
|
|
1. Researcher shall use the models only for non-commercial research and |
|
educational purposes. |
|
|
|
2. The authors make no representations or warranties regarding the models, |
|
including but not limited to warranties of non-infringement or fitness for a |
|
particular purpose. |
|
|
|
3. Researcher accepts full responsibility for his or her use of the models and |
|
shall defend and indemnify the authors of the models, including their |
|
employees, Trustees, officers and agents, against any and all claims arising |
|
from Researcher's use of the models, including but not limited to Researcher's |
|
use of any copies of copyrighted models files that he or she may create from |
|
the models. |
|
|
|
4. Researcher may provide research associates and colleagues with access to the
|
models provided that they first agree to be bound by these terms and |
|
conditions. |
|
|
|
5. The authors reserve the right to terminate Researcher's access to the |
|
models at any time. |
|
|
|
6. If Researcher is employed by a for-profit, commercial entity, Researcher's |
|
employer shall also be bound by these terms and conditions, and Researcher |
|
hereby represents that he or she is fully authorized to enter into this |
|
agreement on behalf of such employer. |
|
extra_gated_fields: |
|
Name: text |
|
Email: text |
|
Organization: text |
|
Address: text |
|
I accept the terms of access: checkbox |
|
datasets: |
|
- Wenetspeech4TTS/WenetSpeech4TTS |
|
language: |
|
- zh |
|
--- |
|
|
|
# ISCSLP2024 Conversational Voice Clone Challenge (CoVoC) Baseline Models
|
|
|
Two baseline models are provided for this competition.
|
## VALL-E: |
|
VALL-E is trained using [Amphion](https://github.com/open-mmlab/Amphion). |
|
|
|
First, the model is trained on the WenetSpeech4TTS dataset; the resulting weights are saved as `valle_base_model.bin`.
|
|
|
It is then fine-tuned on the HQ-Conversations dataset; the fine-tuned weights are saved as `valle_HQ-sft_model.bin`.
|
|
|
For the inference code, please refer to the [ISCSLP2024_CoVoC_baseline GitHub repository](https://github.com/xkx-hub/ISCSLP2024_CoVoC_baseline).
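Once access has been granted through the gate above, the checkpoints named in this section can be fetched with the Hugging Face CLI. This is a minimal sketch: `<repo_id>` is a placeholder for this model card's repository ID, which is not stated in the text.

```shell
# Log in with a Hugging Face token whose account has accepted the access terms.
huggingface-cli login

# Download the two VALL-E checkpoints named above into ./checkpoints.
# <repo_id> is a placeholder -- replace it with this model card's repository ID.
huggingface-cli download <repo_id> valle_base_model.bin valle_HQ-sft_model.bin \
    --local-dir ./checkpoints
```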
|
|
|
## fish-speech: |
|
fish-speech is an open-source speech model whose LLAMA and vits_decoder components are fine-tuned on the HQ-Conversations dataset.
|
|
|
The training follows the default configuration of fish-speech. |
|
|
|
For the training code, please refer to the [Fish Speech GitHub repository](https://github.com/fishaudio/fish-speech).