buyun ryanmiao committed on
Commit 4eb45cc · verified · 1 Parent(s): 6efa094

Update README.md (#1)

- Update README.md (1ecc90bad9207f97244138a07c002290f26f66a7)


Co-authored-by: mrh <[email protected]>

Files changed (1)
  1. README.md +1 -6
README.md CHANGED
@@ -4,14 +4,9 @@ license: apache-2.0
  # Step-Audio-Tokenizer


- Step-Audio LLM is the industry's first human-like unified end-to-end model with 130 billion parameters, integrating multimodal speech understanding and generation capabilities, covering singing voice synthesis, tool calling, role-play, and multilingual/dialectal understanding and synthesis.
-
  Step-Audio LLM is the industry’s first 130-billion parameter human-like unified end-to-end model that integrates multimodal speech understanding and generation capabilities, including singing voice synthesis, tool utilization, role-play and multilingual/dialectal comprehension and synthesis.

- This repository provides the speech tokenizer module of Step-Audio LLM. For linguistic tokenization, we take the output features of the Paraformer encoder and quantize them into discrete representations at a token rate of 16.7 Hz; for semantic tokenization, we use CosyVoice's tokenizer, designed for efficient encoding and natural, expressive speech output, at a token rate of 25 Hz.
-
  This repository provides the speech tokenizer component of Step-Audio LLM. For linguistic tokenization, we utilize the output from the Paraformer encoder, which is quantized into discrete representations at a token rate of 16.7 Hz. For semantic tokenization, we employ CosyVoice’s tokenizer, specifically designed to efficiently encode features essential for generating natural and expressive speech outputs, operating at a token rate of 25 Hz.

- For more information, please refer to our repository: [Step-Audio](https://github.com/stepfun-ai/Step-Audio).
-
+ ## More information
  For more information, please refer to our repository: [Step-Audio](https://github.com/stepfun-ai/Step-Audio).
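
To make the two token rates quoted in the README concrete, the following is a minimal arithmetic sketch, not code from the Step-Audio repository. The helper `approx_token_count` and the constant names are hypothetical, and the real tokenizers may produce slightly different counts depending on framing and padding, which the README does not describe.

```python
# Token rates quoted in the README text above.
LINGUISTIC_RATE_HZ = 16.7  # quantized Paraformer encoder output
SEMANTIC_RATE_HZ = 25.0    # CosyVoice tokenizer


def approx_token_count(duration_s: float, rate_hz: float) -> int:
    """Hypothetical helper: rough token count for a clip of the given length.

    Assumes tokens are emitted at a constant rate; framing and padding
    in the actual tokenizers are ignored.
    """
    if duration_s < 0:
        raise ValueError("duration_s must be non-negative")
    return round(duration_s * rate_hz)


if __name__ == "__main__":
    clip_s = 10.0  # example clip length in seconds
    print("linguistic tokens:", approx_token_count(clip_s, LINGUISTIC_RATE_HZ))
    print("semantic tokens:", approx_token_count(clip_s, SEMANTIC_RATE_HZ))
```

Under these assumptions, a 10-second clip corresponds to roughly 167 linguistic tokens and 250 semantic tokens.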