Commit fa322ac (1 parent: aab4fd3): Update README.md

README.md (changed)
@@ -21,11 +21,11 @@ If you wish to use baichuan-7B (for inference, finetuning, etc.), we recommend u
 
 - 在同尺寸模型中baichuan-7B达到了目前SOTA的水平,参考下面MMLU指标
 - baichuan-7B使用自有的中英文双语语料进行训练,在中文上进行优化,在C-Eval达到SOTA水平
-- 不同于
+- 不同于LLaMA完全禁止商业使用,baichuan-7B使用更宽松的开源协议,允许用于商业目的
 
 - Among models of the same size, baichuan-7B has achieved the current state-of-the-art (SOTA) level, as evidenced by the following MMLU metrics.
 - baichuan-7B is trained on proprietary bilingual Chinese-English corpora, optimized for Chinese, and achieves SOTA performance on C-Eval.
-- Unlike
+- Unlike LLaMA, which completely prohibits commercial use, baichuan-7B employs a more lenient open-source license, allowing for commercial purposes.
 
 ## How to Get Started with the Model
 
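The context of the first hunk ends at the README's "How to Get Started with the Model" section, and the hunk headers that follow quote the last line of that section's quick-start snippet (`print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))`). For orientation, here is a minimal sketch of such a quick-start call with the `transformers` library; the repository id `baichuan-inc/baichuan-7B`, the prompt, and the generation settings are assumptions, not shown in this diff.

```python
# Illustrative sketch only; repo id, prompt, and generation settings are assumed,
# not taken from this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("baichuan-inc/baichuan-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/baichuan-7B", device_map="auto", trust_remote_code=True
)

inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
# The line quoted in the hunk headers below:
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```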
@@ -68,7 +68,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 <!-- Provide the basic links for the model. -->
 
-整体模型基于标准的Transformer结构,我们采用了和
+整体模型基于标准的Transformer结构,我们采用了和LLaMA一样的模型设计
 - **Position Embedding**:采用rotary-embedding,是现阶段被大多数模型采用的位置编码方案,具有很好的外推性。
 - **Feedforward Layer**:采用SwiGLU,Feedforward变化为(8/3)倍的隐含层大小,即11008。
 - **Layer Normalization**: 基于[RMSNorm](https://arxiv.org/abs/1910.07467)的Pre-Normalization。
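The hunk above (and its English counterpart in the next hunk) describes the model design: rotary position embeddings, a SwiGLU feedforward layer, and RMSNorm-based pre-normalization. As a reference for what the RMSNorm building block computes, here is a minimal PyTorch sketch of the normalization described in the linked paper; it is illustrative only, not the model's actual implementation.

```python
# Minimal RMSNorm sketch (illustrative; not the model's own code).
# RMSNorm rescales by the root-mean-square of the activations instead of
# subtracting a mean; "pre-normalization" applies it before each sub-layer.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))  # learned gain
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the hidden dimension, then apply the gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```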
@@ -83,7 +83,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 | vocab size | 64000 |
 | sequence length | 4096 |
 
-The overall model is based on the standard Transformer structure, and we have adopted the same model design as
+The overall model is based on the standard Transformer structure, and we have adopted the same model design as LLaMA:
 
 - Position Embedding: We use rotary-embedding, which is the position encoding scheme adopted by most models at this stage, and it has excellent extrapolation capabilities.
 - Feedforward Layer: We use SwiGLU. The feedforward changes to (8/3) times the size of the hidden layer, that is, 11008.
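The 11008 feedforward width quoted above follows from the (8/3)x rule when the hidden size is 4096. The short sketch below shows the arithmetic; the hidden size of 4096 and the LLaMA-style rounding up to a multiple of 256 are assumptions not shown in this hunk.

```python
# Sketch of how the 11008 feedforward size arises from the (8/3)x rule.
# Assumptions: hidden size 4096 and rounding up to a multiple of 256 (LLaMA-style);
# neither detail appears in the hunk above.
def swiglu_ffn_dim(hidden_size: int, multiple_of: int = 256) -> int:
    dim = hidden_size * 8 // 3  # SwiGLU keeps 2/3 of the usual 4x expansion
    return multiple_of * ((dim + multiple_of - 1) // multiple_of)  # round up

print(swiglu_ffn_dim(4096))  # 11008, matching the value in the README
```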
@@ -183,7 +183,7 @@ For specific training settings, please refer to [baichuan-7B](https://github.com
 
 | Model | Average |
 |-------------------------|-----------------|
-| Open-
+| Open-LLaMA-v2-pretrain | 23.49 |
 | Ziya-LLaMA-13B-pretrain | 27.64 |
 | Falcon-7B | 27.18 |
 | TigerBot-7B-base | 25.19 |