GradientGuru committed
Commit fa322ac · Parent: aab4fd3

Update README.md

Files changed (1):
  1. README.md +5 -5
README.md CHANGED
@@ -21,11 +21,11 @@ If you wish to use baichuan-7B (for inference, finetuning, etc.), we recommend u
 
 - 在同尺寸模型中baichuan-7B达到了目前SOTA的水平,参考下面MMLU指标
 - baichuan-7B使用自有的中英文双语语料进行训练,在中文上进行优化,在C-Eval达到SOTA水平
-- 不同于Llama完全禁止商业使用,baichuan-7B使用更宽松的开源协议,允许用于商业目的
+- 不同于LLaMA完全禁止商业使用,baichuan-7B使用更宽松的开源协议,允许用于商业目的
 
 - Among models of the same size, baichuan-7B has achieved the current state-of-the-art (SOTA) level, as evidenced by the following MMLU metrics.
 - baichuan-7B is trained on proprietary bilingual Chinese-English corpora, optimized for Chinese, and achieves SOTA performance on C-Eval.
-- Unlike Llama, which completely prohibits commercial use, baichuan-7B employs a more lenient open-source license, allowing for commercial purposes.
+- Unlike LLaMA, which completely prohibits commercial use, baichuan-7B employs a more lenient open-source license, allowing for commercial purposes.
 
 ## How to Get Started with the Model
 
@@ -68,7 +68,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 <!-- Provide the basic links for the model. -->
 
-整体模型基于标准的Transformer结构,我们采用了和Llama一样的模型设计
+整体模型基于标准的Transformer结构,我们采用了和LLaMA一样的模型设计
 - **Position Embedding**:采用rotary-embedding,是现阶段被大多数模型采用的位置编码方案,具有很好的外推性。
 - **Feedforward Layer**:采用SwiGLU,Feedforward变化为(8/3)倍的隐含层大小,即11008。
 - **Layer Normalization**: 基于[RMSNorm](https://arxiv.org/abs/1910.07467)的Pre-Normalization。
@@ -83,7 +83,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 | vocab size | 64000 |
 | sequence length | 4096 |
 
-The overall model is based on the standard Transformer structure, and we have adopted the same model design as Llama:
+The overall model is based on the standard Transformer structure, and we have adopted the same model design as LLaMA:
 
 - Position Embedding: We use rotary-embedding, which is the position encoding scheme adopted by most models at this stage, and it has excellent extrapolation capabilities.
 - Feedforward Layer: We use SwiGLU. The feedforward changes to (8/3) times the size of the hidden layer, that is, 11008.
@@ -183,7 +183,7 @@ For specific training settings, please refer to [baichuan-7B](https://github.com
 
 | Model | Average |
 |-------------------------|-----------------|
-| Open-Llama-v2-pretrain | 23.49 |
+| Open-LLaMA-v2-pretrain | 23.49 |
 | Ziya-LLaMA-13B-pretrain | 27.64 |
 | Falcon-7B | 27.18 |
 | TigerBot-7B-base | 25.19 |
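The architecture bullets in the hunks above (rotary position embedding, a SwiGLU feedforward widened to (8/3) times the hidden size, i.e. 11008, and RMSNorm pre-normalization) can be made concrete with a minimal sketch. This is not baichuan-7B's actual implementation: the hidden size of 4096 and the LLaMA-style rounding of the intermediate size up to a multiple of 256 are assumptions that do not appear in the diff itself.

```python
# Minimal sketch of the SwiGLU feedforward block described in the diff above.
# Assumptions (not stated in this diff): hidden size 4096 and LLaMA-style
# rounding of the intermediate size up to a multiple of 256.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size = 4096   # assumed model width
multiple_of = 256    # assumed rounding granularity

# (8/3) x hidden size, rounded up to a multiple of 256, gives 11008
intermediate_size = int(hidden_size * 8 / 3)
intermediate_size = multiple_of * ((intermediate_size + multiple_of - 1) // multiple_of)
assert intermediate_size == 11008  # matches the value quoted in the README

class SwiGLUFeedForward(nn.Module):
    """Feedforward layer using SwiGLU, as in LLaMA-style decoders."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(gate(x)) * up(x), projected back to the model width
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

ffn = SwiGLUFeedForward(hidden_size, intermediate_size)
out = ffn(torch.randn(1, 8, hidden_size))   # (batch, sequence, hidden)
```

Under those assumptions, (8/3) × 4096 ≈ 10922.7, and rounding up to the next multiple of 256 yields the 11008 quoted in both the Chinese and English bullets.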
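The hunk headers above quote one line from the README's quick-start snippet, `print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))`. A hedged sketch of the kind of transformers call that line typically concludes is shown below; the repository id, prompt, and generation parameters are illustrative assumptions rather than values taken from this diff.

```python
# Hedged sketch of a transformers generation call ending in the decode/print
# line quoted in the hunk headers above. Repo id, prompt, and generation
# settings are illustrative assumptions, not values from this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/baichuan-7B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # requires accelerate; drop to load on one device
    trust_remote_code=True,   # assumes the repo ships custom modeling code
)

# Few-shot style prompt (poem title -> author), purely illustrative
inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```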