GradientGuru committed
Commit fa322ac · Parent: aab4fd3

Update README.md

Files changed (1):
  1. README.md +5 -5
README.md CHANGED
@@ -21,11 +21,11 @@ If you wish to use baichuan-7B (for inference, finetuning, etc.), we recommend u
 
 - 在同尺寸模型中baichuan-7B达到了目前SOTA的水平,参考下面MMLU指标
 - baichuan-7B使用自有的中英文双语语料进行训练,在中文上进行优化,在C-Eval达到SOTA水平
-- 不同于Llama完全禁止商业使用,baichuan-7B使用更宽松的开源协议,允许用于商业目的
+- 不同于LLaMA完全禁止商业使用,baichuan-7B使用更宽松的开源协议,允许用于商业目的
 
 - Among models of the same size, baichuan-7B has achieved the current state-of-the-art (SOTA) level, as evidenced by the following MMLU metrics.
 - baichuan-7B is trained on proprietary bilingual Chinese-English corpora, optimized for Chinese, and achieves SOTA performance on C-Eval.
-- Unlike Llama, which completely prohibits commercial use, baichuan-7B employs a more lenient open-source license, allowing for commercial purposes.
+- Unlike LLaMA, which completely prohibits commercial use, baichuan-7B employs a more lenient open-source license, allowing for commercial purposes.
 
 ## How to Get Started with the Model
 
@@ -68,7 +68,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 
 <!-- Provide the basic links for the model. -->
 
-整体模型基于标准的Transformer结构,我们采用了和Llama一样的模型设计
+整体模型基于标准的Transformer结构,我们采用了和LLaMA一样的模型设计
 - **Position Embedding**:采用rotary-embedding,是现阶段被大多数模型采用的位置编码方案,具有很好的外推性。
 - **Feedforward Layer**:采用SwiGLU,Feedforward变化为(8/3)倍的隐含层大小,即11008。
 - **Layer Normalization**: 基于[RMSNorm](https://arxiv.org/abs/1910.07467)的Pre-Normalization。
@@ -83,7 +83,7 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
 | vocab size | 64000 |
 | sequence length | 4096 |
 
-The overall model is based on the standard Transformer structure, and we have adopted the same model design as Llama:
+The overall model is based on the standard Transformer structure, and we have adopted the same model design as LLaMA:
 
 - Position Embedding: We use rotary-embedding, which is the position encoding scheme adopted by most models at this stage, and it has excellent extrapolation capabilities.
 - Feedforward Layer: We use SwiGLU. The feedforward changes to (8/3) times the size of the hidden layer, that is, 11008.
@@ -183,7 +183,7 @@ For specific training settings, please refer to [baichuan-7B](https://github.com
 
 | Model | Average |
 |-------------------------|-----------------|
-| Open-Llama-v2-pretrain | 23.49 |
+| Open-LLaMA-v2-pretrain | 23.49 |
 | Ziya-LLaMA-13B-pretrain | 27.64 |
 | Falcon-7B | 27.18 |
 | TigerBot-7B-base | 25.19 |
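The architecture bullets in the hunks above (rotary position embedding, a SwiGLU feedforward widened to (8/3) times the hidden size, i.e. 11008, and RMSNorm pre-normalization) can be made concrete with a minimal sketch. This is not baichuan-7B's actual implementation: the hidden size of 4096 and the LLaMA-style rounding of the intermediate size up to a multiple of 256 are assumptions that do not appear in the diff itself.

```python
# Minimal sketch of the SwiGLU feedforward block described in the diff above.
# Assumptions (not stated in this diff): hidden size 4096 and LLaMA-style
# rounding of the intermediate size up to a multiple of 256.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_size = 4096   # assumed model width
multiple_of = 256    # assumed rounding granularity

# (8/3) x hidden size, rounded up to a multiple of 256, gives 11008
intermediate_size = int(hidden_size * 8 / 3)
intermediate_size = multiple_of * ((intermediate_size + multiple_of - 1) // multiple_of)
assert intermediate_size == 11008  # matches the value quoted in the README

class SwiGLUFeedForward(nn.Module):
    """Feedforward layer using SwiGLU, as in LLaMA-style decoders."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: silu(gate(x)) * up(x), projected back to the model width
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

ffn = SwiGLUFeedForward(hidden_size, intermediate_size)
out = ffn(torch.randn(1, 8, hidden_size))   # (batch, sequence, hidden)
```

Under those assumptions, (8/3) × 4096 ≈ 10922.7, and rounding up to the next multiple of 256 yields the 11008 quoted in both the Chinese and English bullets.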
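The hunk headers above quote one line from the README's quick-start snippet, `print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))`. A hedged sketch of the kind of transformers call that line typically concludes is shown below; the repository id, prompt, and generation parameters are illustrative assumptions rather than values taken from this diff.

```python
# Hedged sketch of a transformers generation call ending in the decode/print
# line quoted in the hunk headers above. Repo id, prompt, and generation
# settings are illustrative assumptions, not values from this diff.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/baichuan-7B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # requires accelerate; drop to load on one device
    trust_remote_code=True,   # assumes the repo ships custom modeling code
)

# Few-shot style prompt (poem title -> author), purely illustrative
inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
```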