RicardoLee committed on
Commit b64445e · 1 Parent(s): 5808953

README rectify

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -23,6 +23,8 @@ The training data is sampled from [BELLE](https://huggingface.co/BelleGroup) pro
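
  ## Train Detail

  1. Training framework: This model was trained with a modified version of the [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) project.
  2. Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model. This works because the tokenizer.model of LLama2 is identical to that of LLama1, so in theory the tokenizer from the Chinese-LLaMa project can be reused as-is without any token misalignment.
  3. Training parameters: Because the embedding must be resized, the added embedding rows are effectively randomly initialized, so early in training DeepSpeed is very prone to hitting "OVERFLOW" and reducing the loss scale. Frequent reductions shrink the loss scale until it underflows, which crashes training. In this situation, do not lower the learning rate, warmup, or similar hyperparameters; raise them to pretraining scale instead. Only then can the randomly initialized embeddings get on track quickly.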
@@ -30,6 +32,7 @@ The training data is sampled from [BELLE](https://huggingface.co/BelleGroup) pro
  5. Initial training loss: 8.7072
  6. Final training loss: 1.5674

  1. Training Framework: This model was trained with a modified [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) framework.
  2. Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model, because the tokenizer.model in LLama2 is identical to the one used in LLama1. In theory, the tokenizer from the Chinese-LLaMa project can therefore be reused entirely without any token misalignment.
 
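
  ## Train Detail

+ Some training details:
+
  1. Training framework: This model was trained with a modified version of the [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) project.
  2. Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model. This works because the tokenizer.model of LLama2 is identical to that of LLama1, so in theory the tokenizer from the Chinese-LLaMa project can be reused as-is without any token misalignment.
  3. Training parameters: Because the embedding must be resized, the added embedding rows are effectively randomly initialized, so early in training DeepSpeed is very prone to hitting "OVERFLOW" and reducing the loss scale. Frequent reductions shrink the loss scale until it underflows, which crashes training. In this situation, do not lower the learning rate, warmup, or similar hyperparameters; raise them to pretraining scale instead. Only then can the randomly initialized embeddings get on track quickly.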
 
  5. Initial training loss: 8.7072
  6. Final training loss: 1.5674
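
The failure mode described in item 3 (repeated "OVERFLOW" steps shrinking the loss scale until training aborts) can be illustrated with a toy model of dynamic loss scaling. This is a minimal sketch under stated assumptions, not DeepSpeed's actual implementation: the function name, the exception, and the parameters `init_scale`, `min_scale`, `scale_factor`, and `growth_window` are all hypothetical and chosen only for illustration.

```python
class LossScaleUnderflow(RuntimeError):
    """Raised when the loss scale cannot be reduced any further."""


def simulate_loss_scale(overflow_flags, init_scale=2.0**16, min_scale=1.0,
                        scale_factor=2.0, growth_window=1000):
    """Toy dynamic loss scaling (illustrative only, not DeepSpeed's code).

    overflow_flags: one bool per step, True if fp16 gradients overflowed.
    Returns the loss scale in effect after each step.
    """
    scale = init_scale
    clean_steps = 0
    history = []
    for step, overflowed in enumerate(overflow_flags):
        if overflowed:
            # An overflow at the minimum scale leaves nothing to reduce.
            if scale <= min_scale:
                raise LossScaleUnderflow(
                    f"step {step}: loss scale already at minimum {min_scale}")
            scale = max(scale / scale_factor, min_scale)
            clean_steps = 0
        else:
            # Grow the scale again after a run of overflow-free steps.
            clean_steps += 1
            if clean_steps % growth_window == 0:
                scale *= scale_factor
        history.append(scale)
    return history


# Two overflows in three steps halve the scale twice.
print(simulate_loss_scale([True, False, True]))  # [32768.0, 32768.0, 16384.0]
```

With these assumed defaults, sixteen consecutive overflows bring the scale from 2^16 down to the minimum, and a seventeenth raises `LossScaleUnderflow`. That mirrors the crash described above, where freshly added embedding rows keep producing overflowing gradients early in training.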

+ Some details in training:

  1. Training Framework: This model was trained with a modified [Chinese-LLaMA-Alpaca](https://github.com/ymcui/Chinese-LLaMA-Alpaca) framework.
  2. Tokenizer: This model uses the tokenizer.model from the Chinese-Alpaca-Plus model, because the tokenizer.model in LLama2 is identical to the one used in LLama1. In theory, the tokenizer from the Chinese-LLaMa project can therefore be reused entirely without any token misalignment.
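
The tokenizer note rests on two tokenizer.model files being identical. One way to check a claim like that locally is to compare file digests. The sketch below is only an illustration using the Python standard library; the helper names are hypothetical, and the README does not say how the author verified the claim.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b):
    """True if both files have identical bytes (compared by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Calling `same_file_contents` on locally downloaded copies of the two tokenizer.model files would return `True` only if they match byte for byte.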