chaoscodes committed
Commit 5357b80
1 Parent(s): 59459bf

Update README.md

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -5,7 +5,10 @@ datasets:
 language:
 - en
 ---
-# TinyLlama-1.1B-v1.1 Math&Code
+<div align="center">
+# TinyLlama-1.1B-v1.1
+
+</div>
 
 https://github.com/jzhang38/TinyLlama
 
@@ -32,7 +35,7 @@ In this initial phase, we managed to train our model with only slimpajama to dev
 
 #### Continual pretraining with specific domain
 
-We incorporated 3 different kinds of corpus during this pretraining, slimpajama (which is the same as the first phase), Code&Math (starcoder and proof pile), and Chinese (Skypile). This approach allowed us to develop three variant models with specialized capabilities.
+We incorporated 3 different kinds of corpus during this pretraining, slimpajama (which is the same as the first phase), Math&Code (starcoder and proof pile), and Chinese (Skypile). This approach allowed us to develop three variant models with specialized capabilities.
 
 At the begining ~6B tokens in this stage, we linearly increased the sampling proportion for the domain-specific corpus (excluding Slimpajama, as it remained unchanged compared with stage 1). This warmup sampling increasing strategy was designed to gradually adjust the distribution of the pretraining data, ensuring a more stable training process. After this sampling increasing stage, we continued pretraining the model with stable sampling strategy until reaching ~1.85T tokens.
 
@@ -45,8 +48,8 @@ Implementing a cooldown phase has become a crucial technique to achieve better m
 Following an extensive and detailed pretraining process. We are now releasing three specialized versions of our model:
 
 1. **TinyLlama_v1.1**: The standard version, used for general purposes.
-2. **TinyLlama_v1.1_math_code**: Equipped with better ability for math and code.
-3. **TinyLlama_v1.1_chinese**: Good understanding capacity for Chinese.
+2. **TinyLlama_v1.1_Math&Code**: Equipped with better ability for math and code.
+3. **TinyLlama_v1.1_Chinese**: Good understanding capacity for Chinese.
 
 ## Data
 
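The warmup-sampling strategy mentioned in the README text above (a linear increase of the domain-specific sampling proportion over the first ~6B tokens of the stage, then a stable proportion) can be summarized with a minimal sketch. This is not the authors' training code; the ~6B-token ramp comes from the README, while the 0.30 target proportion and the function name are illustrative assumptions.

```python
# Minimal sketch of the warmup-sampling schedule described in the README.
# Assumptions: TARGET_DOMAIN_PROP = 0.30 is illustrative, not the value used
# for TinyLlama; only the ~6B-token ramp length is taken from the text.

WARMUP_TOKENS = 6_000_000_000      # ~6B tokens of linear ramp-up
TARGET_DOMAIN_PROP = 0.30          # assumed final share for the Math&Code / Chinese corpus

def domain_sampling_proportion(tokens_seen: int) -> float:
    """Share of each batch drawn from the domain-specific corpus."""
    if tokens_seen >= WARMUP_TOKENS:
        return TARGET_DOMAIN_PROP  # stable sampling after the warmup stage
    return TARGET_DOMAIN_PROP * tokens_seen / WARMUP_TOKENS

# Halfway through the ramp, the domain share is half of its target value.
print(domain_sampling_proportion(3_000_000_000))  # ~0.15
```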
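For the three released variants listed in the diff, a hedged usage sketch with Hugging Face `transformers` is shown below. The repository id is an assumption based on the model name in the list; confirm the actual Hub id before running.

```python
# Hypothetical usage sketch for one of the released variants.
# The repo id below is assumed from the name "TinyLlama_v1.1_math_code";
# it is not confirmed by the diff itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama_v1.1_math_code"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```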