chaoscodes committed
Commit 18126bc · verified · 1 parent: 5334c8e

Update README.md

Files changed (1): README.md (+16 -1)
README.md CHANGED
@@ -4,10 +4,14 @@ datasets:
 - cerebras/SlimPajama-627B
 language:
 - en
+
 ---
+
 <div align="center">

+
 # TinyLlama-1.1B-v1.1
+
 </div>

 https://github.com/jzhang38/TinyLlama
@@ -17,9 +21,17 @@ https://github.com/jzhang38/TinyLlama
 <img src="https://huggingface.co/PY007/TinyLlama-1.1B-intermediate-step-240k-503b/resolve/main/TinyLlama_logo.png" width="300"/>
 </div>

+
 We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be used as a plug-and-play component in many open-source projects built upon Llama. Moreover, TinyLlama is compact, with only 1.1B parameters, so it can serve a multitude of applications that demand a restricted computation and memory footprint.

+### Overview
+
+In this project, rather than training only a single TinyLlama model, we first train TinyLlama on a corpus of 1.5 trillion tokens to obtain foundational language capabilities. Subsequently, we take this model and turn it into three different models by continual pretraining with three distinct data sampling strategies. For a visual representation of this process, please refer to the figure below.
+
+![image-20240401225128124](/Users/zengguangtao/Library/Application Support/typora-user-images/image-20240401225128124.png)
+
 ### Pretraining
+
 Due to these issues ([bug1](https://whimsical-aphid-86d.notion.site/Release-of-TinyLlama-1-5T-Checkpoints-Postponed-01b266998c1c47f78f5ae1520196d194?pvs=4), [bug2](https://whimsical-aphid-86d.notion.site/2023-12-18-Updates-from-TinyLlama-Team-7d30c01fff794da28ccc952f327c8d4f)), we retrained TinyLlama to provide a better model. We trained the model on 2T tokens and divided the pretraining into three stages: 1) basic pretraining, 2) continual pretraining on specific domains, and 3) cooldown.


@@ -49,8 +61,10 @@ Following an extensive and detailed pretraining process. We are now releasing th


 ### How to use
+
 You will need transformers>=4.31.
 Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) GitHub page for more information.
+
 ```
 from transformers import AutoTokenizer
 import transformers
@@ -78,8 +92,9 @@ for seq in sequences:
 ```

 ### Eval
+
 | Model | Pretrain Tokens | HellaSwag | Obqa | WinoGrande | ARC_c | ARC_e | boolq | piqa | avg |
 | ----------------------------------------- | --------------- | --------- | --------- | ---------- | --------- | --------- | ----- | --------- | --------- |
 | Pythia-1.0B | 300B | 47.16 | 31.40 | 53.43 | 27.05 | 48.99 | 60.83 | 69.21 | 48.30 |
 | TinyLlama-1.1B-intermediate-step-1431k-3T | 3T | 59.20 | 36.00 | 59.12 | 30.12 | 55.25 | 57.83 | 73.29 | 52.99 |
-| TinyLlama-1.1B-v1.1 | 2T | **61.47** | **36.80** | **59.43** | **32.68** | **55.47** | 55.99 | **73.56** | **53.63** |
+| TinyLlama-1.1B-v1.1 | 2T | **61.47** | **36.80** | **59.43** | **32.68** | **55.47** | 55.99 | **73.56** | **53.63** |
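The `### How to use` block is truncated by the diff context above: only its opening imports and the closing `for seq in sequences:` loop fall inside the changed hunks. For reference, here is a minimal, self-contained sketch of the kind of text-generation call that snippet builds toward. The repo id, prompt, and generation settings below are assumptions for illustration, not values taken from this commit; because TinyLlama reuses the Llama 2 architecture and tokenizer, the checkpoint loads through the standard Llama classes that `AutoTokenizer` and `pipeline` resolve to.

```
# A hedged sketch of the truncated "How to use" example; requires transformers>=4.31.
# The repo id is an assumption based on the model name in this README, not taken from the diff.
from transformers import AutoTokenizer
import transformers
import torch

model_id = "TinyLlama/TinyLlama_v1.1"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,   # drop this on CPU-only machines
    device_map="auto",           # requires the accelerate package
)

# TinyLlama-1.1B-v1.1 is a base model, so the prompt is plain text (no chat template).
sequences = pipeline(
    "The TinyLlama project aims to pretrain a compact Llama model on trillions of tokens.",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=256,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```

If you prefer explicit model objects over the pipeline helper, loading with `AutoModelForCausalLM.from_pretrained(model_id)` and calling `model.generate(...)` is an equivalent route.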