datasets:
- cerebras/SlimPajama-627B
language:
- en
---

<div align="center">

# TinyLlama-1.1B-v1.1

</div>

https://github.com/jzhang38/TinyLlama

<div align="center">
<img src="https://huggingface.co/PY007/TinyLlama-1.1B-intermediate-step-240k-503b/resolve/main/TinyLlama_logo.png" width="300"/>
</div>

We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into and used in many open-source projects built upon Llama. Moreover, with only 1.1B parameters, TinyLlama is compact enough to serve applications that demand a restricted computation and memory footprint.

### Overview

In this project, rather than training only a single TinyLlama model, we first train TinyLlama on a corpus of 1.5 trillion tokens to obtain foundational language capabilities. We then turn this model into three different models by continual pre-training with three distinct data sampling strategies. For a visual representation of this process, please refer to the figure below.

![image-20240401225128124](/Users/zengguangtao/Library/Application Support/typora-user-images/image-20240401225128124.png)

### Pretraining

Due to the issues described in [bug1](https://whimsical-aphid-86d.notion.site/Release-of-TinyLlama-1-5T-Checkpoints-Postponed-01b266998c1c47f78f5ae1520196d194?pvs=4) and [bug2](https://whimsical-aphid-86d.notion.site/2023-12-18-Updates-from-TinyLlama-Team-7d30c01fff794da28ccc952f327c8d4f), we retrained TinyLlama to provide a better model. We trained the model on 2T tokens and divided the pretraining into three stages: 1) basic pretraining, 2) continual pretraining with specific domains, and 3) cooldown.
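To make the staging concrete, here is a minimal sketch of such a stage plan in Python. The 1.5T basic-pretraining budget and the 2T total come from the description above; the split of the remaining tokens, the data-source names, and the sampling weights are illustrative assumptions, not the released recipe.

```python
# Hypothetical staged-pretraining plan. The token split beyond the 1.5T basic
# stage, the data-source names, and the sampling weights are placeholders,
# NOT the actual TinyLlama v1.1 configuration.
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    tokens: float            # token budget for this stage
    sampling_weights: dict   # relative weight per data source

PLAN = [
    Stage("basic_pretraining", 1.5e12, {"slimpajama": 1.0}),
    Stage("continual_pretraining", 0.35e12, {"slimpajama": 0.7, "domain_specific": 0.3}),
    Stage("cooldown", 0.15e12, {"slimpajama": 0.8, "domain_specific": 0.2}),
]

assert abs(sum(s.tokens for s in PLAN) - 2e12) < 1e9  # 2T tokens in total

for stage in PLAN:
    total = sum(stage.sampling_weights.values())
    mix = {source: w / total for source, w in stage.sampling_weights.items()}
    print(f"{stage.name}: {stage.tokens:.2e} tokens, mix = {mix}")
```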

Following this extensive and detailed pretraining process, we are now releasing the resulting models.

### How to use

You will need `transformers>=4.31`.
Do check the [TinyLlama](https://github.com/jzhang38/TinyLlama) GitHub page for more information.

```python
from transformers import AutoTokenizer
import transformers
import torch

# Repo id assumed for this card; replace it with the exact checkpoint you want to load.
model = "TinyLlama/TinyLlama_v1.1"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Any prompt works; this one is only an example.
sequences = pipeline(
    "The TinyLlama project aims to pretrain a 1.1B Llama model on several trillion tokens.",
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    repetition_penalty=1.5,
    eos_token_id=tokenizer.eos_token_id,
    max_length=500,
)

for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
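
If you prefer not to use the `pipeline` helper, the model can also be loaded directly with `AutoModelForCausalLM`. This is a generic `transformers` sketch rather than part of the official TinyLlama instructions, and it assumes the same repo id as above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama_v1.1"  # assumed repo id; adjust as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Tokenize a prompt, generate a continuation, and decode it.
inputs = tokenizer("TinyLlama is a compact 1.1B-parameter language model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_k=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```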

### Eval

| Model | Pretrain Tokens | HellaSwag | OBQA | WinoGrande | ARC-c | ARC-e | BoolQ | PIQA | Avg |
| ----------------------------------------- | --------------- | --------- | --------- | ---------- | --------- | --------- | --------- | --------- | --------- |
| Pythia-1.0B | 300B | 47.16 | 31.40 | 53.43 | 27.05 | 48.99 | 60.83 | 69.21 | 48.30 |
| TinyLlama-1.1B-intermediate-step-1431k-3T | 3T | 59.20 | 36.00 | 59.12 | 30.12 | 55.25 | 57.83 | 73.29 | 52.99 |
| TinyLlama-1.1B-v1.1 | 2T | **61.47** | **36.80** | **59.43** | **32.68** | **55.47** | 55.99 | **73.56** | **53.63** |
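
Scores of this kind are typically computed with EleutherAI's lm-evaluation-harness. The sketch below shows one way to reproduce such numbers; the harness version, few-shot settings, batch size, and reported metrics are assumptions and may not match the exact setup behind the table.

```python
# Hedged sketch using lm-evaluation-harness (pip install lm-eval).
# Few-shot settings, batch size, and the repo id are assumptions,
# not necessarily the configuration used for the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=TinyLlama/TinyLlama_v1.1,dtype=float16",
    tasks=["hellaswag", "openbookqa", "winogrande",
           "arc_challenge", "arc_easy", "boolq", "piqa"],
    batch_size=8,
)

# Print whatever metrics the harness reports for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```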