dotw committed
Commit 1e72cb7 · 1 Parent(s): c5067e7

Update README.md

Files changed (1)
  1. README.md +23 -17
README.md CHANGED
@@ -28,8 +28,8 @@ The training data for SEA LION is encompasses 1 trillion tokens.
  - **Funded by [optional]:** Singapore NRF
  - **Shared by [optional]:** N/A
  - **Model type:** Decoder
- - **Language(s) (NLP):** English, Chinese, Indonesian, Malay, Thai, Vietnamese, Filipino/Tagalog, Tamil, Burnese, Khmer, Lao
- - **License:** Apache 2.0
+ - **Language(s) (NLP):** English, Chinese, Indonesian, Malay, Thai, Vietnamese, Filipino, Tamil, Burmese, Khmer, Lao
+ - **License:** MIT License
  - **Finetuned from model [optional]:** N/A
 
  ### Model Sources [optional]
@@ -86,7 +86,7 @@ Use the code below to get started with the model.
 
  <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
- SEA LION 3B was trained on 980B tokens of RefinedWeb (English) and mC4 (Chinese, Indonesian, Malay, Filipino/Tagalog, Burmese, Vietnamese, Thai, Lao, Khmer, Tamil).
+ SEA LION 3B was trained on 980B tokens of the following data:
 
  | Data Source | Tokens | Percentage |
  |------------------------|--------|------------|
@@ -94,7 +94,7 @@ SEA LION 3B was trained on 980B tokens of RefinedWeb (English) and mC4 (Chinese,
  | mC4 - Chinese | 91.2B | 10.03% |
  | mC4 - Indonesian | 3.6B | 0.40% |
  | mC4 - Malay | 0.7B | 0.08% |
- | mC4 - Filipino/Tagalog | 1.3B | 0.15% |
+ | mC4 - Filipino | 1.3B | 0.15% |
  | mC4 - Burmese | 1.2B | 0.13% |
  | mC4 - Vietnamese | 63.4B | 6.97% |
  | mC4 - Thai | 10.8B | 1.19% |
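As a quick way to eyeball the corpora named in the data hunk above, here is a hedged sketch using the Hugging Face `datasets` library. The dataset repo ids, config names, and column names are assumptions for illustration only; they are not taken from this commit and do not represent SEA LION's actual data pipeline.

```python
# Hedged sketch: stream a few records from public copies of the corpora named
# in the data table (RefinedWeb and mC4). Repo ids, config names, and column
# names are assumptions for illustration; SEA LION's preprocessing is not shown.
from datasets import load_dataset

# RefinedWeb (English) - public release; text is assumed to live in "content".
refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
print(next(iter(refinedweb))["content"][:200])

# mC4 - one language subset (e.g. "ms" = Malay); text is assumed to live in "text".
mc4_malay = load_dataset("allenai/c4", "ms", split="train", streaming=True)
print(next(iter(mc4_malay))["text"][:200])
```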
@@ -113,7 +113,9 @@ SEA LION 3B was trained on 980B tokens of RefinedWeb (English) and mC4 (Chinese,
 
  <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 
- SEA LION 3B was trained on 256 A100 40GB GPUs, using MosaicML Composer.
+ SEA LION 3B was trained on 240 A100 40GB GPUs, using MosaicML Composer.
+
+ SEA LION 7B was trained on 256 A100 40GB GPUs, using MosaicML Composer.
 
  #### Preprocessing [optional]
 
@@ -121,14 +123,14 @@ N/A
 
  #### Training Hyperparameters
 
- | Hyperparameter | Value |
- |-------------------|-------------------|
- | Precision | bfloat16 |
- | Optimizer | decoupled_adamw |
- | Scheduler | cosin_with_warmup |
- | Learning Rate | 1.6e-4 |
- | Global Batch Size | 1200 |
- | Micro Batch Size | 5 |
+ | Hyperparameter | Value |
+ |-------------------|--------------------|
+ | Precision | bfloat16 |
+ | Optimizer | decoupled_adamw |
+ | Scheduler | cosine_with_warmup |
+ | Learning Rate | 1.6e-4 |
+ | Global Batch Size | 1200 |
+ | Micro Batch Size | 5 |
 
  #### Speeds, Sizes, Times [optional]
 
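Since the diff names MosaicML Composer as the training framework, here is a minimal, hedged sketch of how the hyperparameter table in the hunk above could map onto a Composer `Trainer`. It is not the SEA LION training script: the `gpt2` stand-in model, the toy dataset, the warmup length, and the run duration are all assumptions for illustration.

```python
# Hedged sketch only: maps the hyperparameter table onto MosaicML Composer.
# "gpt2" stands in for the MPT decoder, ToyCausalLMDataset stands in for the
# real corpora, and t_warmup / max_duration are placeholders, not SEA LION's values.
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from composer import Trainer
from composer.models import HuggingFaceModel
from composer.optim import DecoupledAdamW, CosineAnnealingWithWarmupScheduler

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = HuggingFaceModel(AutoModelForCausalLM.from_pretrained("gpt2"), tokenizer=tokenizer)

class ToyCausalLMDataset(Dataset):
    """Tiny stand-in corpus so the sketch runs end to end."""
    def __init__(self, n=20, seq_len=64):
        enc = tokenizer("placeholder text", padding="max_length",
                        truncation=True, max_length=seq_len, return_tensors="pt")
        row = {k: v.squeeze(0) for k, v in enc.items()}
        row["labels"] = row["input_ids"].clone()  # causal LM: labels mirror inputs
        self.rows = [row] * n

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        return self.rows[i]

# Per-device batch size; the table's Global Batch Size of 1200 would be this
# value multiplied by the number of devices in the real multi-GPU run.
train_dataloader = DataLoader(ToyCausalLMDataset(), batch_size=5)

optimizer = DecoupledAdamW(model.parameters(), lr=1.6e-4)          # Optimizer, Learning Rate
scheduler = CosineAnnealingWithWarmupScheduler(t_warmup="100ba")   # warmup length: assumption

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    optimizers=optimizer,
    schedulers=scheduler,
    max_duration="10ba",                # placeholder; the real run covered ~980B tokens
    precision="amp_bf16",               # Precision: bfloat16 (needs a bf16-capable GPU)
    device_train_microbatch_size=5,     # Micro Batch Size
)
trainer.fit()
```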
@@ -159,6 +161,7 @@ _Coming soon_
  <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 
  _Coming soon_
+ LLM Eval Benchmarks, no BHASA
 
  ### Results
 
@@ -204,7 +207,10 @@ SEA LION 3B is a decoder model using the MPT architecture.
 
  #### Hardware
 
- SEA LION 3B was trained on AWS EC2 cluster comprising 32 p4d.24xlarge instances, using a total of 256 A100 40GB GPUs.
+ SEA LION 3B was trained on AWS EC2 cluster comprising 30 p4d.24xlarge instances, using a total of 240 A100 40GB GPUs.
+
+ SEA LION 7B was trained on AWS EC2 cluster comprising 32 p4d.24xlarge instances, using a total of 256 A100 40GB GPUs.
+
 
  #### Software
 
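For quick context on the hardware hunk above: each AWS p4d.24xlarge instance carries 8 NVIDIA A100 40GB GPUs, so the counts are consistent: 30 instances × 8 = 240 GPUs for SEA LION 3B, and 32 instances × 8 = 256 GPUs for SEA LION 7B.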
@@ -234,6 +240,8 @@ N/A
 
  ## The Team
 
+ Darius Liu<br>
+ David Ong Tat-Wee<br>
  Hamsawardhini Rengarajan<br>
  Holy Lovenia<br>
  Lam Clarence<br>
@@ -247,12 +255,10 @@ Tan Choon Meng<br>
  Thanh Ngan Nguyen<br>
  Teo Jin Howe<br>
  Teo Wei Yi<br>
+ William Tjhi<br>
  Yeo Yeow Tong<br>
  Yong Xianbin<br>
  Yosephine<br>
- William Tjhi<br>
- David Ong Tat-Wee<br>
- Darius Liu<br>
  Leslie Teo<br>
 
  ## Model Card Contact
 