aisingapore/sea-lion-3b

@@ -28,8 +28,8 @@ The training data for SEA LION is encompasses 1 trillion tokens.
 - **Funded by [optional]:** Singapore NRF
 - **Shared by [optional]:** N/A
 - **Model type:** Decoder
-- **Language(s) (NLP):** English, Chinese, Indonesian, Malay, Thai, Vietnamese, Filipino/Tagalog, Tamil, Burnese, Khmer, Lao
-- **License:** Apache 2.0
 - **Finetuned from model [optional]:** N/A
 ### Model Sources [optional]
@@ -86,7 +86,7 @@ Use the code below to get started with the model.
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-SEA LION 3B was trained on 980B tokens of RefinedWeb (English) and mC4 (Chinese, Indonesian, Malay, Filipino/Tagalog, Burmese, Vietnamese, Thai, Lao, Khmer, Tamil).
 | Data Source            | Tokens | Percentage |
 |------------------------|--------|------------|
@@ -94,7 +94,7 @@ SEA LION 3B was trained on 980B tokens of RefinedWeb (English) and mC4 (Chinese,
 | mC4 - Chinese          |  91.2B |     10.03% |
 | mC4 - Indonesian       |   3.6B |      0.40% |
 | mC4 - Malay            |   0.7B |      0.08% |
-| mC4 - Filipino/Tagalog |   1.3B |      0.15% |
 | mC4 - Burmese          |   1.2B |      0.13% |
 | mC4 - Vietnamese       |  63.4B |      6.97% |
 | mC4 - Thai             |  10.8B |      1.19% |
@@ -113,7 +113,9 @@ SEA LION 3B was trained on 980B tokens of RefinedWeb (English) and mC4 (Chinese,
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-SEA LION 3B was trained on 256 A100 40GB GPUs, using MosaicML Composer.
 #### Preprocessing [optional]
@@ -121,14 +123,14 @@ N/A
 #### Training Hyperparameters
-| Hyperparameter    | Value             |
-|-------------------|-------------------|
-| Precision         | bfloat16          |
-| Optimizer         | decoupled_adamw   |
-| Scheduler         | cosin_with_warmup |
-| Learning Rate     | 1.6e-4            |
-| Global Batch Size | 1200              |
-| Micro Batch Size  | 5                 |
 #### Speeds, Sizes, Times [optional]
@@ -159,6 +161,7 @@ _Coming soon_
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 _Coming soon_
 ### Results
@@ -204,7 +207,10 @@ SEA LION 3B is a decoder model using the MPT architecture.
 #### Hardware
-SEA LION 3B was trained on AWS EC2 cluster comprising 32 p4d.24xlarge instances, using a total of 256 A100 40GB GPUs.
 #### Software
@@ -234,6 +240,8 @@ N/A
 ## The Team
 Hamsawardhini Rengarajan<br>
 Holy Lovenia<br>
 Lam Clarence<br>
@@ -247,12 +255,10 @@ Tan Choon Meng<br>
 Thanh Ngan Nguyen<br>
 Teo Jin Howe<br>
 Teo Wei Yi<br>
 Yeo Yeow Tong<br>
 Yong Xianbin<br>
 Yosephine<br>
-William Tjhi<br>
-David Ong Tat-Wee<br>
-Darius Liu<br>
 Leslie Teo<br>
 ## Model Card Contact

 - **Funded by [optional]:** Singapore NRF
 - **Shared by [optional]:** N/A
 - **Model type:** Decoder
+- **Language(s) (NLP):** English, Chinese, Indonesian, Malay, Thai, Vietnamese, Filipino, Tamil, Burmese, Khmer, Lao
+- **License:** MIT License
 - **Finetuned from model [optional]:** N/A
 ### Model Sources [optional]
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+SEA LION 3B was trained on 980B tokens of the following data:
 | Data Source            | Tokens | Percentage |
 |------------------------|--------|------------|
 | mC4 - Chinese          |  91.2B |     10.03% |
 | mC4 - Indonesian       |   3.6B |      0.40% |
 | mC4 - Malay            |   0.7B |      0.08% |
+| mC4 - Filipino         |   1.3B |      0.15% |
 | mC4 - Burmese          |   1.2B |      0.13% |
 | mC4 - Vietnamese       |  63.4B |      6.97% |
 | mC4 - Thai             |  10.8B |      1.19% |
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+SEA LION 3B was trained on 240 A100 40GB GPUs, using MosaicML Composer.
+SEA LION 7B was trained on 256 A100 40GB GPUs, using MosaicML Composer.
 #### Preprocessing [optional]
 #### Training Hyperparameters
+| Hyperparameter    | Value              |
+|-------------------|--------------------|
+| Precision         | bfloat16           |
+| Optimizer         | decoupled_adamw    |
+| Scheduler         | cosine_with_warmup |
+| Learning Rate     | 1.6e-4             |
+| Global Batch Size | 1200               |
+| Micro Batch Size  | 5                  |
 #### Speeds, Sizes, Times [optional]
 <!-- These are the evaluation metrics being used, ideally with a description of why. -->
 _Coming soon_
+LLM Eval Benchmarks, no BHASA
 ### Results
 #### Hardware
+SEA LION 3B was trained on AWS EC2 cluster comprising 30 p4d.24xlarge instances, using a total of 240 A100 40GB GPUs.
+SEA LION 7B was trained on AWS EC2 cluster comprising 32 p4d.24xlarge instances, using a total of 256 A100 40GB GPUs.
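
The corrected figures can be cross-checked with simple arithmetic. The sketch below is not part of the commit; it assumes 8 A100 GPUs per p4d.24xlarge instance (the published spec for that instance type) and assumes the hyperparameter table applies to the 3B run, which the diff does not state explicitly. The function names are hypothetical.

```python
# Sanity-check sketch for the hardware and batch-size figures in this diff.
# Assumption: each p4d.24xlarge instance exposes 8 A100 40GB GPUs.
GPUS_PER_P4D_INSTANCE = 8


def total_gpus(instances: int, gpus_per_instance: int = GPUS_PER_P4D_INSTANCE) -> int:
    """Return the GPU count for a cluster of identical instances."""
    return instances * gpus_per_instance


def grad_accum_steps(global_batch: int, micro_batch: int, num_gpus: int) -> int:
    """Return how many microbatches each GPU processes per optimizer step.

    Raises ValueError when the global batch does not divide evenly
    across micro_batch * num_gpus samples.
    """
    samples_per_pass = micro_batch * num_gpus
    if global_batch % samples_per_pass != 0:
        raise ValueError(
            f"global batch {global_batch} is not a multiple of {samples_per_pass}"
        )
    return global_batch // samples_per_pass


if __name__ == "__main__":
    print(total_gpus(30))  # instance count given for SEA LION 3B
    print(total_gpus(32))  # instance count given for SEA LION 7B
    # Global batch 1200 and micro batch 5 are taken from the hyperparameter table.
    print(grad_accum_steps(1200, 5, total_gpus(30)))
```

Under these assumptions, 30 instances give 240 GPUs and 32 give 256, matching the new lines. A global batch of 1200 with a micro batch of 5 divides evenly across 240 GPUs, but not across 256, which is consistent with the 3B GPU count being changed from 256 to 240.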
 #### Software
 ## The Team
+Darius Liu<br>
+David Ong Tat-Wee<br>
 Hamsawardhini Rengarajan<br>
 Holy Lovenia<br>
 Lam Clarence<br>
 Thanh Ngan Nguyen<br>
 Teo Jin Howe<br>
 Teo Wei Yi<br>
+William Tjhi<br>
 Yeo Yeow Tong<br>
 Yong Xianbin<br>
 Yosephine<br>
 Leslie Teo<br>
 ## Model Card Contact