Update README.md
README.md CHANGED
@@ -33,7 +33,7 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.

## Model Details
### Model Description
-The continued pre-training data for Llama3.1 8B CPT SEA-LIONv3 Base encompasses approximately 200B tokens.
+The continued pre-training data for Llama3.1 8B CPT SEA-LIONv3 Base encompasses approximately 200B tokens and includes the 11 official Southeast Asian languages: English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, Burmese.

For tokenisation, the model employs the default tokenizer used in Llama3.1 8B Instruct.
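As a quick, minimal sketch of what reusing that tokenizer looks like in practice (assuming the Hugging Face `transformers` library; the repository ID shown is illustrative and should be swapped for the checkpoint actually in use):

```python
# Minimal sketch: load the tokenizer referenced above via Hugging Face transformers.
# The repository ID below is an illustrative assumption; substitute the Llama3.1 8B
# Instruct or SEA-LIONv3 checkpoint you actually use.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aisingapore/llama3.1-8b-cpt-sea-lionv3-base")

sample = "Selamat pagi, apa khabar?"  # a short Malay sentence
encoded = tokenizer(sample)
print(encoded["input_ids"])                                   # token IDs in the Llama3.1 vocabulary
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))  # the corresponding subword tokens
```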
@@ -52,14 +52,13 @@ For more details on Llama3.1 8B CPT SEA-LIONv3 base benchmark performance, please

## Technical Specifications
### Infrastructure
-Llama3.1 8B CPT SEA-LIONv3 was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
-on the following hardware:
+Llama3.1 8B CPT SEA-LIONv3 was trained using [MosaicML Composer](https://github.com/mosaicml/composer) on the following hardware:

-| Training Details
-| Nvidia
-| Training Duration
+| Training Details      | Llama3.1 8B CPT SEA-LIONv3 |
+|-----------------------|:--------------------------:|
+| AWS p5e.48xlarge      | 8 instances                |
+| Nvidia H200 140GB GPU | 64                         |
+| Training Duration     | 136 Hours                  |

### Configuration
| HyperParameter | Llama3.1 8B CPT SEA-LIONv3 |

@@ -69,7 +68,6 @@ on the following hardware:
| Scheduler | weight_stable_decay |
| Learning Rate | 1.0e-5 |
| Global Batch Size | 512 |
-| Micro Batch Size | 1 |

## Data
Llama3.1 8B CPT SEA-LIONv3 base model was continued pre-trained on 200B tokens of the following data:
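Taken together, the infrastructure and configuration tables above allow a rough back-of-the-envelope check. The sketch below assumes the 64-GPU count from the infrastructure table and the micro batch size of 1 listed in the earlier revision of the configuration table; the derived figures are estimates, not values reported in the model card.

```python
# Back-of-the-envelope estimates from the figures in the tables above.
# The micro batch size of 1 comes from the previous revision of the table;
# the derived values are rough estimates, not reported numbers.
global_batch_size = 512      # sequences per optimiser step
micro_batch_size = 1         # sequences per GPU per forward/backward pass
num_gpus = 64                # 8 x AWS p5e.48xlarge instances, 8 H200 GPUs each
training_hours = 136
total_tokens = 200e9         # ~200B tokens of continued pre-training

grad_accum_steps = global_batch_size // (micro_batch_size * num_gpus)
gpu_hours = num_gpus * training_hours
tokens_per_gpu_hour = total_tokens / gpu_hours

print(f"gradient accumulation steps: {grad_accum_steps}")          # 8
print(f"total GPU-hours:             {gpu_hours:,}")                # 8,704
print(f"tokens per GPU-hour:         {tokens_per_gpu_hour:,.0f}")   # ~22,977,941
```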