tainc committed · verified
Commit a78b005 · 1 Parent(s): 5e2ad22

Update README.md

Files changed (1): README.md (+2 −2)
@@ -21,7 +21,7 @@ base_model: meta-llama/Llama-3.1-8B-Instruct
 # Llama3.1 8B CPT SEA-LIONv3
 SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
 
-Llama3.1 8B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training from [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on English and Southeast Asian text.
+Llama3.1 8B CPT SEA-LIONv3 Base is a multilingual model which has undergone continued pre-training on approximately **200B** tokens across the 11 official Southeast Asian languages: English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, Burmese.
 
 SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
@@ -33,7 +33,7 @@ SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.
 
 ## Model Details
 ### Model Description
-The continued pre-training data for Llama3.1 8B CPT SEA-LIONv3 Base encompasses approximately 200B tokens across the 11 official Southeast Asian languages: English, Chinese, Vietnamese, Indonesian, Thai, Tamil, Filipino, Malay, Khmer, Lao, Burmese.
+We performed continued pre-training in English and ASEAN languages on [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), a decoder model using the Llama 3.1 architecture, to create Llama3.1 8B CPT SEA-LIONv3 Base.
 
 For tokenisation, the model employs the default tokenizer used in Llama3.1 8B Instruct.