tainc committed on
Commit b99fd0c · verified · 1 Parent(s): 2f83a17

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -33,7 +33,7 @@ The continued pre-training data for Llama3.1 8B CPT SEA-LIONv3 base model encomp
  - **Funded by:** Singapore NRF
  - **Model type:** Decoder
  - **Languages:** English, Chinese, Vietnamese, Indonesian, Thai, Filipino, Tamil, Malay, Khmer, Lao, Burmese, Javanese, Sundanese
- - **License:** [Gemma Community License](https://ai.google.dev/gemma/terms)
+ - **License:** [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)
 
  For tokenisation, the model employs the default tokenizer used in Llama3.1 8B Instruct.
 
@@ -93,7 +93,7 @@ Llama3.1 8B CPT SEA-LIONv3 base model was continued pre-trained on 200B tokens o
 
 
  Note:
- - All token counts are counted using Gemma2 tokenizer
+ - All token counts are counted using Llama 3.1 tokenizer
  - Wiki* sources includes Wikipedia, Wiki Books, Wiki Source, Wiki Voyage and Fandom Wiki
  - News* sources includes VOA, Global Voices, MediaCorp, VinBigData-News
  - Tamil news is sourced with permission from [Seithi](https://seithi.mediacorp.sg/)