Update README.md

README.md:

@@ -33,7 +33,7 @@ The continued pre-training data for Llama3.1 8B CPT SEA-LIONv3 base model encomp
 - **Funded by:** Singapore NRF
 - **Model type:** Decoder
 - **Languages:** English, Chinese, Vietnamese, Indonesian, Thai, Filipino, Tamil, Malay, Khmer, Lao, Burmese, Javanese, Sundanese
-- **License:** [
+- **License:** [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE)

 For tokenisation, the model employs the default tokenizer used in Llama3.1 8B Instruct.

@@ -93,7 +93,7 @@ Llama3.1 8B CPT SEA-LIONv3 base model was continued pre-trained on 200B tokens o

 Note:
-- All token counts are counted using
+- All token counts are counted using Llama 3.1 tokenizer
 - Wiki* sources includes Wikipedia, Wiki Books, Wiki Source, Wiki Voyage and Fandom Wiki
 - News* sources includes VOA, Global Voices, MediaCorp, VinBigData-News
 - Tamil news is sourced with permission from [Seithi](https://seithi.mediacorp.sg/)