Jian-Gang committed on
Commit
7839fa8
1 Parent(s): 56ad8c1

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
```diff
@@ -37,9 +37,9 @@ We performed continued pre-training in English and SEA languages on [Llama-3.1-7
 For tokenisation, the model employs the default tokenizer used in Llama 3.1 70B Instruct.
 
 ### Benchmark Performance
-We evaluated Llama3.1 70B CPT SEA-LIONv3 base model on general language capabilities.
+We evaluated Llama3.1 70B CPT SEA-LIONv3 base model on general language capabilities and constraint-following behaviour.
 
-#### General Language Capabilities
+#### General Language Capabilities and Constraint-following Behaviour
 For the evaluation of general language capabilities, we employed the [SEA-HELM (also known as BHASA) evaluation benchmark](https://arxiv.org/abs/2309.06085v2) across a variety of tasks.
 These tasks include Question Answering (QA), Sentiment Analysis (Sentiment), Toxicity Detection (Toxicity), Translation in both directions (Eng>Lang & Lang>Eng), Abstractive Summarisation (Abssum), Causal Reasoning (Causal) and Natural Language Inference (NLI).
 
@@ -51,6 +51,8 @@ Following the implementation of IFEval in OpenLLM leaderboard, we also implement
 
 **SEA-IFEval**
 
+Based on [IFEval](https://arxiv.org/abs/2311.07911), the linguists and native speakers in the team worked together to filter, localise and translate the datasets into the respective target languages to ensure that the examples remained reasonable, meaningful and natural.
+
 SEA-IFEval evaluates a model's ability to adhere to constraints provided in the prompt, for example beginning a response with a specific word/phrase or answering with a certain number of sections. Additionally, accuracy is normalised by the proportion of responses in the correct language (if the model performs the task correctly but responds in the wrong language, it is judged to have failed the task).
 
 For more details on Llama3.1 70B CPT SEA-LIONv3 base benchmark performance, please refer to the SEA-HELM leaderboard, https://leaderboard.sea-lion.ai/.
```
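
The SEA-IFEval paragraph above describes a concrete scoring rule: a response is credited only if it both satisfies the prompt's constraint and is written in the target language, so constraint accuracy is effectively normalised by language correctness. A minimal sketch of that rule, assuming per-response pass/fail judgements (the `Response` type and `sea_ifeval_score` function are illustrative, not SEA-HELM's actual API):

```python
# Minimal sketch (not SEA-HELM's actual code) of the SEA-IFEval scoring rule:
# a response counts as correct only when it follows the prompt's constraint
# AND is written in the target language.
from dataclasses import dataclass

@dataclass
class Response:                  # illustrative type, not part of SEA-HELM
    follows_constraint: bool     # e.g. begins with the required word/phrase
    in_target_language: bool     # detected language matches the prompt's

def sea_ifeval_score(responses: list[Response]) -> float:
    """Proportion of responses passing both the constraint and language checks."""
    if not responses:
        return 0.0
    passed = sum(r.follows_constraint and r.in_target_language for r in responses)
    return passed / len(responses)

# A response that completes the task correctly but in the wrong language
# is judged to have failed, exactly as the paragraph states.
print(sea_ifeval_score([
    Response(True, True),
    Response(True, False),   # right task, wrong language -> fail
    Response(False, True),
]))  # 1/3, approximately 0.333
```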