sippycoder committed
Commit • 883ff9f
Parent(s): 2e48d83
initial commit
README.md CHANGED
language:
- en
---

# 🚀 Nucleus-22B-token-500B

**Nucleus-22B-token-500B is a 22B-parameter causal decoder-only model built by Nucleus.AI and trained on 500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) along with curated corpora. It is made available under the Apache 2.0 license.**

*1T-token model coming soon*.

## What about Nucleus-22B-token-500B?

* **It performs well compared to similar-size open-source models** (e.g., [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1) etc.), thanks to being trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
* **It is made available under an MIT license**.

⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**

# Model Card for Nucleus-22B-token-500B

## Model Details

## Bias, Risks, and Limitations

Nucleus-22B-token-500B was trained on English data only and will not generalize appropriately to other languages. Furthermore, as it was trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

### Recommendations

We recommend that users of Nucleus-22B-token-500B finetune it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.

## How to Get Started with the Model
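A minimal loading-and-generation sketch with the Transformers library is shown below; the repository id, dtype, and generation settings are illustrative assumptions rather than values confirmed by this card.

```python
# Minimal sketch, not an official quickstart. Assumptions: the checkpoint is published
# on the Hugging Face Hub under a repo id like the one below, and `transformers`,
# `torch`, and `accelerate` are installed. A 22B model needs roughly 44 GB in bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NucleusAI/nucleus-22B-token-500B"  # hypothetical repo id; substitute the real one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shards the weights across available GPUs via accelerate
)

prompt = "The three most important ingredients of a good pretraining corpus are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a raw pretrained model, expect plain text continuation rather than instruction-following behavior.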
### Training Data

Nucleus-22B-token-500B was trained on 500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), along with other corpora.

| **Data source** | **Fraction** | **Tokens** | **Sources** |
|-----------------|--------------|------------|-------------|

The data was tokenized with a tokenizer similar to the Llama-7B tokenizer.
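For readers who want to inspect the main pretraining source, the snippet below streams a few RefinedWeb records with the `datasets` library; it is an illustration only and not part of the original training pipeline.

```python
# Sketch: stream a few records of the RefinedWeb corpus without downloading it in full.
from datasets import load_dataset

refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

for i, record in enumerate(refinedweb):
    print(record["content"][:200])  # "content" holds the extracted page text
    if i == 2:
        break
```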
### Training Procedure

Nucleus-22B-token-500B was trained on 256 A100 80GB GPUs using FSDP (Fully Sharded Data Parallel).
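As a rough illustration of what FSDP training looks like in PyTorch, here is a minimal wrapping sketch; the repo id, wrap policy, precision, and optimizer settings are assumptions for illustration and do not reproduce the actual 256-GPU configuration.

```python
# Illustrative FSDP setup (launch with: torchrun --nproc_per_node=<gpus> train.py).
# Not the original training code; the settings below are placeholders.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from transformers import AutoConfig, AutoModelForCausalLM

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Build the model from its config (random init) for pretraining; repo id is hypothetical.
config = AutoConfig.from_pretrained("NucleusAI/nucleus-22B-token-500B")
model = AutoModelForCausalLM.from_config(config)

model = FSDP(
    model,
    device_id=local_rank,
    auto_wrap_policy=functools.partial(size_based_auto_wrap_policy, min_num_params=int(1e8)),
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16),
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# ...standard loop: forward pass, cross-entropy loss, loss.backward(), optimizer.step()...
```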
#### Training Hyperparameters