sippycoder committed
Commit • 883ff9f
Parent(s): 2e48d83
initial commit
README.md CHANGED
language:
- en
---

# 🚀 Nucleus-22B-token-500B

**Nucleus-22B-token-500B is a 22B-parameter causal decoder-only model built by Nucleus.AI and trained on 500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) along with curated corpora. It is made available under the Apache 2.0 license.**

*1T-token model coming soon*.

## What about Nucleus-22B-token-500B?

* **It performs well compared to similar-size open-source models** (e.g., [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), [StableLM](https://github.com/Stability-AI/StableLM), [RedPajama](https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1) etc.), thanks to being trained on 1,500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) enhanced with curated corpora. See the [OpenLLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
* **It is made available under an MIT license**.

⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**

# Model Card for Nucleus-22B-token-500B

## Model Details

## Bias, Risks, and Limitations

Nucleus-22B-token-500B was trained on English data only and will not generalize appropriately to other languages. Furthermore, as it was trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

### Recommendations

We recommend that users of Nucleus-22B-token-500B finetune it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.

## How to Get Started with the Model
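A minimal loading-and-generation sketch with the Transformers library is shown below; the repository id, dtype, and generation settings are illustrative assumptions rather than values confirmed by this card.

```python
# Minimal sketch, not an official quickstart. Assumptions: the checkpoint is published
# on the Hugging Face Hub under a repo id like the one below, and `transformers`,
# `torch`, and `accelerate` are installed. A 22B model needs roughly 44 GB in bf16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NucleusAI/nucleus-22B-token-500B"  # hypothetical repo id; substitute the real one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shards the weights across available GPUs via accelerate
)

prompt = "The three most important ingredients of a good pretraining corpus are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this is a raw pretrained model, expect plain text continuation rather than instruction-following behavior.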
### Training Data

Nucleus-22B-token-500B was trained on 500B tokens of [RefinedWeb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb), along with other corpora.

| **Data source** | **Fraction** | **Tokens** | **Sources** |
|-----------------|--------------|------------|-------------|

The data was tokenized with a tokenizer similar to the Llama-7B tokenizer.
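For readers who want to inspect the main pretraining source, the snippet below streams a few RefinedWeb records with the `datasets` library; it is an illustration only and not part of the original training pipeline.

```python
# Sketch: stream a few records of the RefinedWeb corpus without downloading it in full.
from datasets import load_dataset

refinedweb = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)

for i, record in enumerate(refinedweb):
    print(record["content"][:200])  # "content" holds the extracted page text
    if i == 2:
        break
```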
### Training Procedure

Nucleus-22B-token-500B was trained on 256 A100 80GB GPUs using FSDP (Fully Sharded Data Parallel).
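As a rough illustration of what FSDP training looks like in PyTorch, here is a minimal wrapping sketch; the repo id, wrap policy, precision, and optimizer settings are assumptions for illustration and do not reproduce the actual 256-GPU configuration.

```python
# Illustrative FSDP setup (launch with: torchrun --nproc_per_node=<gpus> train.py).
# Not the original training code; the settings below are placeholders.
import functools

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy
from transformers import AutoConfig, AutoModelForCausalLM

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Build the model from its config (random init) for pretraining; repo id is hypothetical.
config = AutoConfig.from_pretrained("NucleusAI/nucleus-22B-token-500B")
model = AutoModelForCausalLM.from_config(config)

model = FSDP(
    model,
    device_id=local_rank,
    auto_wrap_policy=functools.partial(size_based_auto_wrap_policy, min_num_params=int(1e8)),
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16),
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# ...standard loop: forward pass, cross-entropy loss, loss.backward(), optimizer.step()...
```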
#### Training Hyperparameters