viren-shah committed
Commit: a232f93
Parent(s): a90ee2e

add another link

README.md CHANGED
@@ -10,7 +10,7 @@ inference: false
 <!-- Provide a quick summary of what the model is/does. -->
 
 SN-13B-8k-Instruct is a 13 billion parameter model. It is both pretrained from scratch as well as instruction tuned on
-
+[SambaNova DataScale systems](https://sambanova.ai/products/datascale/). This model is meant to be used for tasks requiring long sequence understanding.
 
 ## Model Details
 
@@ -100,7 +100,7 @@ print(tokenizer.batch_decode(outputs))
 We trained SN-13B-8k-Instruct with [SambaNova DataScale systems](https://sambanova.ai/products/datascale/) with
 SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from random weights, and pretrained for 300 Billion
 tokens on sequences of size 2048. We then pretrained for another 500 Billion tokens on sequences of size 8192.
-During this phase of training, we curated a dataset that
+During this phase of training, we curated a dataset that had a large proportion of long sequence articles, with
 30% of our articles consisting of greater than 6000 words.
 
 We applied instruction tuning on a variety of tasks derived from datasets such as FLANv2, P3, NLI, etc.