viren-shah committed
Commit: a232f93
Parent(s): a90ee2e

add another link

README.md CHANGED
@@ -10,7 +10,7 @@ inference: false
 <!-- Provide a quick summary of what the model is/does. -->
 
 SN-13B-8k-Instruct is a 13 billion parameter model. It is both pretrained from scratch as well as instruction tuned on
-
+[SambaNova DataScale systems](https://sambanova.ai/products/datascale/). This model is meant to be used for tasks requiring long sequence understanding.
 
 ## Model Details
 
@@ -100,7 +100,7 @@ print(tokenizer.batch_decode(outputs))
 We trained SN-13B-8k-Instruct with [SambaNova DataScale systems](https://sambanova.ai/products/datascale/) with
 SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from random weights, and pretrained for 300 Billion
 tokens on sequences of size 2048. We then pretrained for another 500 Billion tokens on sequences of size 8192.
-During this phase of training, we curated a dataset that
+During this phase of training, we curated a dataset that had a large proportion of long sequence articles, with
 30% of our articles consisting of greater than 6000 words.
 
 We applied instruction tuning on a variety of tasks derived from datasets such as FLANv2, P3, NLI, etc.