viren-shah committed on
Commit
a232f93
1 Parent(s): a90ee2e

add another link

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -10,7 +10,7 @@ inference: false
 <!-- Provide a quick summary of what the model is/does. -->
 
 SN-13B-8k-Instruct is a 13 billion parameter model. It is both pretrained from scratch as well as instruction tuned on
-Sambanova Datascale systems. This model is meant to be used for tasks requiring long sequence understanding.
+[SambaNova DataScale systems](https://sambanova.ai/products/datascale/). This model is meant to be used for tasks requiring long sequence understanding.
 
 ## Model Details
 
@@ -100,7 +100,7 @@ print(tokenizer.batch_decode(outputs))
 We trained SN-13B-8k-Instruct with [SambaNova DataScale systems](https://sambanova.ai/products/datascale/) with
 SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from random weights, and pretrained for 300 Billion
 tokens on sequences of size 2048. We then pretrained for another 500 Billion tokens on sequences of size 8192.
-During this phase of training, we curated a dataset that has a large proportion of long sequence articles with
+During this phase of training, we curated a dataset that had a large proportion of long sequence articles, with
 30% of our articles consisting of greater than 6000 words.
 
 We applied instruction tuning on a variety of tasks derived from datasets such as FLANv2, P3, NLI, etc.