---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
license: apache-2.0
inference: false
---

# SN-13B-8k-Instruct

<!-- Provide a quick summary of what the model is/does. -->

SN-13B-8k-Instruct is a 13 billion parameter model. It was pretrained and instruction-tuned on
[SambaNova DataScale systems](https://sambanova.ai/products/datascale/). This model is meant to be used for tasks requiring long sequence understanding.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [SambaNova Systems](https://sambanova.ai/)
- **Model type:** Language Model
- **Language(s):** English
- **License:** Apache 2.0

### Basic Information

<!-- Provide the basic links for the model. -->
- **Blog Post**: [Link](<add link>)
- **Discord**: [Link](https://discord.com/invite/8z2Pe7cpRv)
<!-- - **Github**: [Link](https://github.com/sambanova/bloomchat) -->

### Licensing

To increase accessibility and to support the open-source community, SambaNova is releasing SN-13B-8k-Instruct under an Apache 2.0 license. [Please review SambaNova’s SN-13B-8k-Instruct License](LICENSE).

## Uses
<details>
<summary>Click to expand</summary>
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
This model is intended for commercial and research use.


### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->


SN-13B-8k-Instruct should NOT be used for:

- Mission-critical applications
- Applications that involve the safety of others
- Making highly important decisions
- Important automated pipelines

This model is still in early development and can be prone to mistakes and hallucinations; there is still room for improvement. It is intended to provide the community with a long-sequence instruction-tuned LLM baseline.

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are listed at the bottom of the page.

</details>


---
## Running the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")

prompt = 'Define Machine Learning.'
inputs = tokenizer(prompt, return_tensors='pt')

# SN-13B-8k-Instruct occasionally repeats itself when do_sample=False,
# so we enable sampling here to avoid this.
outputs = model.generate(**inputs, use_cache=True, max_new_tokens=50, do_sample=True)

print(tokenizer.batch_decode(outputs))
```
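
Because the model is meant for long sequence understanding, a common pattern is to pass it a long document together with an instruction. The sketch below is a minimal, hypothetical example: the prompt layout and the placeholder document are assumptions for illustration, not an official template. It simply keeps the input within the 8192-token context window before generating.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")

# Hypothetical long input; in practice this could be thousands of words.
long_document = "<paste or load a long article here>"

# Assumed prompt layout (instruction followed by the document); not an official template.
prompt = f"Summarize the following article in three sentences.\n\n{long_document}"

# Keep the input within the model's 8192-token context window.
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=8192)

outputs = model.generate(**inputs, use_cache=True, max_new_tokens=200, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```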

---

## Training Details

<details>
<summary>Click to expand</summary>

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

We trained SN-13B-8k-Instruct on [SambaNova DataScale systems](https://sambanova.ai/products/datascale/) powered by
SambaNova's in-house Reconfigurable Dataflow Unit (RDU). We started from random weights and pretrained for 300 billion
tokens on sequences of length 2048. We then pretrained for another 500 billion tokens on sequences of length 8192.
During this phase of training, we curated a dataset with a large proportion of long-sequence articles, with
30% of the articles consisting of more than 6,000 words.

We applied instruction tuning on a variety of tasks derived from datasets such as FLANv2, P3, and NLI.
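
As an illustration of what an instruction-tuning example might look like, here is a minimal sketch that turns a FLAN-style (instruction, response) pair into a single training string for a causal LM. The field names and the separator are assumptions for illustration, not the actual format used during training.

```python
# Hypothetical instruction-tuning example, loosely in the style of FLANv2/P3 NLI tasks.
# The field names and the "\n\n" separator are assumptions, not the real training format.
example = {
    "instruction": (
        "Does the premise entail the hypothesis? "
        "Premise: The cat sat on the mat. Hypothesis: There is a cat."
    ),
    "response": "Yes",
}

def to_training_text(ex: dict) -> str:
    """Concatenate instruction and response into one sequence for causal LM training."""
    return ex["instruction"] + "\n\n" + ex["response"]

print(to_training_text(example))
```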

### Hyperparameters

**Pretraining at 8k sequence length**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Steps: 60000
- Global Batch size: 1024
- Learning Rate: 1e-5
- Learning Rate Scheduler: Fixed
- Warmup Steps: 0
- Weight decay: 0.1

**Instruction-tuned Training**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Steps: 35000
- Global Batch size: 64
- Learning Rate: 1e-5
- Learning Rate Scheduler: Fixed
- Warmup Steps: 0
- Weight decay: 0.1
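
To make the hyperparameters above concrete, here is a minimal PyTorch sketch of the corresponding optimizer setup: AdamW with a fixed learning rate of 1e-5, weight decay of 0.1, and no warmup. The model variable is a placeholder and the training loop is omitted; the actual training ran on SambaNova RDUs with SambaNova's own stack, which is not shown here.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder model; the actual training ran on SambaNova RDUs, not on GPUs.
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/SN-13B-8k-Instruct")

# Mirrors the instruction-tuned training hyperparameters listed above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)

# A "fixed" schedule with zero warmup steps: the learning rate never changes.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda step: 1.0)
```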

</details>

---

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Like all LLMs, SN-13B-8k-Instruct has certain limitations:
- Hallucination: SN-13B-8k-Instruct may sometimes generate responses that contain plausible-sounding but factually incorrect or irrelevant information.
- Repetition: SN-13B-8k-Instruct may produce repetitive phrases or sentences, leading to less engaging and informative responses.
- Coding and Math: The model's performance in generating accurate code or solving complex mathematical problems may be limited.
- Toxicity: SN-13B-8k-Instruct may inadvertently generate responses containing inappropriate or harmful content.

## Acknowledgment

We appreciate [Scrolls](https://www.scrolls-benchmark.com/) and [ZeroScrolls](https://www.zero.scrolls-benchmark.com/) for their contributions in creating effective benchmarks for testing the long sequence understanding of large language models.
We appreciate [lm-eval-harness](https://github.com/EleutherAI/lm-evaluation-harness) and [HELM](https://crfm.stanford.edu/helm/latest/) for their essential benchmarking contributions,
which were both very helpful in evaluating SN-13B-8k-Instruct's performance. We also drew inspiration from the recent wave of open-source long sequence models,
including [XGen](https://blog.salesforceairesearch.com/xgen/), [MPT](https://www.mosaicml.com/blog/long-context-mpt-7b-8k), and
[Llama-2](https://ai.meta.com/llama/). We look forward to witnessing the continued growth and success of open-source long sequence models.

We highly appreciate the hard work and dedication of these researchers and organizations towards the advancement of the open-source community. Their contributions were invaluable in the development of SN-13B-8k-Instruct, and we hope that our model can contribute to further advancements in the field.

## Cite SN-13B-8k-Instruct
```
@software{sn-13b-8k-instruct,
  title = {SN-13B-8k-Instruct: an Open Long Sequence Instruction-Tuned LLM},
  author = {SambaNova Systems},
  url = {https://huggingface.co/sambanovasystems/SN-13B-8k-Instruct},
  month = {8},
  year = {2023},
  version = {1.0},
}
```