Updating model files
README.md
CHANGED
@@ -8,6 +8,17 @@ datasets:
 - the_pile_books3
 inference: false
 ---
+<div style="width: 100%;">
+<img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
+</div>
+<div style="display: flex; justify-content: space-between; width: 100%;">
+<div style="display: flex; flex-direction: column; align-items: flex-start;">
+<p><a href="https://discord.gg/UBgz4VXf">Chat & support: my new Discord server</a></p>
+</div>
+<div style="display: flex; flex-direction: column; align-items: flex-end;">
+<p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? Patreon coming soon!</a></p>
+</div>
+</div>


 # MPT-7B-Storywriter GGML
@@ -60,6 +71,17 @@ bin/mpt -m /path/to/mpt-7b-storywriter.ggmlv3.q4_0.bin -t 8 -n 512 -p "Write a s

 Please see the ggml repo for other build options.

+## Want to support my work?
+
+I've had a lot of people ask if they can contribute. I love providing models and helping people, but it is starting to rack up pretty big cloud computing bills.
+
+So if you're able and willing to contribute, it'd be most gratefully received and will help me to keep providing models, and work on various AI projects.
+
+Donaters will get priority support on any and all AI/LLM/model questions, and I'll gladly quantise any model you'd like to try.
+
+* Patreon: coming soon! (just awaiting approval)
+* Ko-Fi: https://ko-fi.com/TheBlokeAI
+* Discord: https://discord.gg/UBgz4VXf
 # Original model card: MPT-7B-Storywriter

 # MPT-7B-StoryWriter-65k+
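For context on the `bin/mpt` run command shown in the hunk header above: the quantised file can be fetched with `huggingface_hub` and passed to the compiled binary. This is an illustrative sketch only, not part of the commit; the repo id and the prompt are assumptions, and it presumes `bin/mpt` has already been built from the ggml repo.

```python
# Illustrative sketch only: download the q4_0 GGML file and run it through
# ggml's example `mpt` binary (assumed to be already built as bin/mpt).
import subprocess
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/MPT-7B-Storywriter-GGML",      # assumed repo id for this card
    filename="mpt-7b-storywriter.ggmlv3.q4_0.bin",   # file named in the README command
)

prompt = "Once upon a time"  # placeholder prompt, not the one from the README

# Mirrors the README invocation: 8 threads, 512 new tokens.
subprocess.run(
    ["bin/mpt", "-m", model_path, "-t", "8", "-n", "512", "-p", prompt],
    check=True,
)
```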
@@ -119,7 +141,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(
 model.to(device='cuda:0')
 ```

-Although the model was trained with a sequence length of 2048 and finetuned with a sequence length of 65536,
+Although the model was trained with a sequence length of 2048 and finetuned with a sequence length of 65536,
 ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:

 ```python
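The hunk above ends just as the card's example begins, so for reference, here is a minimal sketch of what such an override can look like. It assumes the Hugging Face `transformers` AutoConfig path and MPT's `max_seq_len` config field; the model id and the target length are illustrative, not taken from the diff.

```python
# Minimal sketch: raise the ALiBi context window via the config before loading.
# Assumptions: the Hub id, the `max_seq_len` field name, and the chosen length
# are illustrative, not copied from the card shown in this diff.
import transformers

name = 'mosaicml/mpt-7b-storywriter'  # assumed Hub id for the original FP16 model

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # example: extend beyond the 65536 used in finetuning

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
```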
@@ -201,8 +223,8 @@ The data was tokenized using the [EleutherAI/gpt-neox-20b](https://huggingface.c

 ### Training Configuration

-This model was trained on 8 A100-80GBs for about 2 days using the [MosaicML Platform](https://www.mosaicml.com/platform).
-The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the [LION](https://arxiv.org/abs/2302.06675) optimizer.
+This model was trained on 8 A100-80GBs for about 2 days using the [MosaicML Platform](https://www.mosaicml.com/platform).
+The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the [LION](https://arxiv.org/abs/2302.06675) optimizer.

 ## Limitations and Biases
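The training setup described in the last hunk (sharded data parallelism via FSDP plus the LION optimizer) can be pictured roughly as below. This is a toy sketch under stated assumptions: it uses the third-party `lion-pytorch` package for LION, a small stand-in module instead of MPT-7B, and assumes launch via `torchrun` on NCCL-capable GPUs. It is not MosaicML's actual training code.

```python
# Toy sketch of the described setup, not MosaicML's training code.
# Assumptions: launched with `torchrun` (so rank/world-size env vars exist),
# NCCL-capable GPUs, and the third-party `lion-pytorch` package for LION.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from lion_pytorch import Lion  # pip install lion-pytorch

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Stand-in module; in the real run this would be the 7B-parameter MPT model.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).cuda()

sharded_model = FSDP(model)  # shard parameters and gradients across ranks
optimizer = Lion(sharded_model.parameters(), lr=1e-4, weight_decay=0.01)

# One dummy step to show how the pieces connect.
batch = torch.randn(8, 1024, device="cuda")
loss = sharded_model(batch).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()

dist.destroy_process_group()
```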