TheBloke committed
Commit 6cf22be
1 Parent(s): cf4eae7

Create README.md

Files changed (1)
  1. README.md +310 -0
README.md ADDED
@@ -0,0 +1,310 @@
1
+ ---
2
+ license: cc-by-nc-sa-4.0
3
+ datasets:
4
+ - jeffwan/sharegpt_vicuna
5
+ - Hello-SimpleAI/HC3
6
+ - tatsu-lab/alpaca
7
+ - Anthropic/hh-rlhf
8
+ - victor123/evol_instruct_70k
9
+ tags:
10
+ - Composer
11
+ - MosaicML
12
+ - llm-foundry
13
+ inference: false
14
+ ---
15
+
16
+ <!-- header start -->
17
+ <div style="width: 100%;">
18
+ <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
19
+ </div>
20
+ <div style="display: flex; justify-content: space-between; width: 100%;">
21
+ <div style="display: flex; flex-direction: column; align-items: flex-start;">
22
+ <p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
23
+ </div>
24
+ <div style="display: flex; flex-direction: column; align-items: flex-end;">
25
+ <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
26
+ </div>
27
+ </div>
28
+ <!-- header end -->
29
+
30
+ # MosaicML's MPT-7B-Chat GGML
31
+
32
+ These files are GGML format model files for [MosaicML's MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat).
33
+
34
+ Please note that these GGML files are **not compatible with llama.cpp, nor currently with text-generation-webui**. Please see below for a list of tools known to work with these model files.
35
+
36
+ [KoboldCpp](https://github.com/LostRuins/koboldcpp) just added GPU accelerated (OpenCL) support for MPT models, so that is the client I recommend using for these models.
37
+
38
+ **Note**: Please make sure you're using KoboldCpp version 1.32.3 or later, as it fixes a number of MPT-related bugs.
39
+
40
+ ## Repositories available
41
+
42
+ * [4, 5, and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/mpt-7B-chat-GGML)
43
+ * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/mosaicml/mpt-7b-chat)
44
+
45
+ ## Prompt template
46
+
47
+ Based on the code for the MPT 30B Chat Space, I believe this is the correct prompt template:
48
+
49
+ ```
50
+ <|im_start|>system
51
+ A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.<|im_end|>
52
+ <|im_start|>user
53
+ prompt goes here<|im_end|>
54
+ <|im_start|>assistant
55
+ ```
56
+
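+ For example, here is a minimal Python sketch (the helper name and default system message are illustrative, not part of the model) that assembles a prompt in this format:
+
+ ```python
+ def build_prompt(user_message: str,
+                  system_message: str = ("A conversation between a user and an "
+                                         "LLM-based AI assistant. The assistant "
+                                         "gives helpful and honest answers.")) -> str:
+     # Assemble the ChatML-style prompt shown above; generation should stop
+     # when the model emits <|im_end|>.
+     return (
+         f"<|im_start|>system\n{system_message}<|im_end|>\n"
+         f"<|im_start|>user\n{user_message}<|im_end|>\n"
+         "<|im_start|>assistant\n"
+     )
+
+ print(build_prompt("Write a short poem about llamas."))
+ ```
+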
57
+ ## A note regarding context length: 4K
58
+
59
+ The base model has a 4K context length.
60
+
61
+ [KoboldCpp](https://github.com/LostRuins/koboldcpp) supports 4K context if you manually set it to 4K by adjusting the text box above the slider, like in this example:
62
+ ![.](https://i.imgur.com/tEbpeJqm.png)
63
+
64
+ (Set it to 4K, not 8K for this model.)
65
+
66
+ <!-- compatibility_ggml start -->
67
+ ## Compatibility
68
+
69
+ These files are **not** compatible with text-generation-webui, llama.cpp, or llama-cpp-python.
70
+
71
+ Currently they can be used with:
72
+ * KoboldCpp, a powerful inference engine based on llama.cpp, with good UI and GPU accelerated support for MPT models: [KoboldCpp](https://github.com/LostRuins/koboldcpp)
73
+ * The ctransformers Python library, which includes LangChain support: [ctransformers](https://github.com/marella/ctransformers)
74
+ * The LoLLMS Web UI which uses ctransformers: [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui)
75
+ * [rustformers' llm](https://github.com/rustformers/llm)
76
+ * The example `mpt` binary provided with [ggml](https://github.com/ggerganov/ggml)
77
+
78
+ As other options become available, I will endeavour to add them here (do let me know in the Community tab if I've missed something!).
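+
+ For example, here is a rough sketch of using these GGML files from Python with ctransformers. The file name, sampling parameters, and the `context_length`/`stop` options are illustrative; check the ctransformers documentation for the exact arguments your installed version supports:
+
+ ```python
+ from ctransformers import AutoModelForCausalLM
+
+ # Load one of the quantised GGML files from this repo; model_type='mpt'
+ # selects the MPT architecture.
+ llm = AutoModelForCausalLM.from_pretrained(
+     "TheBloke/mpt-7B-chat-GGML",
+     model_file="mpt-7b-chat.ggmlv0.q4_0.bin",   # any of the files listed below
+     model_type="mpt",
+     context_length=4096,                        # if supported by your version
+ )
+
+ prompt = (
+     "<|im_start|>system\nA conversation between a user and an LLM-based AI assistant. "
+     "The assistant gives helpful and honest answers.<|im_end|>\n"
+     "<|im_start|>user\nWhat is GGML quantisation?<|im_end|>\n"
+     "<|im_start|>assistant\n"
+ )
+
+ print(llm(prompt, max_new_tokens=256, temperature=0.7, stop=["<|im_end|>"]))
+ ```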
79
+
80
+ ## Tutorial for using LoLLMS Web UI
81
+
82
+ * [Text tutorial, written by **Lucas3DCG**](https://huggingface.co/TheBloke/MPT-7B-Storywriter-GGML/discussions/2#6475d914e9b57ce0caa68888)
83
+ * [Video tutorial, by LoLLMS Web UI's author **ParisNeo**](https://www.youtube.com/watch?v=ds_U0TDzbzI)
84
+
85
+ <!-- compatibility_ggml end -->
86
+
87
+ ## Provided files
88
+ | Name | Quant method | Bits | Size | Max RAM required | Use case |
89
+ | ---- | ---- | ---- | ---- | ---- | ----- |
90
+ | mpt-7b-chat.ggmlv0.q4_0.bin | q4_0 | 4 | 16.85 GB | 19.35 GB | 4-bit. |
91
+ | mpt-7b-chat.ggmlv0.q4_1.bin | q4_1 | 4 | 18.73 GB | 21.23 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. |
92
+ | mpt-7b-chat.ggmlv0.q5_0.bin | q5_0 | 5 | 20.60 GB | 23.10 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
93
+ | mpt-7b-chat.ggmlv0.q5_1.bin | q5_1 | 5 | 22.47 GB | 24.97 GB | 5-bit. Even higher accuracy, resource usage and slower inference. |
94
+ | mpt-7b-chat.ggmlv0.q8_0.bin | q8_0 | 8 | 31.83 GB | 34.33 GB | 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |
95
+
96
+ **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
97
+
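+ If you only need one of the quantised files rather than the whole repo, a quick way to fetch it from Python is via `huggingface_hub` (the file name below is just an example; pick whichever quant you want from the table above):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download a single quantised file into the local Hugging Face cache
+ # and return its path on disk.
+ model_path = hf_hub_download(
+     repo_id="TheBloke/mpt-7B-chat-GGML",
+     filename="mpt-7b-chat.ggmlv0.q4_0.bin",
+ )
+ print(model_path)
+ ```
+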
98
+ <!-- footer start -->
99
+ ## Discord
100
+
101
+ For further support, and discussions on these models and AI in general, join us at:
102
+
103
+ [TheBloke AI's Discord server](https://discord.gg/theblokeai)
104
+
105
+ ## Thanks, and how to contribute.
106
+
107
+ Thanks to the [chirper.ai](https://chirper.ai) team!
108
+
109
+ I've had a lot of people ask if they can contribute. I enjoy providing models and helping people, and would love to be able to spend even more time doing it, as well as expanding into new projects like fine tuning/training.
110
+
111
+ If you're able and willing to contribute it will be most gratefully received and will help me to keep providing more models, and to start work on new AI projects.
112
+
113
+ Donators will get priority support on any and all AI/LLM/model questions and requests, access to a private Discord room, plus other benefits.
114
+
115
+ * Patreon: https://patreon.com/TheBlokeAI
116
+ * Ko-Fi: https://ko-fi.com/TheBlokeAI
117
+
118
+ **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
119
+
120
+ **Patreon special mentions**: zynix , ya boyyy, Trenton Dambrowitz, Imad Khwaja, Alps Aficionado, chris gileta, John Detwiler, Willem Michiel, RoA, Mano Prime, Rainer Wilmers, Fred von Graf, Matthew Berman, Ghost , Nathan LeClaire, Iucharbius , Ai Maven, Illia Dulskyi, Joseph William Delisle, Space Cruiser, Lone Striker, Karl Bernard, Eugene Pentland, Greatston Gnanesh, Jonathan Leane, Randy H, Pierre Kircher, Willian Hasse, Stephen Murray, Alex , terasurfer , Edmond Seymore, Oscar Rangel, Luke Pendergrass, Asp the Wyvern, Junyu Yang, David Flickinger, Luke, Spiking Neurons AB, subjectnull, Pyrater, Nikolai Manek, senxiiz, Ajan Kanaga, Johann-Peter Hartmann, Artur Olbinski, Kevin Schuppel, Derek Yates, Kalila, K, Talal Aujan, Khalefa Al-Ahmad, Gabriel Puliatti, John Villwock, WelcomeToTheClub, Daniel P. Andersen, Preetika Verma, Deep Realms, Fen Risland, trip7s trip, webtim, Sean Connelly, Michael Levine, Chris McCloskey, biorpg, vamX, Viktor Bowallius, Cory Kujawski
121
+
122
+ Thank you to all my generous patrons and donators!
123
+
124
+ <!-- footer end -->
125
+
126
+ # Original model card: MosaicML's MPT-7B-chat
127
+
128
+ # MPT-7B-Chat
129
+
130
+ MPT-7B-Chat is a chatbot-like model for dialogue generation.
131
+ It was built by finetuning [MPT-7B](https://huggingface.co/mosaicml/mpt-7b) on the [ShareGPT-Vicuna](https://huggingface.co/datasets/jeffwan/sharegpt_vicuna), [HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3),
132
+ [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca), [HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf), and [Evol-Instruct](https://huggingface.co/datasets/victor123/evol_instruct_70k) datasets.
133
+ * License: _CC-By-NC-SA-4.0_ (non-commercial use only)
134
+ * [Demo on Hugging Face Spaces](https://huggingface.co/spaces/mosaicml/mpt-7b-chat)
135
+
136
+
137
+ This model was trained by [MosaicML](https://www.mosaicml.com) and follows a modified decoder-only transformer architecture.
138
+
139
+ ## Model Date
140
+
141
+ May 5, 2023
142
+
143
+ ## Model License
144
+
145
+ _CC-By-NC-SA-4.0_ (non-commercial use only)
146
+
147
+ ## Documentation
148
+
149
+ * [Blog post: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs](https://www.mosaicml.com/blog/mpt-7b)
150
+ * [Codebase (mosaicml/llm-foundry repo)](https://github.com/mosaicml/llm-foundry/)
151
+ * Questions: Feel free to contact us via the [MosaicML Community Slack](https://mosaicml.me/slack)!
152
+
153
+ ### Example Dialogue
154
+
155
+ **SamIAm85**:
156
+ >I want you to come up with a tweet based on this summary of the article:
157
+ >"Introducing MPT-7B, the latest entry in our MosaicML Foundation Series.
158
+ >MPT-7B is a transformer trained from scratch on 1T tokens of text and code.
159
+ >It is open source and it matches the quality of LLaMA-7B.
160
+ >MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k.
161
+ >Starting today, you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch.
162
+ >For inspiration, we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens!"
163
+
164
+ **MPT-7B-Instruct**:
165
+ >Here's my suggestion for a tweet:
166
+ >"Exciting news! @MosaicML has launched #MPT_7B - a new standard for open-source LLMs. With training on 17 tokens of text & code, it matches the quality of LLaMA-7B. Now you can train, fine-tune, and deploy your own private MPT models via their platform. Check out the full blog post here: https://bit.ly/3j6UJzZ"
167
+
168
+ **SamIAm85**:
169
+ >Is there anything you want to say to MosaicML about this exciting news?
170
+
171
+ **MPT-7B-Instruct**:
172
+ > As an AI language model myself, I am very impressed with the advancements made by MosaicML in developing such high-quality open-source LLMs.
173
+ > This new version of MPT-7B is truly impressive and I look forward to seeing what innovative applications developers will create using these powerful tools.
174
+ > Thank you for your hard work and dedication to advancing AI research and development.
175
+
176
+ ## How to Use
177
+
178
+ This model is best used with the MosaicML [llm-foundry repository](https://github.com/mosaicml/llm-foundry) for training and finetuning.
179
+
180
+ ```python
181
+ import transformers
182
+ model = transformers.AutoModelForCausalLM.from_pretrained(
183
+ 'mosaicml/mpt-7b-chat',
184
+ trust_remote_code=True
185
+ )
186
+ ```
187
+ Note: This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method.
188
+ This is because we use a custom `MPT` model architecture that is not yet part of the Hugging Face `transformers` package.
189
+ `MPT` includes options for many training efficiency features such as [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf), [ALiBi](https://arxiv.org/abs/2108.12409), [QK LayerNorm](https://arxiv.org/abs/2010.04245), and more.
190
+
191
+ To use the optimized [triton implementation](https://github.com/openai/triton) of FlashAttention, you can load the model on GPU (`cuda:0`) with `attn_impl='triton'` and with `bfloat16` precision:
192
+ ```python
193
+ import torch
194
+ import transformers
195
+
196
+ name = 'mosaicml/mpt-7b-chat'
197
+
198
+ config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
199
+ config.attn_config['attn_impl'] = 'triton'
200
+ config.init_device = 'cuda:0' # For fast initialization directly on GPU!
201
+
202
+ model = transformers.AutoModelForCausalLM.from_pretrained(
203
+ name,
204
+ config=config,
205
+ torch_dtype=torch.bfloat16, # Load model weights in bfloat16
206
+ trust_remote_code=True
207
+ )
208
+ ```
209
+
210
+ Although the model was trained with a sequence length of 2048, ALiBi enables users to increase the maximum sequence length during finetuning and/or inference. For example:
211
+
212
+ ```python
213
+ import transformers
214
+
215
+ name = 'mosaicml/mpt-7b-chat'
216
+
217
+ config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
218
+ config.max_seq_len = 4096 # (input + output) tokens can now be up to 4096
219
+
220
+ model = transformers.AutoModelForCausalLM.from_pretrained(
221
+ name,
222
+ config=config,
223
+ trust_remote_code=True
224
+ )
225
+ ```
226
+
227
+ This model was trained with the [EleutherAI/gpt-neox-20b](https://huggingface.co/EleutherAI/gpt-neox-20b) tokenizer.
228
+
229
+ ```python
230
+ from transformers import AutoTokenizer
231
+ tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
232
+ ```
233
+
234
+ The model can then be used, for example, within a text-generation pipeline.
235
+ Note: when running Torch modules in lower precision, it is best practice to use the [torch.autocast context manager](https://pytorch.org/docs/stable/amp.html).
236
+
237
+ ```python
238
+ import torch
+ from transformers import pipeline
239
+
240
+ pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')
241
+
242
+ with torch.autocast('cuda', dtype=torch.bfloat16):
243
+ print(
244
+ pipe('Here is a recipe for vegan banana bread:\n',
245
+ max_new_tokens=100,
246
+ do_sample=True,
247
+ use_cache=True))
248
+ ```
249
+
250
+ ## Model Description
251
+
252
+ The architecture is a modification of a standard decoder-only transformer.
253
+
254
+ The model has been modified from a standard transformer in the following ways:
255
+ * It uses [FlashAttention](https://arxiv.org/pdf/2205.14135.pdf)
256
+ * It uses [ALiBi (Attention with Linear Biases)](https://arxiv.org/abs/2108.12409) and does not use positional embeddings
257
+ * It does not use biases
258
+
259
+
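+ These architecture choices are recorded in the model's Hugging Face config. A quick, illustrative way to inspect them (attribute names follow the MPT config in llm-foundry as of mid-2023, so treat them as assumptions):
+
+ ```python
+ import transformers
+
+ # Load just the config (no weights) and print the attention settings,
+ # e.g. whether ALiBi is enabled and which attn_impl is selected.
+ config = transformers.AutoConfig.from_pretrained(
+     'mosaicml/mpt-7b-chat',
+     trust_remote_code=True,
+ )
+ print(config.attn_config)
+ print(config.no_bias)   # assumed attribute; True means no bias terms are used
+ ```
+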
260
+ | Hyperparameter | Value |
261
+ |----------------|-------|
262
+ | n_parameters | 6.7B |
263
+ | n_layers | 32 |
264
+ | n_heads | 32 |
265
+ | d_model | 4096 |
266
+ | vocab size | 50432 |
267
+ | sequence length | 2048 |
268
+
269
+ ### Training Configuration
270
+
271
+ This model was trained on 8 A100-80GBs for about 8.2 hours, followed by training for 6.7 hours on 32 A100-40GBs using the [MosaicML Platform](https://www.mosaicml.com/platform).
272
+ The model was trained with sharded data parallelism using [FSDP](https://pytorch.org/docs/stable/fsdp.html) and used the AdamW optimizer.
273
+
274
+ ## Limitations and Biases
275
+
276
+ _The following language is modified from [EleutherAI's GPT-NeoX-20B](https://huggingface.co/EleutherAI/gpt-neox-20b)_
277
+
278
+ MPT-7B-Chat can produce factually incorrect output, and should not be relied on to produce factually accurate information.
279
+ MPT-7B-Chat was trained on various public datasets.
280
+ While great efforts have been taken to clean the pretraining data, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
281
+
282
+ ## Acknowledgements
283
+
284
+ This model was finetuned by Sam Havens and the MosaicML NLP team.
285
+
286
+ ## Disclaimer
287
+
288
+ The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
289
+
290
+
291
+ ## MosaicML Platform
292
+
293
+ If you're interested in [training](https://www.mosaicml.com/training) and [deploying](https://www.mosaicml.com/inference) your own MPT or LLMs on the MosaicML Platform, [sign up here](https://forms.mosaicml.com/demo?utm_source=huggingface&utm_medium=referral&utm_campaign=mpt-7b).
294
+
295
+
296
+ ## Citation
297
+
298
+ Please cite this model using the following format:
299
+
300
+ ```
301
+ @online{MosaicML2023Introducing,
302
+ author = {MosaicML NLP Team},
303
+ title = {Introducing MPT-7B: A New Standard for Open-Source,
304
+ Commercially Usable LLMs},
305
+ year = {2023},
306
+ url = {www.mosaicml.com/blog/mpt-7b},
307
+ note = {Accessed: 2023-03-28}, % change this date
308
+ urldate = {2023-03-28} % change this date
309
+ }
310
+ ```