NeoChen1024 committed
Commit e37b364 · verified · 1 Parent(s): bdaf894

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,397 @@
---
tags:
- int8
- w8a8
language:
- en
- fr
- de
- es
- it
- pt
- zh
- ja
- ru
- ko
license: other
license_name: mrl
inference: false
license_link: https://mistral.ai/licenses/MRL-0.1.md
library_name: vllm
base_model:
- mistralai/Ministral-8B-Instruct-2410
---

# W8A8 Quant of Ministral-8B-Instruct-2410
Quantization script: <https://github.com/NeoChen1024/scripts/blob/master/llm-compressor-quantize.py>
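
This checkpoint stores the quantized weights in the Hugging Face / compressed-tensors format described in `config.json` below (int8 channel-wise weights, int8 dynamic per-token activations). The rest of this card follows the upstream `mistralai/Ministral-8B-Instruct-2410` model card, whose usage examples load the original (unquantized) weights in Mistral format. For this quantized repo, a minimal vLLM sketch looks like the following; the local path is a placeholder for wherever this repository is downloaded, and vLLM should pick the quantization scheme up from `config.json`:

```py
# Minimal sketch (not from the upstream card): serving this W8A8 checkpoint with vLLM.
# "path/to/Ministral-8B-Instruct-2410-W8A8" is a placeholder for a local download of this repo.
from vllm import LLM
from vllm.sampling_params import SamplingParams

llm = LLM(model="path/to/Ministral-8B-Instruct-2410-W8A8", max_model_len=32768)
params = SamplingParams(max_tokens=256)

outputs = llm.chat(
    [{"role": "user", "content": "Summarize W8A8 quantization in one sentence."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```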
We introduce two new state-of-the-art models for local intelligence, on-device computing, and at-the-edge use cases. We call them les Ministraux: Ministral 3B and Ministral 8B.

The Ministral-8B-Instruct-2410 Language Model is an instruct fine-tuned model that significantly outperforms existing models of similar size, released under the Mistral Research License.

If you are interested in using Ministral-3B or Ministral-8B commercially (both outperform Mistral-7B), [reach out to us](https://mistral.ai/contact/).

For more details about les Ministraux, please refer to our release [blog post](https://mistral.ai/news/ministraux).

## Ministral 8B Key features
- Released under the **Mistral Research License**; reach out to us for a commercial license
- Trained with a **128k context window** with **interleaved sliding-window attention**
- Trained on a large proportion of **multilingual and code data**
- Supports **function calling**
- Vocabulary size of **131k**, using the **V3-Tekken** tokenizer

### Basic Instruct Template (V3-Tekken)

```
<s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
```

*For more information about the tokenizer please refer to [mistral-common](https://github.com/mistralai/mistral-common)*
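
To make the template concrete, here is a small illustrative helper (plain Python, not part of mistral-common) that lays out a conversation exactly as shown above; real applications should build prompts through mistral-common or the chat APIs further down this card.

```py
# Illustration only: formats a message list into the V3-Tekken instruct layout shown above.
def render_v3_tekken(messages: list[dict]) -> str:
    out = "<s>"
    for m in messages:
        if m["role"] == "user":
            out += f"[INST]{m['content']}[/INST]"
        elif m["role"] == "assistant":
            out += f"{m['content']}</s>"
    return out

print(render_v3_tekken([
    {"role": "user", "content": "user message"},
    {"role": "assistant", "content": "assistant response"},
    {"role": "user", "content": "new user message"},
]))
# <s>[INST]user message[/INST]assistant response</s>[INST]new user message[/INST]
```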

## Ministral 8B Architecture

| Feature               | Value                     |
|:---------------------:|:-------------------------:|
| **Architecture**      | Dense Transformer         |
| **Parameters**        | 8,019,808,256             |
| **Layers**            | 36                        |
| **Heads**             | 32                        |
| **Dim**               | 4096                      |
| **KV Heads (GQA)**    | 8                         |
| **Hidden Dim**        | 12288                     |
| **Head Dim**          | 128                       |
| **Vocab Size**        | 131,072                   |
| **Context Length**    | 128k                      |
| **Attention Pattern** | Ragged (128k,32k,32k,32k) |
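
As a quick sanity check on the table, the parameter count can be recomputed from the listed dimensions (untied embeddings, GQA with 8 KV heads, and the gate/up/down MLP projections visible in the weight map below):

```py
# Recomputes the 8,019,808,256 parameter figure from the architecture table above.
vocab, dim, layers = 131072, 4096, 36
heads, kv_heads, head_dim, hidden = 32, 8, 128, 12288

embed = vocab * dim                                                # input embeddings
lm_head = vocab * dim                                              # untied output head
attn = 2 * dim * heads * head_dim + 2 * dim * kv_heads * head_dim  # q/o + k/v projections
mlp = 3 * dim * hidden                                             # gate, up, down projections
norms = 2 * dim                                                    # two RMSNorms per layer

total = embed + lm_head + layers * (attn + mlp + norms) + dim      # + final norm
print(total)  # 8019808256
```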

## Benchmarks

### Base Models

<u>Knowledge & Commonsense</u>

| Model | MMLU | AGIEval | Winogrande | Arc-c | TriviaQA |
|:-------------:|:------:|:---------:|:------------:|:-------:|:----------:|
| Mistral 7B Base | 62.5 | 42.5 | 74.2 | 67.9 | 62.5 |
| Llama 3.1 8B Base | 64.7 | 44.4 | 74.6 | 46.0 | 60.2 |
| ***Ministral 8B Base*** | ***<u>65.0</u>*** | ***<u>48.3</u>*** | ***<u>75.3</u>*** | ***<u>71.9</u>*** | ***<u>65.5</u>*** |
| | | | | | |
| Gemma 2 2B Base | 52.4 | 33.8 | 68.7 | 42.6 | 47.8 |
| Llama 3.2 3B Base | 56.2 | 37.4 | 59.6 | 43.1 | 50.7 |
| ***Ministral 3B Base*** | ***<u>60.9</u>*** | ***<u>42.1</u>*** | ***<u>72.7</u>*** | ***<u>64.2</u>*** | ***<u>56.7</u>*** |

<u>Code & Math</u>

| Model | HumanEval pass@1 | GSM8K maj@8 |
|:-------------:|:-------------------:|:---------------:|
| Mistral 7B Base | 26.8 | 32.0 |
| Llama 3.1 8B Base | ***<u>37.8</u>*** | 42.2 |
| ***Ministral 8B Base*** | 34.8 | ***<u>64.5</u>*** |
| | | |
| Gemma 2 2B | 20.1 | 35.5 |
| Llama 3.2 3B | 14.6 | 33.5 |
| ***Ministral 3B*** | ***<u>34.2</u>*** | ***<u>50.9</u>*** |

<u>Multilingual</u>

| Model | French MMLU | German MMLU | Spanish MMLU |
|:-------------:|:-------------:|:-------------:|:-------------:|
| Mistral 7B Base | 50.6 | 49.6 | 51.4 |
| Llama 3.1 8B Base | 50.8 | 52.8 | 54.6 |
| ***Ministral 8B Base*** | ***<u>57.5</u>*** | ***<u>57.4</u>*** | ***<u>59.6</u>*** |
| | | | |
| Gemma 2 2B Base | 41.0 | 40.1 | 41.7 |
| Llama 3.2 3B Base | 42.3 | 42.2 | 43.1 |
| ***Ministral 3B Base*** | ***<u>49.1</u>*** | ***<u>48.3</u>*** | ***<u>49.5</u>*** |

### Instruct Models

<u>Chat/Arena (gpt-4o judge)</u>

| Model | MTBench | Arena Hard | Wild bench |
|:-------------:|:---------:|:------------:|:------------:|
| Mistral 7B Instruct v0.3 | 6.7 | 44.3 | 33.1 |
| Llama 3.1 8B Instruct | 7.5 | 62.4 | 37.0 |
| Gemma 2 9B Instruct | 7.6 | 68.7 | ***<u>43.8</u>*** |
| ***Ministral 8B Instruct*** | ***<u>8.3</u>*** | ***<u>70.9</u>*** | 41.3 |
| | | | |
| Gemma 2 2B Instruct | 7.5 | 51.7 | 32.5 |
| Llama 3.2 3B Instruct | 7.2 | 46.0 | 27.2 |
| ***Ministral 3B Instruct*** | ***<u>8.1</u>*** | ***<u>64.3</u>*** | ***<u>36.3</u>*** |

<u>Code & Math</u>

| Model | MBPP pass@1 | HumanEval pass@1 | Math maj@1 |
|:-------------:|:-------------:|:------------------:|:-------------:|
| Mistral 7B Instruct v0.3 | 50.2 | 38.4 | 13.2 |
| Gemma 2 9B Instruct | 68.5 | 67.7 | 47.4 |
| Llama 3.1 8B Instruct | 69.7 | 67.1 | 49.3 |
| ***Ministral 8B Instruct*** | ***<u>70.0</u>*** | ***<u>76.8</u>*** | ***<u>54.5</u>*** |
| | | | |
| Gemma 2 2B Instruct | 54.5 | 42.7 | 22.8 |
| Llama 3.2 3B Instruct | 64.6 | 61.0 | 38.4 |
| ***Ministral 3B Instruct*** | ***<u>67.7</u>*** | ***<u>77.4</u>*** | ***<u>51.7</u>*** |

<u>Function calling</u>

| Model | Internal bench |
|:-------------:|:-----------------:|
| Mistral 7B Instruct v0.3 | 6.9 |
| Llama 3.1 8B Instruct | N/A |
| Gemma 2 9B Instruct | N/A |
| ***Ministral 8B Instruct*** | ***<u>31.6</u>*** |
| | |
| Gemma 2 2B Instruct | N/A |
| Llama 3.2 3B Instruct | N/A |
| ***Ministral 3B Instruct*** | ***<u>28.4</u>*** |

## Usage Examples

### vLLM (recommended)

We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
to implement production-ready inference pipelines.

> [!IMPORTANT]
> Currently vLLM is capped at a 32k context size because interleaved attention kernels for paged attention are not yet implemented in vLLM.
> Once those kernels land and interleaved attention is fully supported in vLLM, this model card will be updated.
> To take advantage of the full 128k context size we recommend [Mistral Inference](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410#mistral-inference).

**_Installation_**

Make sure you install `vLLM >= v0.6.4`:

```
pip install --upgrade vllm
```

Also make sure you have `mistral_common >= 1.4.4` installed:

```
pip install --upgrade mistral_common
```

You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile).

**_Offline_**

```py
from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Ministral-8B-Instruct-2410"

sampling_params = SamplingParams(max_tokens=8192)

# Note that running Ministral 8B on a single GPU requires 24 GB of GPU RAM.
# If you want to divide the GPU requirement over multiple devices, add e.g. `tensor_parallel_size=2`.
llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")

prompt = "Do we need to think for 10 seconds to find the answer to 1 + 1?"

messages = [
    {
        "role": "user",
        "content": prompt
    },
]

outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
# You don't need to think for 10 seconds to find the answer to 1 + 1. The answer is 2,
# and you can easily add these two numbers in your mind very quickly without any delay.
```

**_Server_**

You can also use Ministral-8B in a server/client setting.

1. Spin up a server:

```
vllm serve mistralai/Ministral-8B-Instruct-2410 --tokenizer_mode mistral --config_format mistral --load_format mistral
```

**Note:** Running Ministral-8B on a single GPU requires 24 GB of GPU RAM.

If you want to divide the GPU requirement over multiple devices, add e.g. `--tensor-parallel-size 2`.

2. And ping the client:

```
curl --location 'http://<your-node-url>:8000/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer token' \
  --data '{
    "model": "mistralai/Ministral-8B-Instruct-2410",
    "messages": [
      {
        "role": "user",
        "content": "Do we need to think for 10 seconds to find the answer to 1 + 1?"
      }
    ]
  }'
```
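
Equivalently, you can query the same endpoint from Python with the OpenAI-compatible client. This is not part of the upstream card, just a small sketch; the placeholder URL and the dummy `token` API key mirror the `curl` call above.

```py
# Sketch: calling the vLLM OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://<your-node-url>:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Ministral-8B-Instruct-2410",
    messages=[
        {"role": "user", "content": "Do we need to think for 10 seconds to find the answer to 1 + 1?"}
    ],
)
print(response.choices[0].message.content)
```
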
### Mistral-inference

We recommend using [mistral-inference](https://github.com/mistralai/mistral-inference) to quickly try out / "vibe-check" the model.

**_Install_**

Make sure to have `mistral_inference >= 1.5.0` installed.

```
pip install mistral_inference --upgrade
```

**_Download_**

```py
from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', '8B-Instruct')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(repo_id="mistralai/Ministral-8B-Instruct-2410", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
```

### Chat

After installing `mistral_inference`, a `mistral-chat` CLI command should be available in your environment. You can chat with the model using

```
mistral-chat $HOME/mistral_models/8B-Instruct --instruct --max_tokens 256
```

### Passkey detection

> [!IMPORTANT]
> In this example the passkey message has over 100k tokens and mistral-inference
> does not have a chunked pre-fill mechanism. Therefore you will need a lot of
> GPU memory in order to run the below example (80 GB). For a more memory-efficient
> solution we recommend using vLLM.

```py
from mistral_inference.transformer import Transformer
from pathlib import Path
import json
from mistral_inference.generate import generate
from huggingface_hub import hf_hub_download

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

def load_passkey_request() -> ChatCompletionRequest:
    passkey_file = hf_hub_download(repo_id="mistralai/Ministral-8B-Instruct-2410", filename="passkey_example.json")

    with open(passkey_file, "r") as f:
        data = json.load(f)

    message_content = data["messages"][0]["content"]
    return ChatCompletionRequest(messages=[UserMessage(content=message_content)])

tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path, softmax_fp32=False)

completion_request = load_passkey_request()

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)  # The pass key is 13005.
```

### Instruct following

```py
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(messages=[UserMessage(content="How often does the letter r occur in Mistral?")])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```

### Function calling

```py
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy


tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
tekken = tokenizer.instruct_tokenizer.tokenizer
tekken.special_token_policy = SpecialTokenPolicy.IGNORE

model = Transformer.from_folder(mistral_models_path)

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```
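
When the model decides to call the tool, the decoded `result` above is typically Mistral's tool-call payload, a JSON list of calls (the special `[TOOL_CALLS]` token is dropped because of `SpecialTokenPolicy.IGNORE`). A hedged sketch of dispatching it, assuming that JSON shape and a hypothetical local `get_current_weather` implementation:

```py
# Sketch: parsing a decoded tool-call payload and dispatching it.
# Assumes `result` is a JSON list such as
#   [{"name": "get_current_weather", "arguments": {"location": "Paris, FR", "format": "celsius"}}]
# `get_current_weather` below is a hypothetical placeholder implementation.
import json

def get_current_weather(location: str, format: str) -> str:
    return f"22 degrees {format} in {location}"  # placeholder

for call in json.loads(result):
    if call["name"] == "get_current_weather":
        print(get_current_weather(**call["arguments"]))
```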

## The Mistral AI Team

Albert Jiang, Alexandre Abou Chahine, Alexandre Sablayrolles, Alexis Tacnet, Alodie Boissonnet, Alok Kothari, Amélie Héliou, Andy Lo, Anna Peronnin, Antoine Meunier, Antoine Roux, Antonin Faure, Aritra Paul, Arthur Darcet, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Avinash Sooriyarachchi, Baptiste Rozière, Barry Conklin, Bastien Bouillon, Blanche Savary de Beauregard, Carole Rambaud, Caroline Feldman, Charles de Freminville, Charline Mauro, Chih-Kuan Yeh, Chris Bamford, Clement Auguy, Corentin Heintz, Cyriaque Dubois, Devendra Singh Chaplot, Diego Las Casas, Diogo Costa, Eléonore Arcelin, Emma Bou Hanna, Etienne Metzger, Fanny Olivier Autran, Francois Lesage, Garance Gourdel, Gaspard Blanchet, Gaspard Donada Vidal, Gianna Maria Lengyel, Guillaume Bour, Guillaume Lample, Gustave Denis, Harizo Rajaona, Himanshu Jaju, Ian Mack, Ian Mathew, Jean-Malo Delignon, Jeremy Facchetti, Jessica Chudnovsky, Joachim Studnia, Justus Murke, Kartik Khandelwal, Kenneth Chiu, Kevin Riera, Leonard Blier, Leonard Suslian, Leonardo Deschaseaux, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Sophia Yang, Margaret Jennings, Marie Pellat, Marie Torelli, Marjorie Janiewicz, Mathis Felardos, Maxime Darrin, Michael Hoff, Mickaël Seznec, Misha Jessel Kenyon, Nayef Derwiche, Nicolas Carmont Zaragoza, Nicolas Faurie, Nicolas Moreau, Nicolas Schuhl, Nikhil Raghuraman, Niklas Muhs, Olivier de Garrigues, Patricia Rozé, Patricia Wang, Patrick von Platen, Paul Jacob, Pauline Buche, Pavankumar Reddy Muddireddy, Perry Savas, Pierre Stock, Pravesh Agrawal, Renaud de Peretti, Romain Sauvestre, Romain Sinthe, Roman Soletskyi, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Soham Ghosh, Sylvain Regnier, Szymon Antoniak, Teven Le Scao, Theophile Gervet, Thibault Schueller, Thibaut Lavril, Thomas Wang, Timothée Lacroix, Valeriia Nemychnikova, Wendy Shang, William El Sayed, William Marshall
config.json ADDED
@@ -0,0 +1,69 @@
{
  "_name_or_path": "Ministral-8B-Instruct-2410",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 36,
  "num_key_value_heads": 8,
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "input_activations": {
          "actorder": null,
          "block_structure": null,
          "dynamic": true,
          "group_size": null,
          "num_bits": 8,
          "observer": null,
          "observer_kwargs": {},
          "strategy": "token",
          "symmetric": true,
          "type": "int"
        },
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": null,
          "num_bits": 8,
          "observer": "minmax",
          "observer_kwargs": {},
          "strategy": "channel",
          "symmetric": true,
          "type": "int"
        }
      }
    },
    "format": "int-quantized",
    "global_compression_ratio": 1.5293410717338993,
    "ignore": [
      "lm_head"
    ],
    "kv_cache_scheme": null,
    "quant_method": "compressed-tensors",
    "quantization_status": "compressed"
  },
  "rms_norm_eps": 1e-05,
  "rope_theta": 100000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.0",
  "use_cache": true,
  "vocab_size": 131072
}
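
The `quantization_config` block is what inference engines key off: int8, channel-wise, statically quantized weights and int8, per-token, dynamically quantized activations, with `lm_head` left unquantized. A small stdlib-only sketch for inspecting it from a local copy of this file (the path is a placeholder):

```py
# Sketch: summarizing the compressed-tensors quantization scheme from config.json.
import json

with open("config.json") as f:  # path to a local copy of this repository's config.json
    cfg = json.load(f)

qc = cfg["quantization_config"]
group = qc["config_groups"]["group_0"]
print("method:", qc["quant_method"], "| format:", qc["format"])
print("weights: int%d, strategy=%s, dynamic=%s" % (
    group["weights"]["num_bits"], group["weights"]["strategy"], group["weights"]["dynamic"]))
print("activations: int%d, strategy=%s, dynamic=%s" % (
    group["input_activations"]["num_bits"], group["input_activations"]["strategy"],
    group["input_activations"]["dynamic"]))
print("unquantized modules:", qc["ignore"])
```
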
generation_config.json ADDED
@@ -0,0 +1,6 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.48.0"
}
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9ccd90a7b79245fd28c026cc56eefcce1ac92b118f0b01c4b846f5b7e6036beb
size 4976387616
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:04ea4d084c2a3bbac4b22b214aec6b5cafc501995273c0903bc0efe34573a6a6
size 4120330840
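
These two entries are Git LFS pointer files: the actual shards are fetched by LFS and can be checked against the recorded `oid`/`size`. A small sketch for verifying downloaded shards (file names as listed above, paths assumed local):

```py
# Sketch: verify downloaded safetensors shards against their LFS pointer metadata.
import hashlib
import os

expected = {
    "model-00001-of-00002.safetensors": ("9ccd90a7b79245fd28c026cc56eefcce1ac92b118f0b01c4b846f5b7e6036beb", 4976387616),
    "model-00002-of-00002.safetensors": ("04ea4d084c2a3bbac4b22b214aec6b5cafc501995273c0903bc0efe34573a6a6", 4120330840),
}

for name, (oid, size) in expected.items():
    h = hashlib.sha256()
    with open(name, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    ok = h.hexdigest() == oid and os.path.getsize(name) == size
    print(name, "OK" if ok else "MISMATCH")
```
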
model.safetensors.index.json ADDED
@@ -0,0 +1,586 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 9096650752
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00002-of-00002.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
10
+ "model.layers.0.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
11
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
12
+ "model.layers.0.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
13
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
14
+ "model.layers.0.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
15
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
16
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
17
+ "model.layers.0.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
18
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
19
+ "model.layers.0.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
20
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
21
+ "model.layers.0.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
22
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
23
+ "model.layers.0.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
24
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
25
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
26
+ "model.layers.1.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
27
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
28
+ "model.layers.1.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
29
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
30
+ "model.layers.1.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
31
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
32
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
33
+ "model.layers.1.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
34
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
35
+ "model.layers.1.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
36
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
37
+ "model.layers.1.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
38
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
39
+ "model.layers.1.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
40
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
41
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
42
+ "model.layers.10.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
43
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
44
+ "model.layers.10.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
45
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
46
+ "model.layers.10.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
47
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
48
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
49
+ "model.layers.10.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
50
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
51
+ "model.layers.10.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
52
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
53
+ "model.layers.10.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
54
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
55
+ "model.layers.10.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
56
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
57
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
58
+ "model.layers.11.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
59
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
60
+ "model.layers.11.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
61
+ "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
62
+ "model.layers.11.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
63
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
64
+ "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
65
+ "model.layers.11.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
66
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
67
+ "model.layers.11.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
68
+ "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
69
+ "model.layers.11.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
70
+ "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
71
+ "model.layers.11.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
72
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
73
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
74
+ "model.layers.12.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
75
+ "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
76
+ "model.layers.12.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
77
+ "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
78
+ "model.layers.12.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
79
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
80
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
81
+ "model.layers.12.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
82
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
83
+ "model.layers.12.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
84
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
85
+ "model.layers.12.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
86
+ "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
87
+ "model.layers.12.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
88
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
89
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
90
+ "model.layers.13.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
91
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
92
+ "model.layers.13.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
93
+ "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
94
+ "model.layers.13.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
95
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
96
+ "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
97
+ "model.layers.13.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
98
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
99
+ "model.layers.13.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
100
+ "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
101
+ "model.layers.13.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
102
+ "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
103
+ "model.layers.13.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
104
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
105
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
106
+ "model.layers.14.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
107
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
108
+ "model.layers.14.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
109
+ "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
110
+ "model.layers.14.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
111
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
112
+ "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
113
+ "model.layers.14.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
114
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
115
+ "model.layers.14.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
116
+ "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
117
+ "model.layers.14.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
118
+ "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
119
+ "model.layers.14.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
120
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
121
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
122
+ "model.layers.15.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
123
+ "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
124
+ "model.layers.15.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
125
+ "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
126
+ "model.layers.15.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
127
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
128
+ "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
129
+ "model.layers.15.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
130
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
131
+ "model.layers.15.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
132
+ "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
133
+ "model.layers.15.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
134
+ "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
135
+ "model.layers.15.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
136
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
137
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
138
+ "model.layers.16.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
139
+ "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
140
+ "model.layers.16.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
141
+ "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
142
+ "model.layers.16.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
143
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
144
+ "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
145
+ "model.layers.16.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
146
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
147
+ "model.layers.16.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
148
+ "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
149
+ "model.layers.16.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
150
+ "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
151
+ "model.layers.16.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
152
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
153
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
154
+ "model.layers.17.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
155
+ "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
156
+ "model.layers.17.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
157
+ "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
158
+ "model.layers.17.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
159
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
160
+ "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
161
+ "model.layers.17.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
162
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
163
+ "model.layers.17.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
164
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
165
+ "model.layers.17.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
166
+ "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
167
+ "model.layers.17.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
168
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
169
+ "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
170
+ "model.layers.18.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
171
+ "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
172
+ "model.layers.18.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
173
+ "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
174
+ "model.layers.18.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
175
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
176
+ "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
177
+ "model.layers.18.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
178
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
179
+ "model.layers.18.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
180
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
181
+ "model.layers.18.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
182
+ "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
183
+ "model.layers.18.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
184
+ "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
185
+ "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
186
+ "model.layers.19.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
187
+ "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
188
+ "model.layers.19.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
189
+ "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
190
+ "model.layers.19.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
191
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
192
+ "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
193
+ "model.layers.19.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
194
+ "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
195
+ "model.layers.19.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
196
+ "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
197
+ "model.layers.19.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
198
+ "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
199
+ "model.layers.19.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
200
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
201
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
202
+ "model.layers.2.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
203
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
204
+ "model.layers.2.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
205
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
206
+ "model.layers.2.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
207
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
208
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
209
+ "model.layers.2.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
210
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
211
+ "model.layers.2.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
212
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
213
+ "model.layers.2.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
214
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
215
+ "model.layers.2.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
216
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00002.safetensors",
217
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
218
+ "model.layers.20.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
219
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
220
+ "model.layers.20.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
221
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
222
+ "model.layers.20.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
223
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
224
+ "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
225
+ "model.layers.20.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
226
+ "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
227
+ "model.layers.20.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
228
+ "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
229
+ "model.layers.20.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
230
+ "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
231
+ "model.layers.20.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
232
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
233
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
234
+ "model.layers.21.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
235
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
236
+ "model.layers.21.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
237
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
238
+ "model.layers.21.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
239
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
240
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
241
+ "model.layers.21.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
242
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
243
+ "model.layers.21.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
244
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
245
+ "model.layers.21.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
246
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
247
+ "model.layers.21.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
248
+ "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
249
+ "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
250
+ "model.layers.22.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
251
+ "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
252
+ "model.layers.22.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
253
+ "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
254
+ "model.layers.22.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
255
+ "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
256
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
257
+ "model.layers.22.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
258
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
259
+ "model.layers.22.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
260
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
261
+ "model.layers.22.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
262
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
263
+ "model.layers.22.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
264
+ "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
265
+ "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
266
+ "model.layers.23.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
267
+ "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
268
+ "model.layers.23.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
269
+ "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
270
+ "model.layers.23.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
271
+ "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
272
+ "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
273
+ "model.layers.23.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
274
+ "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
275
+ "model.layers.23.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
276
+ "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
277
+ "model.layers.23.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
278
+ "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
279
+ "model.layers.23.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
280
+ "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
281
+ "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
282
+ "model.layers.24.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
283
+ "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
284
+ "model.layers.24.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
285
+ "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
286
+ "model.layers.24.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
287
+ "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
288
+ "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
289
+ "model.layers.24.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
290
+ "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
291
+ "model.layers.24.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
292
+ "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
293
+ "model.layers.24.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
294
+ "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
295
+ "model.layers.24.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
296
+ "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
297
+ "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
298
+ "model.layers.25.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
299
+ "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
300
+ "model.layers.25.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
301
+ "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
302
+ "model.layers.25.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
303
+ "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
304
+ "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
305
+ "model.layers.25.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
306
+ "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
307
+ "model.layers.25.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
308
+ "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
309
+ "model.layers.25.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
310
+ "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
311
+ "model.layers.25.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
312
+ "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
313
+ "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
314
+ "model.layers.26.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
315
+ "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
316
+ "model.layers.26.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
317
+ "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
318
+ "model.layers.26.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
319
+ "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
320
+ "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
321
+ "model.layers.26.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
322
+ "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
323
+ "model.layers.26.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
324
+ "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
325
+ "model.layers.26.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
326
+ "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
327
+ "model.layers.26.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
328
+ "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
329
+ "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
330
+ "model.layers.27.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
331
+ "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
332
+ "model.layers.27.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
333
+ "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
334
+ "model.layers.27.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
335
+ "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
336
+ "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
337
+ "model.layers.27.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
338
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
339
+ "model.layers.27.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
340
+ "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
341
+ "model.layers.27.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
342
+ "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
343
+ "model.layers.27.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
344
+ "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
345
+ "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
346
+ "model.layers.28.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
347
+ "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
348
+ "model.layers.28.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
349
+ "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
350
+ "model.layers.28.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
351
+ "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
352
+ "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
353
+ "model.layers.28.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
354
+ "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
355
+ "model.layers.28.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
356
+ "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
357
+ "model.layers.28.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
358
+ "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
359
+ "model.layers.28.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
360
+ "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
361
+ "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
362
+ "model.layers.29.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
363
+ "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
364
+ "model.layers.29.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
365
+ "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
366
+ "model.layers.29.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
367
+ "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
368
+ "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
369
+ "model.layers.29.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
370
+ "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
371
+ "model.layers.29.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
372
+ "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
373
+ "model.layers.29.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
374
+ "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
375
+ "model.layers.29.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
376
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
377
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
378
+ "model.layers.3.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
379
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
380
+ "model.layers.3.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
381
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
382
+ "model.layers.3.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
383
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
384
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
385
+ "model.layers.3.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
386
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
387
+ "model.layers.3.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
388
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
389
+ "model.layers.3.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
390
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
391
+ "model.layers.3.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
392
+ "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
393
+ "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
394
+ "model.layers.30.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
395
+ "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
396
+ "model.layers.30.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
397
+ "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
398
+ "model.layers.30.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
399
+ "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
400
+ "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
401
+ "model.layers.30.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
402
+ "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
403
+ "model.layers.30.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
404
+ "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
405
+ "model.layers.30.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
406
+ "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
407
+ "model.layers.30.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
408
+ "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
409
+ "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
410
+ "model.layers.31.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
411
+ "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
412
+ "model.layers.31.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
413
+ "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
414
+ "model.layers.31.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
415
+ "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
416
+ "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
417
+ "model.layers.31.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
418
+ "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
419
+ "model.layers.31.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
420
+ "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
421
+ "model.layers.31.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
422
+ "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
423
+ "model.layers.31.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
424
+ "model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
425
+ "model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
426
+ "model.layers.32.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
427
+ "model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
428
+ "model.layers.32.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
429
+ "model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
430
+ "model.layers.32.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
431
+ "model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
432
+ "model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
433
+ "model.layers.32.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
434
+ "model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
435
+ "model.layers.32.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
436
+ "model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
437
+ "model.layers.32.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
438
+ "model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
439
+ "model.layers.32.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
440
+ "model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
441
+ "model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
442
+ "model.layers.33.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
443
+ "model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
444
+ "model.layers.33.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
445
+ "model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
446
+ "model.layers.33.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
447
+ "model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
448
+ "model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
449
+ "model.layers.33.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
450
+ "model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
451
+ "model.layers.33.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
452
+ "model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
453
+ "model.layers.33.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
454
+ "model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
455
+ "model.layers.33.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
456
+ "model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
457
+ "model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
458
+ "model.layers.34.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
459
+ "model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
460
+ "model.layers.34.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
461
+ "model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
462
+ "model.layers.34.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
463
+ "model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
464
+ "model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
465
+ "model.layers.34.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
466
+ "model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
467
+ "model.layers.34.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
468
+ "model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
469
+ "model.layers.34.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
470
+ "model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
471
+ "model.layers.34.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
472
+ "model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
473
+ "model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
474
+ "model.layers.35.mlp.down_proj.weight_scale": "model-00002-of-00002.safetensors",
475
+ "model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
476
+ "model.layers.35.mlp.gate_proj.weight_scale": "model-00002-of-00002.safetensors",
477
+ "model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
478
+ "model.layers.35.mlp.up_proj.weight_scale": "model-00002-of-00002.safetensors",
479
+ "model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
480
+ "model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
481
+ "model.layers.35.self_attn.k_proj.weight_scale": "model-00002-of-00002.safetensors",
482
+ "model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
483
+ "model.layers.35.self_attn.o_proj.weight_scale": "model-00002-of-00002.safetensors",
484
+ "model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
485
+ "model.layers.35.self_attn.q_proj.weight_scale": "model-00002-of-00002.safetensors",
486
+ "model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
487
+ "model.layers.35.self_attn.v_proj.weight_scale": "model-00002-of-00002.safetensors",
488
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
489
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
490
+ "model.layers.4.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
491
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
492
+ "model.layers.4.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
493
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
494
+ "model.layers.4.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
495
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
496
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
497
+ "model.layers.4.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
498
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
499
+ "model.layers.4.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
500
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
501
+ "model.layers.4.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
502
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
503
+ "model.layers.4.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
504
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
505
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
506
+ "model.layers.5.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
507
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
508
+ "model.layers.5.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
509
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
510
+ "model.layers.5.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
511
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
512
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
513
+ "model.layers.5.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
514
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
515
+ "model.layers.5.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
516
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
517
+ "model.layers.5.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
518
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
519
+ "model.layers.5.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
520
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
521
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
522
+ "model.layers.6.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
523
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
524
+ "model.layers.6.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
525
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
526
+ "model.layers.6.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
527
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
528
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
529
+ "model.layers.6.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
530
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
531
+ "model.layers.6.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
532
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
533
+ "model.layers.6.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
534
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
535
+ "model.layers.6.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
536
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
537
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
538
+ "model.layers.7.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
539
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
540
+ "model.layers.7.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
541
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
542
+ "model.layers.7.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
543
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
544
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
545
+ "model.layers.7.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
546
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
547
+ "model.layers.7.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
548
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
549
+ "model.layers.7.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
550
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
551
+ "model.layers.7.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
552
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
553
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
554
+ "model.layers.8.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
555
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
556
+ "model.layers.8.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
557
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
558
+ "model.layers.8.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
559
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
560
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
561
+ "model.layers.8.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
562
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
563
+ "model.layers.8.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
564
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
565
+ "model.layers.8.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
566
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
567
+ "model.layers.8.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
568
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
569
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
570
+ "model.layers.9.mlp.down_proj.weight_scale": "model-00001-of-00002.safetensors",
571
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
572
+ "model.layers.9.mlp.gate_proj.weight_scale": "model-00001-of-00002.safetensors",
573
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
574
+ "model.layers.9.mlp.up_proj.weight_scale": "model-00001-of-00002.safetensors",
575
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
576
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
577
+ "model.layers.9.self_attn.k_proj.weight_scale": "model-00001-of-00002.safetensors",
578
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
579
+ "model.layers.9.self_attn.o_proj.weight_scale": "model-00001-of-00002.safetensors",
580
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
581
+ "model.layers.9.self_attn.q_proj.weight_scale": "model-00001-of-00002.safetensors",
582
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
583
+ "model.layers.9.self_attn.v_proj.weight_scale": "model-00001-of-00002.safetensors",
584
+ "model.norm.weight": "model-00002-of-00002.safetensors"
585
+ }
586
+ }
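
The weight map above assigns every tensor to one of the two safetensors shards; each quantized projection weight carries a companion `weight_scale` tensor, while the layernorm weights remain unquantized. Below is a minimal, illustrative sketch (plain Python, assuming a local checkout containing `model.safetensors.index.json`) of how the shard index can be inspected:

```python
import json
from collections import Counter

# Load the shard index added in this commit.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

weight_map = index["weight_map"]

# Count tensors (weights plus their weight_scale companions) per shard file.
for shard, count in sorted(Counter(weight_map.values()).items()):
    print(f"{shard}: {count} tensors")

# List the quantization scale tensors recorded for layer 3 only
# (the trailing dot avoids also matching layers 30-35).
layer3_scales = [name for name in weight_map
                 if name.startswith("model.layers.3.") and name.endswith("weight_scale")]
print(layer3_scales)
```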
recipe.yaml ADDED
@@ -0,0 +1,8 @@
+ DEFAULT_stage:
+   DEFAULT_modifiers:
+     SmoothQuantModifier: {smoothing_strength: 0.8}
+     GPTQModifier:
+       targets: Linear
+       dampening_frac: 0.1
+       ignore: [lm_head]
+       scheme: W8A8
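
The recipe records the quantization pipeline used for this repo: SmoothQuant with smoothing strength 0.8, followed by GPTQ over all `Linear` modules (with `lm_head` ignored and a dampening fraction of 0.1) under the W8A8 scheme. The sketch below shows how such a recipe is typically applied with llm-compressor's `oneshot` API; the calibration dataset, sample count, and output directory are illustrative placeholders, and the actual invocation is the `llm-compressor-quantize.py` script linked in the model card.

```python
# Sketch only: depending on the llm-compressor version, oneshot may instead be
# imported from the top-level llmcompressor package.
from llmcompressor.transformers import oneshot
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Ministral-8B-Instruct-2410"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

oneshot(
    model=model,
    dataset="open_platypus",        # placeholder calibration dataset
    recipe="recipe.yaml",           # the SmoothQuant + GPTQ W8A8 recipe above
    max_seq_length=2048,
    num_calibration_samples=512,
)

# save_compressed writes the int8 weights plus the weight_scale tensors
# seen in the shard index above.
model.save_pretrained("Ministral-8B-Instruct-2410-W8A8", save_compressed=True)
tokenizer.save_pretrained("Ministral-8B-Instruct-2410-W8A8")
```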
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "</s>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
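
The special-tokens map declares `<s>` as BOS and `</s>` as EOS, reuses `</s>` as the padding token, and keeps `<unk>` as the unknown token. A quick, illustrative check (assuming a local checkout of this repo; `"."` is a placeholder path) that a `transformers` tokenizer reports the same values:

```python
from transformers import AutoTokenizer

# "." is a placeholder for a local checkout of this repository.
tok = AutoTokenizer.from_pretrained(".")
print(tok.bos_token, tok.eos_token, tok.pad_token, tok.unk_token)
# Expected: <s> </s> </s> <unk>
```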
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8dbd3b9bcbc1449032328f3d37b4edfbb95424b882d4005b4dd044ef5c93d5e
+ size 17078401
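
`tokenizer.json` is tracked through Git LFS (matching the `.gitattributes` rule added in this commit), so the diff shows only the LFS pointer; the actual ~17 MB file is resolved on download. An illustrative sketch of fetching it with `huggingface_hub` (the repo id below is a placeholder for this model repository):

```python
from huggingface_hub import hf_hub_download

# Placeholder repo_id; huggingface_hub resolves the LFS pointer to the real file.
path = hf_hub_download(repo_id="<user>/<this-repo>", filename="tokenizer.json")
print(path)  # local cache path of the downloaded tokenizer.json
```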
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff