puneeshkhanna committed on
Commit
62e8b6f
1 Parent(s): 24a2788

Add Falcon3-1B-Base with random weights

README.md ADDED
@@ -0,0 +1,252 @@
---
language:
- en
- es
- pt
tags:
- falcon3
---


# Table of Contents

0. [TL;DR](#tldr)
1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Evaluation](#evaluation)


# TL;DR

# Model Details

⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**

## Model Description

- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- **Model type:** Causal decoder-only
- **Architecture:** Transformer-based
- **Language(s) (NLP):** Mainly English
- **License:** TII Falcon-LLM License 2.0

<br>

# Usage

Below are some example scripts showing how to use the model with `transformers` (make sure you have the latest version of `transformers`, or one built from source):

## Using the PyTorch model with 🤗 transformers

### Running the model on a CPU

<details>
<summary> Click to expand </summary>

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU

<details>
<summary> Click to expand </summary>

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>

### Running the model on a GPU using `torch.compile`

<details>
<summary> Click to expand </summary>

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)

model = torch.compile(model)

input_text = "Question: How many hours in one day? Answer: "
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0]))
```

</details>


# Training Details

## Training Data

Falcon3-7B was trained on 15 Gigatokens of data comprising web, code, STEM, high-quality and multilingual data.

## Training Procedure

Falcon3-7B was trained on 256 H100 nodes (world size 2048).

### Training Hyperparameters

| **Hyperparameter** | **Value**  | **Comment**                                                   |
|--------------------|------------|---------------------------------------------------------------|
| Precision          | `bfloat16` |                                                               |
| Optimizer          | AdamW      |                                                               |
| Max learning rate  | 6e-4       | Following a WSD (warmup-stable-decay) learning rate scheduler |
| Weight decay       | 1e-1       |                                                               |
| z-loss             | 1e-4       |                                                               |
| Batch size         | Variable   | Batch size was gradually increased during training           |

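As a rough illustration of the WSD schedule named above: the learning rate ramps up linearly, stays flat at the peak value for most of training, then decays at the end. The sketch below is illustrative only; the phase lengths, decay shape and minimum learning rate are assumptions, not values from this card.

```python
# Illustrative WSD (warmup-stable-decay) schedule. Phase lengths, decay shape
# and min_lr are assumed for the example; only max_lr=6e-4 comes from the table.
def wsd_lr(step, max_lr=6e-4, min_lr=6e-5,
           warmup_steps=1_000, stable_steps=100_000, decay_steps=10_000):
    if step < warmup_steps:                          # linear warmup to max_lr
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:           # constant ("stable") phase
        return max_lr
    done = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr + (min_lr - max_lr) * done         # linear decay to min_lr
```
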
# Evaluation

<table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
<colgroup>
<col style="width: 10%;">
<col style="width: 10%;">
<col style="width: 7%;">
<col style="width: 7%;">
<col style="width: 7%;">
<col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
</colgroup>
<thead>
<tr>
<th>Category</th>
<th>Benchmark</th>
<th>Llama3.1-8B</th>
<th>Qwen2-7B</th>
<th>Qwen2.5-7B</th>
<th>Falcon3-7B-Base</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3">General</td>
<td>MMLU (5-shot)</td>
<td>65.2</td>
<td>70.4</td>
<td>74.2</td>
<td>67.5</td>
</tr>
<tr>
<td>MMLU-PRO (5-shot)</td>
<td>32.7</td>
<td>42.1</td>
<td>43.5</td>
<td>39.2</td>
</tr>
<tr>
<td>IFEval</td>
<td>12.0</td>
<td>30.6</td>
<td>33.9</td>
<td>34.3</td>
</tr>
<tr>
<td rowspan="2">Math</td>
<td>GSM8K (5-shot)</td>
<td>49.4</td>
<td>77.9</td>
<td>82.9</td>
<td>76.2</td>
</tr>
<tr>
<td>MATH (4-shot)</td>
<td>4.1</td>
<td>17.5</td>
<td>15.5</td>
<td>18.0</td>
</tr>
<tr>
<td rowspan="4">Reasoning</td>
<td>Arc Challenge (25-shot)</td>
<td>53.4</td>
<td>57.4</td>
<td>59.0</td>
<td>59.6</td>
</tr>
<tr>
<td>GPQA (0-shot)</td>
<td>31.0</td>
<td>31.9</td>
<td>33.0</td>
<td>35.5</td>
</tr>
<tr>
<td>MUSR (0-shot)</td>
<td>38.0</td>
<td>44.1</td>
<td>44.2</td>
<td>47.3</td>
</tr>
<tr>
<td>BBH (3-shot)</td>
<td>46.5</td>
<td>53.3</td>
<td>54.0</td>
<td>51.0</td>
</tr>
<tr>
<td rowspan="4">Commonsense Understanding</td>
<td>PIQA (0-shot)</td>
<td>80.3</td>
<td>79.8</td>
<td>78.7</td>
<td>77.7</td>
</tr>
<tr>
<td>SciQ (0-shot)</td>
<td>96.3</td>
<td>95.9</td>
<td>96.6</td>
<td>95.3</td>
</tr>
<tr>
<td>Winogrande (0-shot)</td>
<td>74.0</td>
<td>72.1</td>
<td>72.9</td>
<td>71.0</td>
</tr>
<tr>
<td>OpenbookQA (0-shot)</td>
<td>33.4</td>
<td>35.2</td>
<td>33.6</td>
<td>31.4</td>
</tr>
</tbody>
</table>

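The card does not state which harness produced these numbers. As a hedged pointer, few-shot scores in this style (e.g. ARC Challenge 25-shot) are commonly computed with EleutherAI's lm-evaluation-harness; a minimal sketch, assuming that tool (`pip install lm-eval`) and the repo id used in the usage examples:

```python
# Sketch only: the authors' exact evaluation setup and task definitions are
# not specified here, so results may not match the table precisely.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=tiiuae/Falcon3-7B-Base,dtype=bfloat16",
    tasks=["arc_challenge"],
    num_fewshot=25,
)
print(results["results"]["arc_challenge"])
```
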


# Citation
config.json ADDED
@@ -0,0 +1,30 @@
{
  "_name_or_path": "config.json",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "eos_token_id": 11,
  "head_dim": 256,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 32768,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 8,
  "num_hidden_layers": 18,
  "num_key_value_heads": 4,
  "parallel_attn": false,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000042,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.46.1",
  "use_cache": true,
  "vocab_size": 131072
}
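A small sketch of inspecting these values with `transformers` (assuming a local clone of this repository; a Hub repo id works the same way):

```python
from transformers import AutoConfig

# Sketch: load the config shown above from a local clone of this repository.
config = AutoConfig.from_pretrained(".")

print(config.model_type)               # "llama": the checkpoint reuses the Llama architecture
print(config.hidden_size)              # 2048
print(config.num_hidden_layers)        # 18
print(config.num_key_value_heads)      # 4 KV heads shared by 8 query heads (grouped-query attention)
print(config.max_position_embeddings)  # 32768-token context window
```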
generation_config.json ADDED
@@ -0,0 +1,4 @@
{
  "eos_token_id": 11,
  "transformers_version": "4.46.1"
}
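This file only pins the default end-of-sequence token used by `generate`; a minimal sketch of how it is read (local clone assumed):

```python
from transformers import GenerationConfig

# Sketch: generation defaults are read from generation_config.json;
# generation stops once token id 11 (the eos token) is emitted.
gen_config = GenerationConfig.from_pretrained(".")
print(gen_config.eos_token_id)  # 11
```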
model.safetensors.index.json ADDED
@@ -0,0 +1,172 @@
{
  "metadata": {
    "total_size": 6677635072
  },
  "weight_map": {
    "lm_head.weight": "model-00002-of-00002.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.16.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.input_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors"
  }
}
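The index above simply maps each parameter name to the shard file that stores it (`total_size` is the byte count across both shards). A rough sketch of reading one tensor directly from its shard, assuming a local clone of this repository and the `safetensors` package:

```python
import json
from safetensors import safe_open

# Sketch: resolve a parameter to its shard via the weight map, then read it.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

name = "model.embed_tokens.weight"
shard = index["weight_map"][name]      # e.g. "model-00001-of-00002.safetensors"

with safe_open(shard, framework="pt") as f:
    tensor = f.get_tensor(name)
print(tensor.shape)                    # (131072, 2048): vocab_size x hidden_size
```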
special_tokens_map.json ADDED
@@ -0,0 +1,34 @@
{
  "additional_special_tokens": [
    ">>TITLE<<",
    ">>ABSTRACT<<",
    ">>INTRODUCTION<<",
    ">>SUMMARY<<",
    ">>COMMENT<<",
    ">>ANSWER<<",
    ">>QUESTION<<",
    ">>DOMAIN<<",
    ">>EMAIL_ADDRESS<<",
    ">>IP_ADDRESS<<",
    "<|startoftext|>",
    ">>IP_ADDRESS_0<<",
    ">>IP_ADDRESS_1<<",
    ">>IP_ADDRESS_2<<",
    ">>IP_ADDRESS_3<<",
    ">>IP_ADDRESS_4<<",
    ">>IP_ADDRESS_5<<",
    ">>IP_ADDRESS_6<<",
    ">>IP_ADDRESS_7<<",
    ">>IP_ADDRESS_8<<",
    ">>IP_ADDRESS_9<<",
    ">>PASSWORD<<",
    ">>KEY<<"
  ],
  "eos_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
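These entries register the structural and PII-masking markers as special tokens, so the tokenizer treats each of them as a single atomic token. A small sketch (local clone of this repository assumed):

```python
from transformers import AutoTokenizer

# Sketch: the markers listed above map to single token ids rather than being split.
tokenizer = AutoTokenizer.from_pretrained(".")
print(tokenizer.eos_token)                               # "<|endoftext|>"
print(tokenizer.convert_tokens_to_ids(">>PASSWORD<<"))   # one token id for the whole marker
```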
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff