rippertnt committed
Commit 5f3dd05
1 Parent(s): 7bd4226

Upload 20 files
LICENSE ADDED
@@ -0,0 +1,165 @@
1
+ EXAONE AI Model License Agreement 1.0 - NC
2
+
3
+ This License Agreement (“Agreement”) is entered into between you (“Licensee”) and LG Management Development
4
+ Institute Co., Ltd. (“Licensor”), governing the use of the EXAONE AI Model (“Model”). By downloading,
5
+ installing, copying, or using the Model, you agree to comply with and be bound by the terms of this Agreement.
6
+ If you do not agree to all the terms, you must not download, install, copy, or use the Model. This Agreement
7
+ constitutes a binding legal agreement between the Licensee and Licensor.
8
+
9
+ 1. Definitions
10
+ 1.1 Model: The artificial intelligence model provided by Licensor, which includes any software,
11
+ algorithms, machine learning models, or related components supplied by Licensor. This definition extends
12
+ to encompass all updates, enhancements, improvements, bug fixes, patches, or other modifications that
13
+ may be provided by Licensor from time to time, whether automatically or manually implemented.
14
+ 1.2 Derivatives: Any modifications, alterations, enhancements, improvements, adaptations, or derivative
15
+ works of the Model created by Licensee or any third party. This includes changes made to the Model's
16
+ architecture, parameters, data processing methods, or any other aspect of the Model that results in a
17
+ modification of its functionality or output.
18
+ 1.3 Output: Any data, results, content, predictions, analyses, insights, or other materials generated by
19
+ the Model or Derivatives, regardless of whether they are in their original form or have been further
20
+ processed or modified by the Licensee. This includes, but is not limited to, textual or numerical data produced
21
+ directly or indirectly through the use of the Model.
22
+ 1.4 Licensor: LG Management Development Institute Co., Ltd., the owner, developer, and provider of the
23
+ EXAONE AI Model. The Licensor holds all rights, title, and interest in the Model and is responsible for
24
+ granting licenses to use the Model under the terms specified in this Agreement.
25
+ 1.5 Licensee: The individual, organization, corporation, academic institution, government agency, or other
26
+ entity using or intending to use the Model under the terms and conditions of this Agreement. The Licensee
27
+ is responsible for ensuring compliance with the Agreement by all authorized users who access or utilize
28
+ the Model on behalf of the Licensee.
29
+
30
+ 2. License Grant
31
+ 2.1 Grant of License: Subject to the terms and conditions outlined in this Agreement, the Licensor hereby
32
+ grants the Licensee a limited, non-exclusive, non-transferable, worldwide, and revocable license to:
33
+ a. Access, download, install, and use the Model solely for internal research purposes. This includes
34
+ evaluation, testing, academic research, experimentation, and participation in competitions, provided
35
+ that such participation is in a non-commercial context. Notwithstanding Section 3.1, the Licensee may
36
+ only provide the Model or Derivatives for a competition if no license is granted to the competition
37
+ organizer or any third party.
38
+ b. Publicly disclose research results and findings derived from the use of the Model or Derivatives,
39
+ including publishing papers or presentations, but may not distribute, share, or otherwise make the
40
+ Model or any Derivatives available to third parties. When publishing or presenting, only a minimal
41
+ amount of Output necessary for disclosing research results should be made public.
42
+ c. Modify the Model and create Derivatives based on the Model, provided that such modifications and
43
+ Derivatives are used exclusively for internal research purposes. The Licensee may conduct experiments,
44
+ perform analyses, and apply custom modifications to the Model to explore its capabilities and
45
+ performance under various scenarios.
46
+ 2.2 Scope of License: The license granted herein does not authorize the Licensee to use the Model for any
47
+ purpose not explicitly permitted under this Agreement. Any use beyond the scope of this license, including
48
+ any commercial application or external distribution, is strictly prohibited unless explicitly agreed upon
49
+ in writing by the Licensor.
50
+
51
+ 3. Restrictions
52
+ 3.1 Sublicensing and Distribution: The Licensee shall not sublicense, sell, rent, lease, distribute,
53
+ disclose, or otherwise transfer the Model, any Derivatives, or any Output to any third party, except as
54
+ permitted in Section 2.1. This prohibition extends to making the Model available as part of a service,
55
+ product, or solution offered to others, whether for free or for compensation.
56
+ 3.2 Commercial Use: The Licensee is expressly prohibited from using the Model, Derivatives, or Output for
57
+ any commercial purposes, including but not limited to, developing or deploying products, services, or
58
+ applications that generate revenue, whether directly or indirectly. Any commercial exploitation of the
59
+ Model or its derivatives requires a separate commercial license agreement with the Licensor. Furthermore,
60
+ the Licensee shall not use the Model, Derivatives or Output to develop or improve other models.
61
+ 3.3 Reverse Engineering: The Licensee shall not decompile, disassemble, reverse engineer, or attempt to
62
+ derive the source code, underlying ideas, algorithms, or structure of the Model, except to the extent that
63
+ such activities are expressly permitted by applicable law. Any attempt to bypass or circumvent
64
+ technological protection measures applied to the Model is strictly prohibited.
65
+ 3.4 Unlawful Use: The Licensee shall not use the Model and Derivatives for any illegal, fraudulent, or
66
+ unauthorized activities, nor for any purpose that violates applicable laws or regulations. This includes
67
+ but is not limited to the creation, distribution, or dissemination of malicious, deceptive, or unlawful
68
+ content.
69
+ 3.5 Ethical Use: The Licensee shall ensure that the Model or Derivatives is used in an ethical and
70
+ responsible manner, adhering to the following guidelines:
71
+ a. The Model and Derivatives shall not be used to generate, propagate, or amplify false, misleading,
72
+ or harmful information, including fake news, misinformation, or disinformation.
73
+ b. The Model and Derivatives shall not be employed to create, distribute, or promote content that is
74
+ discriminatory, harassing, defamatory, abusive, or otherwise offensive to individuals or groups based
75
+ on race, gender, sexual orientation, religion, nationality, or other protected characteristics.
76
+ c. The Model and Derivatives shall not infringe on the rights of others, including intellectual
77
+ property rights, privacy rights, or any other rights recognized by law. The Licensee shall obtain all
78
+ necessary permissions and consents before using the Model and Derivatives in a manner that may impact
79
+ the rights of third parties.
80
+ d. The Model and Derivatives shall not be used in a way that causes harm, whether physical, mental,
81
+ emotional, or financial, to individuals, organizations, or communities. The Licensee shall take all
82
+ reasonable measures to prevent misuse or abuse of the Model and Derivatives that could result in harm
83
+ or injury.
84
+
85
+ 4. Ownership
86
+ 4.1 Intellectual Property: All rights, title, and interest in and to the Model, including any
87
+ modifications, Derivatives, and associated documentation, are and shall remain the exclusive property of
88
+ the Licensor. The Licensee acknowledges that this Agreement does not transfer any ownership rights to the
89
+ Licensee. All trademarks, service marks, and logos associated with the Model are the property of the
90
+ Licensor.
91
+ 4.2 Output: All rights, title, and interest in and to the Output generated by the Model, whether in its
92
+ original form or modified, are and shall remain the exclusive property of the Licensor. The Licensee shall
93
+ not claim ownership of the Output except as expressly provided in this Agreement. The Licensee may use the
94
+ Output solely for the purposes permitted under this Agreement and shall not exploit the Output for
95
+ unauthorized or commercial purposes.
96
+ 4.3 Attribution: In any publication or presentation of results obtained using the Model, the Licensee
97
+ shall provide appropriate attribution to the Licensor, citing the Model's name and version, along with any
98
+ relevant documentation or references specified by the Licensor.
99
+
100
+ 5. No Warranty
101
+ 5.1 “As-Is” Basis: The Model, Derivatives, and Output are provided on an “as-is” and “as-available” basis,
102
+ without any warranties or representations of any kind, whether express, implied, or statutory. The
103
+ Licensor disclaims all warranties, including but not limited to, implied warranties of merchantability,
104
+ fitness for a particular purpose, accuracy, reliability, non-infringement, or any warranty arising from
105
+ the course of dealing or usage of trade.
106
+ 5.2 Performance and Reliability: The Licensor does not warrant or guarantee that the Model, Derivatives or
107
+ Output will meet the Licensee’s requirements, that the operation of the Model, Derivatives or Output will
108
+ be uninterrupted or error-free, or that defects in the Model will be corrected. The Licensee acknowledges
109
+ that the use of the Model, Derivatives or Output is at its own risk and that the Model, Derivatives or
110
+ Output may contain bugs, errors, or other limitations.
111
+ 5.3 No Endorsement: The Licensor does not endorse, approve, or certify any results, conclusions, or
112
+ recommendations derived from the use of the Model. The Licensee is solely responsible for evaluating the
113
+ accuracy, reliability, and suitability of the Model for its intended purposes.
114
+
115
+ 6. Limitation of Liability
116
+ 6.1 No Liability for Damages: To the fullest extent permitted by applicable law, in no event shall the
117
+ Licensor be liable for any special, incidental, indirect, consequential, exemplary, or punitive damages,
118
+ including but not limited to, damages for loss of business profits, business interruption, loss of
119
+ business information, loss of data, or any other pecuniary or non-pecuniary loss arising out of or in
120
+ connection with the use or inability to use the Model, Derivatives or any Output, even if the Licensor has
121
+ been advised of the possibility of such damages.
122
+ 6.2 Indemnification: The Licensee agrees to indemnify, defend, and hold harmless the Licensor, its
123
+ affiliates, officers, directors, employees, and agents from and against any claims, liabilities, damages,
124
+ losses, costs, or expenses (including reasonable attorneys' fees) arising out of or related to the
125
+ Licensee's use of the Model, any Derivatives, or any Output, including any violation of this Agreement or
126
+ applicable laws.
127
+
128
+ 7. Termination
129
+ 7.1 Termination by Licensor: The Licensor reserves the right to terminate this Agreement and revoke the
130
+ Licensee’s rights to use the Model at any time, with or without cause, and without prior notice if the
131
+ Licensee breaches any of the terms or conditions of this Agreement. Termination shall be effective
132
+ immediately upon notice.
133
+ 7.2 Effect of Termination: Upon termination of this Agreement, the Licensee must immediately cease all use
134
+ of the Model, Derivatives, and Output and destroy all copies of the Model, Derivatives, and Output in its
135
+ possession or control, including any backup or archival copies. The Licensee shall certify in writing to
136
+ the Licensor that such destruction has been completed.
137
+ 7.3 Survival: The provisions of this Agreement that by their nature should survive termination, including
138
+ but not limited to, Sections 4 (Ownership), 5 (No Warranty), 6 (Limitation of Liability), and this
139
+ Section 7 (Termination), shall continue to apply after termination.
140
+
141
+ 8. Governing Law
142
+ 8.1 Governing Law: This Agreement shall be governed by and construed in accordance with the laws of the
143
+ Republic of Korea, without regard to its conflict of laws principles.
144
+ 8.2 Arbitration: Any disputes, controversies, or claims arising out of or relating to this Agreement,
145
+ including its existence, validity, interpretation, performance, breach, or termination, shall be referred
146
+ to and finally resolved by arbitration administered by the Korean Commercial Arbitration Board (KCAB) in
147
+ accordance with the International Arbitration Rules of the Korean Commercial Arbitration Board in force at
148
+ the time of the commencement of the arbitration. The seat of arbitration shall be Seoul, Republic of
149
+ Korea. The tribunal shall consist of one arbitrator. The language of the arbitration shall be English.
150
+
151
+ 9. Alterations
152
+ 9.1 Modifications: The Licensor reserves the right to modify or amend this Agreement at any time, in its
153
+ sole discretion. Any modifications will be effective upon posting the updated Agreement on the Licensor’s
154
+ website or through other means of communication. The Licensee is responsible for reviewing the Agreement
155
+ periodically for changes. Continued use of the Model after any modifications have been made constitutes
156
+ acceptance of the revised Agreement.
157
+ 9.2 Entire Agreement: This Agreement constitutes the entire agreement between the Licensee and Licensor
158
+ concerning the subject matter hereof and supersedes all prior or contemporaneous oral or written
159
+ agreements, representations, or understandings. Any terms or conditions of any purchase order or other
160
+ document submitted by the Licensee in connection with the Model that are in addition to, different from,
161
+ or inconsistent with the terms and conditions of this Agreement are not binding on the Licensor and are
162
+ void.
163
+
164
+ By downloading, installing, or using the EXAONE AI Model, the Licensee acknowledges that it has read,
165
+ understood, and agrees to be bound by the terms and conditions of this Agreement.
README.md CHANGED
@@ -1,3 +1,106 @@
  ---
- license: gpl-3.0
+ license: other
+ license_name: exaone
+ license_link: LICENSE
+ language:
+ - en
+ - ko
+ tags:
+ - lg-ai
+ - exaone
  ---
+
+ <p align="center">
+ <img src="assets/EXAONE_Symbol+BI_3d.png" width="300" style="margin: 40px auto;">
+ <br>
+
+ # EXAONE-3.0-7.8B-Instruct
+
+ ## Introduction
+
+ We introduce EXAONE-3.0-7.8B-Instruct, a pre-trained and instruction-tuned bilingual (English and Korean) generative model with 7.8 billion parameters.
+ The model was pre-trained on 8T curated tokens and post-trained with supervised fine-tuning and direct preference optimization.
+ It demonstrates highly competitive benchmark performance against other state-of-the-art open models of similar size.
+
+ For more details, please refer to our [technical report](https://www.lgresearch.ai/data/upload/tech_report/en/EXAONE_3.0_Technical_Report.pdf), [blog](https://www.lgresearch.ai/blog/view?seq=460) and [GitHub](https://github.com/LG-AI-EXAONE).
+
+ ## Quickstart
+
+ We recommend using `transformers` v4.41 or later.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained(
+     "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
+     torch_dtype=torch.bfloat16,
+     trust_remote_code=True,
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
+
+ # Choose your prompt
+ prompt = "Explain who you are"  # English example
+ prompt = "너의 소원을 말해봐"   # Korean example
+
+ messages = [
+     {"role": "system",
+      "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
+     {"role": "user", "content": prompt}
+ ]
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_tensors="pt"
+ )
+
+ output = model.generate(
+     input_ids.to("cuda"),
+     eos_token_id=tokenizer.eos_token_id,
+     max_new_tokens=128
+ )
+ print(tokenizer.decode(output[0]))
+ ```
+
+ > ### Note
+ > The EXAONE 3.0 instruction-tuned language model was trained to utilize the system prompt,
+ > so we highly recommend using the system prompt provided in the code snippet above.
+
+ ## Evaluation
+
+ We compared EXAONE-3.0-7.8B-Instruct with similar-sized instruction-tuned LLMs. To verify performance in real-world use cases, we measured benchmarks that correlate highly with [LMSYS Chatbot Arena](https://chat.lmsys.org/).
+ Some experimental results are shown below. The full evaluation results can be found in the [technical report](https://www.lgresearch.ai/data/upload/tech_report/en/EXAONE_3.0_Technical_Report.pdf).
+
+ | Language | Benchmark | EXAONE 3.0 <br>7.8B Inst. | Llama 3.1 <br>8B Inst. | Gemma 2 <br>9B Inst. | QWEN 2 <br>7B Inst. | Phi 3 <br>7B Inst. | Mistral 7B <br>Inst. |
+ | :-----: | :----- | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
+ | English | MT-Bench | **9.01** | 7.95 | 8.52 | 8.41 | 8.52 | 7.72 |
+ | | Arena-Hard-v0.1 | **46.8** | 28.0 | 42.1 | 21.7 | 29.1 | 16.2 |
+ | | WildBench | **48.2** | 34.5 | 41.5 | 34.9 | 32.8 | 29.0 |
+ | | AlpacaEval 2.0 LC | 45.0 | 31.5 | **47.5** | 24.5 | 37.1 | 31.0 |
+ | Korean | KoMT-Bench<sup>[1]</sup> | **8.92** | 6.06 | 7.92 | 7.69 | 4.87 | 5.20 |
+ | | LogicKor | **8.62** | 5.40 | 8.07 | 6.12 | 3.76 | 3.42 |
+
+ - [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see its [README](https://github.com/LG-AI-EXAONE/KoMT-Bench) for more details.
+
+ ## Limitation
+
+ The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The model generates responses based on the output probabilities of tokens, which are determined during training from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research.
+
+ - Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information.
+ - Biased responses may be generated, which are associated with age, gender, race, and so on.
+ - The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences.
+ - Since the model does not reflect the latest information, the responses may be false or contradictory.
+
+ LG AI Research strives to reduce potential risks that may arise from the EXAONE language model. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI’s ethical principles when using the EXAONE language model.
+
+ ## License
+
+ The model is licensed under the [EXAONE AI Model License Agreement 1.0 - NC](./LICENSE).
+
+ ## Contact
+ LG AI Research Technical Support: [email protected]
assets/EXAONE_Symbol+BI_3d.png ADDED
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "activation_function": "silu",
+ "architectures": [
+ "ExaoneForCausalLM"
+ ],
+ "attention_dropout": 0.0,
+ "auto_map": {
+ "AutoConfig": "configuration_exaone.ExaoneConfig",
+ "AutoModelForCausalLM": "modeling_exaone.ExaoneForCausalLM",
+ "AutoModelForSequenceClassification": "modeling_exaone.ExaoneForSequenceClassification"
+ },
+ "bos_token_id": 1,
+ "embed_dropout": 0.0,
+ "eos_token_id": 361,
+ "hidden_size": 4096,
+ "initializer_range": 0.02,
+ "intermediate_size": 14336,
+ "layer_norm_epsilon": 1e-05,
+ "max_position_embeddings": 4096,
+ "model_type": "exaone",
+ "num_attention_heads": 32,
+ "num_key_value_heads": 8,
+ "num_layers": 32,
+ "pad_token_id": 0,
+ "rope_scaling": null,
+ "rope_theta": 500000.0,
+ "tie_word_embeddings": false,
+ "torch_dtype": "float32",
+ "transformers_version": "4.41.0",
+ "use_cache": true,
+ "vocab_size": 102400
+ }
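
Because `auto_map` in `config.json` points at the bundled `configuration_exaone.py` and `modeling_exaone.py`, the `Auto*` classes need `trust_remote_code=True` to resolve this architecture. A minimal sketch (assuming the `LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct` repository id from the README and access to the Hub) that loads only the configuration and echoes a few of the fields above:

```python
from transformers import AutoConfig

# Load just the configuration shipped in this commit; trust_remote_code=True lets
# transformers import configuration_exaone.ExaoneConfig referenced by auto_map.
config = AutoConfig.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
    trust_remote_code=True,
)

print(config.model_type)               # "exaone"
print(config.num_layers)               # 32
print(config.num_key_value_heads)      # 8 -> grouped-query attention (32 query heads share 8 KV heads)
print(config.max_position_embeddings)  # 4096
```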
configuration_exaone.py ADDED
@@ -0,0 +1,186 @@
1
+ # coding=utf-8
2
+ # Copyright 2021 The LG AI Research EXAONE Lab. All rights reserved.
3
+ #
4
+ # Licensed under the Apache License, Version 2.0 (the "License");
5
+ # you may not use this file except in compliance with the License.
6
+ # You may obtain a copy of the License at
7
+ #
8
+ # http://www.apache.org/licenses/LICENSE-2.0
9
+ #
10
+ # Unless required by applicable law or agreed to in writing, software
11
+ # distributed under the License is distributed on an "AS IS" BASIS,
12
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13
+ # See the License for the specific language governing permissions and
14
+ # limitations under the License.
15
+ """ EXAONE model configuration """
16
+ from transformers.configuration_utils import PretrainedConfig
17
+ from transformers.utils import logging
18
+
19
+
20
+ logger = logging.get_logger(__name__)
21
+
22
+ EXAONE_PRETRAINED_CONFIG_ARCHIVE_MAP = {
23
+ }
24
+
25
+
26
+ class ExaoneConfig(PretrainedConfig):
27
+ r"""
28
+ This is the configuration class to store the configuration of a :class:`~transformers.ExaoneModel`. It is used to
+ instantiate an EXAONE model according to the specified arguments, defining the model architecture. Instantiating a
+ configuration with the defaults will yield a configuration similar to that of the EXAONE model.
31
+
32
+ Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used to control the model
33
+ outputs. Read the documentation from :class:`~transformers.PretrainedConfig` for more information.
34
+
35
+
36
+ Args:
37
+ vocab_size (:obj:`int`, `optional`, defaults to 102400):
+ Vocabulary size of the EXAONE model. Defines the number of different tokens that can be represented by the
+ :obj:`input_ids` passed when calling :class:`~transformers.ExaoneModel`.
42
+ max_position_embeddings (:obj:`int`, `optional`, defaults to 2048):
43
+ The maximum sequence length that this model might ever be used with. Typically set this to something large
44
+ just in case (e.g., 512 or 1024 or 2048).
45
+ hidden_size (:obj:`int`, `optional`, defaults to 2048):
46
+ Dimensionality of the encoder layers and the pooler layer.
47
+ num_layers (:obj:`int`, `optional`, defaults to 32):
48
+ Number of hidden layers in the Transformer encoder.
49
+ num_attention_heads (:obj:`int`, `optional`, defaults to 32):
50
+ Number of attention heads for each attention layer in the Transformer decoder.
51
+ num_key_value_heads (:obj:`int`, `optional`):
52
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
53
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
+ `num_key_value_heads=1` the model will use Multi Query Attention (MQA), otherwise GQA is used. When
55
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
56
+ by meanpooling all the original heads within that group. For more details, check out [this
+ paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, it will default to
58
+ `num_attention_heads`.
59
+ intermediate_size (:obj:`int`, `optional`, defaults to `hidden_size * 4`):
60
+ Dimensionality of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
61
+ activation_function (:obj:`str` or :obj:`function`, `optional`, defaults to :obj:`"silu"`):
62
+ The non-linear activation function (function or string) in the decoder.
63
+ rope_theta (:obj:`float`, `optional`, defaults to 10000.0):
64
+ The base period of the RoPE embeddings.
65
+ rope_scaling (:obj:`Dict`, `optional`):
66
+ Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply new rope type
67
+ and you expect the model to work on longer `max_position_embeddings`, we recommend you to update this value
68
+ accordingly.
69
+ Expected contents:
70
+ `rope_type` (:obj:`str`):
71
+ The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
72
+ 'llama3'], with 'default' being the original RoPE implementation.
73
+ `factor` (:obj:`float`, `optional`):
74
+ Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
75
+ most scaling types, a `factor` of x will enable the model to handle sequences of length x *
76
+ original maximum pre-trained length.
77
+ `original_max_position_embeddings` (:obj:`int`, `optional`):
78
+ Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
79
+ pretraining.
80
+ `attention_factor` (:obj:`float`, `optional`):
81
+ Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
82
+ computation. If unspecified, it defaults to value recommended by the implementation, using the
83
+ `factor` field to infer the suggested value.
84
+ `beta_fast` (:obj:`float`, `optional`):
85
+ Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
86
+ ramp function. If unspecified, it defaults to 32.
87
+ `beta_slow` (:obj:`float`, `optional`):
88
+ Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
89
+ ramp function. If unspecified, it defaults to 1.
90
+ `short_factor` (:obj:`List[float]`, `optional`):
91
+ Only used with 'longrope'. The scaling factor to be applied to short contexts (<
92
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
93
+ size divided by the number of attention heads divided by 2
94
+ `long_factor` (:obj:`List[float]`, `optional`):
95
+ Only used with 'longrope'. The scaling factor to be applied to long contexts (>
96
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
97
+ size divided by the number of attention heads divided by 2
98
+ `low_freq_factor` (:obj:`float`, `optional`):
99
+ Only used with 'llama3'. Scaling factor applied to low frequency components of the RoPE
100
+ `high_freq_factor` (:obj:`float`, `optional`):
101
+ Only used with 'llama3'. Scaling factor applied to high frequency components of the RoPE
102
+ embed_dropout (:obj:`float`, `optional`, defaults to 0.0):
103
+ The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
104
+ attention_dropout (:obj:`float`, `optional`, defaults to 0.0):
105
+ The dropout ratio for the attention probabilities.
106
+ layer_norm_epsilon (:obj:`float`, `optional`, defaults to 1e-5):
107
+ The epsilon used by the layer normalization layers.
108
+ initializer_range (:obj:`float`, `optional`, defaults to 0.02):
109
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
110
+ use_cache (:obj:`bool`, `optional`, defaults to :obj:`True`):
111
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
112
+ relevant if ``config.is_decoder=True``.
113
+ bos_token_id (:obj:`int`, `optional`, defaults to 0):
114
+ Beginning of stream token id.
115
+ eos_token_id (:obj:`int`, `optional`, defaults to 2):
116
+ End of stream token id.
117
+ tie_word_embeddings (:obj:`bool`, `optional`, defaults to :obj:`True`):
118
+ Whether to tie weight embeddings
119
+ gradient_checkpointing (:obj:`bool`, `optional`, defaults to :obj:`False`):
120
+ If True, use gradient checkpointing to save memory at the expense of slower backward pass.
121
+
122
+ Example::
123
+
124
+ >>> from transformers import EXAONEModel, ExaoneConfig
125
+
126
+ >>> # Initializing an EXAONE configuration
127
+ >>> configuration = ExaoneConfig()
128
+
129
+ >>> # Initializing a model from configuration
130
+ >>> model = EXAONEModel(configuration)
131
+
132
+ >>> # Accessing the model configuration
133
+ >>> configuration = model.config
134
+ """
135
+ model_type = "exaone"
136
+ keys_to_ignore_at_inference = ["past_key_values"]
137
+ attribute_map = {"num_hidden_layers": "num_layers"}
138
+
139
+ def __init__(
140
+ self,
141
+ vocab_size=102400,
142
+ max_position_embeddings=2048,
143
+ hidden_size=2048,
144
+ num_layers=32,
145
+ num_attention_heads=32,
146
+ num_key_value_heads=None,
147
+ intermediate_size=None,
148
+ activation_function="silu",
149
+ rope_theta=10000.0,
150
+ rope_scaling=None,
151
+ embed_dropout=0.0,
152
+ attention_dropout=0.0,
153
+ layer_norm_epsilon=1e-5,
154
+ initializer_range=0.02,
155
+ use_cache=True,
156
+ bos_token_id=0,
157
+ eos_token_id=2,
158
+ tie_word_embeddings=True,
159
+ **kwargs
160
+ ):
161
+ self.vocab_size = vocab_size
162
+ self.max_position_embeddings = max_position_embeddings
163
+ self.hidden_size = hidden_size
164
+ self.num_layers = num_layers
165
+ self.num_attention_heads = num_attention_heads
166
+ self.num_hidden_layers = num_layers
167
+ if num_key_value_heads is None:
168
+ num_key_value_heads = num_attention_heads
169
+ self.num_key_value_heads = num_key_value_heads
170
+ if intermediate_size:
171
+ self.intermediate_size = intermediate_size
172
+ else:
173
+ self.intermediate_size = hidden_size * 4
174
+ self.activation_function = activation_function
175
+ self.embed_dropout = embed_dropout
176
+ self.attention_dropout = attention_dropout
177
+ self.layer_norm_epsilon = layer_norm_epsilon
178
+ self.initializer_range = initializer_range
179
+ self.use_cache = use_cache
180
+ self.rope_theta = rope_theta
181
+ self.rope_scaling = rope_scaling
182
+
183
+ self.bos_token_id = bos_token_id
184
+ self.eos_token_id = eos_token_id
185
+
186
+ super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, tie_word_embeddings=tie_word_embeddings, **kwargs)
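
The values in `config.json` above can also be reproduced by instantiating `ExaoneConfig` directly. A minimal sketch, assuming `configuration_exaone.py` from this commit has been downloaded next to the script (the class is not part of the `transformers` package itself):

```python
from configuration_exaone import ExaoneConfig

# Mirror the shipped config.json; pad_token_id is forwarded to PretrainedConfig via **kwargs.
config = ExaoneConfig(
    vocab_size=102400,
    max_position_embeddings=4096,
    hidden_size=4096,
    num_layers=32,
    num_attention_heads=32,
    num_key_value_heads=8,
    intermediate_size=14336,
    activation_function="silu",
    rope_theta=500000.0,
    layer_norm_epsilon=1e-05,
    bos_token_id=1,
    eos_token_id=361,
    pad_token_id=0,
    tie_word_embeddings=False,
)

# intermediate_size only falls back to hidden_size * 4 (16384 here) when it is omitted,
# so passing 14336 overrides that default.
print(config.intermediate_size)  # 14336
print(config.num_hidden_layers)  # 32, mapped to num_layers via attribute_map
```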
generation_config.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 1,
+ "eos_token_id": 361,
+ "pad_token_id": 0,
+ "transformers_version": "4.41"
+ }
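
`generation_config.json` supplies the special-token defaults that `model.generate()` falls back to when they are not passed explicitly (the README snippet passes `eos_token_id` anyway). A minimal sketch, assuming the same repository id as in the README:

```python
from transformers import GenerationConfig

# Inspect the generation defaults shipped with this commit.
gen_config = GenerationConfig.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")

print(gen_config.bos_token_id)  # 1
print(gen_config.eos_token_id)  # 361
print(gen_config.pad_token_id)  # 0
```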
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9ff7294cfa7e3d98e2f77d279810da0f278e184b17c2b61bbd512505a813e033
3
+ size 637669384
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:35d444d89c4dfb9980b0a52b7efb7bc129ac7fb7c87c2d8b029537e08c1aae14
3
+ size 704845744
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:351db644d09a57da9ab1a487f0bd08f10976560f123b702c488c0f19fec43ab1
3
+ size 704845784
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e2f02fcdb88441c69f97b77a15e088b04b649e2f9c0ecef89be7de011ba74e18
3
+ size 537040168
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c833d9dcaa39c043374dea0adbfa0483c4caafb32ff47ecb0a21825e102743bf
3
+ size 704845792
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:695c62fb4623075ec3022a0117b63d0f581ac7b9b28f7fac0fe17149f375e84c
3
+ size 537056648
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a9c431b6dce9f9f6283e28ef88d230aa1bedcb50f76cfa8146435b879e12fac0
3
+ size 1677721728
model.safetensors.index.json ADDED
@@ -0,0 +1,298 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 31273795584
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00007-of-00007.safetensors",
7
+ "transformer.h.0.attn.attention.k_proj.weight": "model-00001-of-00007.safetensors",
8
+ "transformer.h.0.attn.attention.out_proj.weight": "model-00001-of-00007.safetensors",
9
+ "transformer.h.0.attn.attention.q_proj.weight": "model-00001-of-00007.safetensors",
10
+ "transformer.h.0.attn.attention.v_proj.weight": "model-00001-of-00007.safetensors",
11
+ "transformer.h.0.ln_1.weight": "model-00001-of-00007.safetensors",
12
+ "transformer.h.0.ln_2.weight": "model-00001-of-00007.safetensors",
13
+ "transformer.h.0.mlp.c_fc_0.weight": "model-00001-of-00007.safetensors",
14
+ "transformer.h.0.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
15
+ "transformer.h.0.mlp.c_proj.weight": "model-00001-of-00007.safetensors",
16
+ "transformer.h.1.attn.attention.k_proj.weight": "model-00001-of-00007.safetensors",
17
+ "transformer.h.1.attn.attention.out_proj.weight": "model-00001-of-00007.safetensors",
18
+ "transformer.h.1.attn.attention.q_proj.weight": "model-00001-of-00007.safetensors",
19
+ "transformer.h.1.attn.attention.v_proj.weight": "model-00001-of-00007.safetensors",
20
+ "transformer.h.1.ln_1.weight": "model-00001-of-00007.safetensors",
21
+ "transformer.h.1.ln_2.weight": "model-00001-of-00007.safetensors",
22
+ "transformer.h.1.mlp.c_fc_0.weight": "model-00001-of-00007.safetensors",
23
+ "transformer.h.1.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
24
+ "transformer.h.1.mlp.c_proj.weight": "model-00001-of-00007.safetensors",
25
+ "transformer.h.10.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
26
+ "transformer.h.10.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
27
+ "transformer.h.10.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
28
+ "transformer.h.10.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
29
+ "transformer.h.10.ln_1.weight": "model-00003-of-00007.safetensors",
30
+ "transformer.h.10.ln_2.weight": "model-00003-of-00007.safetensors",
31
+ "transformer.h.10.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
32
+ "transformer.h.10.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
33
+ "transformer.h.10.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
34
+ "transformer.h.11.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
35
+ "transformer.h.11.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
36
+ "transformer.h.11.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
37
+ "transformer.h.11.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
38
+ "transformer.h.11.ln_1.weight": "model-00003-of-00007.safetensors",
39
+ "transformer.h.11.ln_2.weight": "model-00003-of-00007.safetensors",
40
+ "transformer.h.11.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
41
+ "transformer.h.11.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
42
+ "transformer.h.11.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
43
+ "transformer.h.12.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
44
+ "transformer.h.12.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
45
+ "transformer.h.12.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
46
+ "transformer.h.12.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
47
+ "transformer.h.12.ln_1.weight": "model-00003-of-00007.safetensors",
48
+ "transformer.h.12.ln_2.weight": "model-00003-of-00007.safetensors",
49
+ "transformer.h.12.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
50
+ "transformer.h.12.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
51
+ "transformer.h.12.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
52
+ "transformer.h.13.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
53
+ "transformer.h.13.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
54
+ "transformer.h.13.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
55
+ "transformer.h.13.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
56
+ "transformer.h.13.ln_1.weight": "model-00003-of-00007.safetensors",
57
+ "transformer.h.13.ln_2.weight": "model-00003-of-00007.safetensors",
58
+ "transformer.h.13.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
59
+ "transformer.h.13.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
60
+ "transformer.h.13.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
61
+ "transformer.h.14.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
62
+ "transformer.h.14.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
63
+ "transformer.h.14.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
64
+ "transformer.h.14.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
65
+ "transformer.h.14.ln_1.weight": "model-00003-of-00007.safetensors",
66
+ "transformer.h.14.ln_2.weight": "model-00003-of-00007.safetensors",
67
+ "transformer.h.14.mlp.c_fc_0.weight": "model-00003-of-00007.safetensors",
68
+ "transformer.h.14.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
69
+ "transformer.h.14.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
70
+ "transformer.h.15.attn.attention.k_proj.weight": "model-00003-of-00007.safetensors",
71
+ "transformer.h.15.attn.attention.out_proj.weight": "model-00003-of-00007.safetensors",
72
+ "transformer.h.15.attn.attention.q_proj.weight": "model-00003-of-00007.safetensors",
73
+ "transformer.h.15.attn.attention.v_proj.weight": "model-00003-of-00007.safetensors",
74
+ "transformer.h.15.ln_1.weight": "model-00003-of-00007.safetensors",
75
+ "transformer.h.15.ln_2.weight": "model-00003-of-00007.safetensors",
76
+ "transformer.h.15.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
77
+ "transformer.h.15.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
78
+ "transformer.h.15.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
79
+ "transformer.h.16.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
80
+ "transformer.h.16.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
81
+ "transformer.h.16.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
82
+ "transformer.h.16.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
83
+ "transformer.h.16.ln_1.weight": "model-00004-of-00007.safetensors",
84
+ "transformer.h.16.ln_2.weight": "model-00004-of-00007.safetensors",
85
+ "transformer.h.16.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
86
+ "transformer.h.16.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
87
+ "transformer.h.16.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
88
+ "transformer.h.17.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
89
+ "transformer.h.17.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
90
+ "transformer.h.17.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
91
+ "transformer.h.17.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
92
+ "transformer.h.17.ln_1.weight": "model-00004-of-00007.safetensors",
93
+ "transformer.h.17.ln_2.weight": "model-00004-of-00007.safetensors",
94
+ "transformer.h.17.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
95
+ "transformer.h.17.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
96
+ "transformer.h.17.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
97
+ "transformer.h.18.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
98
+ "transformer.h.18.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
99
+ "transformer.h.18.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
100
+ "transformer.h.18.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
101
+ "transformer.h.18.ln_1.weight": "model-00004-of-00007.safetensors",
102
+ "transformer.h.18.ln_2.weight": "model-00004-of-00007.safetensors",
103
+ "transformer.h.18.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
104
+ "transformer.h.18.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
105
+ "transformer.h.18.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
106
+ "transformer.h.19.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
107
+ "transformer.h.19.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
108
+ "transformer.h.19.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
109
+ "transformer.h.19.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
110
+ "transformer.h.19.ln_1.weight": "model-00004-of-00007.safetensors",
111
+ "transformer.h.19.ln_2.weight": "model-00004-of-00007.safetensors",
112
+ "transformer.h.19.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
113
+ "transformer.h.19.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
114
+ "transformer.h.19.mlp.c_proj.weight": "model-00004-of-00007.safetensors",
115
+ "transformer.h.2.attn.attention.k_proj.weight": "model-00001-of-00007.safetensors",
116
+ "transformer.h.2.attn.attention.out_proj.weight": "model-00001-of-00007.safetensors",
117
+ "transformer.h.2.attn.attention.q_proj.weight": "model-00001-of-00007.safetensors",
118
+ "transformer.h.2.attn.attention.v_proj.weight": "model-00001-of-00007.safetensors",
119
+ "transformer.h.2.ln_1.weight": "model-00001-of-00007.safetensors",
120
+ "transformer.h.2.ln_2.weight": "model-00001-of-00007.safetensors",
121
+ "transformer.h.2.mlp.c_fc_0.weight": "model-00001-of-00007.safetensors",
122
+ "transformer.h.2.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
123
+ "transformer.h.2.mlp.c_proj.weight": "model-00001-of-00007.safetensors",
124
+ "transformer.h.20.attn.attention.k_proj.weight": "model-00004-of-00007.safetensors",
125
+ "transformer.h.20.attn.attention.out_proj.weight": "model-00004-of-00007.safetensors",
126
+ "transformer.h.20.attn.attention.q_proj.weight": "model-00004-of-00007.safetensors",
127
+ "transformer.h.20.attn.attention.v_proj.weight": "model-00004-of-00007.safetensors",
128
+ "transformer.h.20.ln_1.weight": "model-00004-of-00007.safetensors",
129
+ "transformer.h.20.ln_2.weight": "model-00004-of-00007.safetensors",
130
+ "transformer.h.20.mlp.c_fc_0.weight": "model-00004-of-00007.safetensors",
131
+ "transformer.h.20.mlp.c_fc_1.weight": "model-00004-of-00007.safetensors",
132
+ "transformer.h.20.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
133
+ "transformer.h.21.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
134
+ "transformer.h.21.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
135
+ "transformer.h.21.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
136
+ "transformer.h.21.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
137
+ "transformer.h.21.ln_1.weight": "model-00005-of-00007.safetensors",
138
+ "transformer.h.21.ln_2.weight": "model-00005-of-00007.safetensors",
139
+ "transformer.h.21.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
140
+ "transformer.h.21.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
141
+ "transformer.h.21.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
142
+ "transformer.h.22.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
143
+ "transformer.h.22.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
144
+ "transformer.h.22.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
145
+ "transformer.h.22.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
146
+ "transformer.h.22.ln_1.weight": "model-00005-of-00007.safetensors",
147
+ "transformer.h.22.ln_2.weight": "model-00005-of-00007.safetensors",
148
+ "transformer.h.22.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
149
+ "transformer.h.22.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
150
+ "transformer.h.22.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
151
+ "transformer.h.23.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
152
+ "transformer.h.23.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
153
+ "transformer.h.23.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
154
+ "transformer.h.23.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
155
+ "transformer.h.23.ln_1.weight": "model-00005-of-00007.safetensors",
156
+ "transformer.h.23.ln_2.weight": "model-00005-of-00007.safetensors",
157
+ "transformer.h.23.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
158
+ "transformer.h.23.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
159
+ "transformer.h.23.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
160
+ "transformer.h.24.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
161
+ "transformer.h.24.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
162
+ "transformer.h.24.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
163
+ "transformer.h.24.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
164
+ "transformer.h.24.ln_1.weight": "model-00005-of-00007.safetensors",
165
+ "transformer.h.24.ln_2.weight": "model-00005-of-00007.safetensors",
166
+ "transformer.h.24.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
167
+ "transformer.h.24.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
168
+ "transformer.h.24.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
169
+ "transformer.h.25.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
170
+ "transformer.h.25.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
171
+ "transformer.h.25.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
172
+ "transformer.h.25.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
173
+ "transformer.h.25.ln_1.weight": "model-00005-of-00007.safetensors",
174
+ "transformer.h.25.ln_2.weight": "model-00005-of-00007.safetensors",
175
+ "transformer.h.25.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
176
+ "transformer.h.25.mlp.c_fc_1.weight": "model-00005-of-00007.safetensors",
177
+ "transformer.h.25.mlp.c_proj.weight": "model-00005-of-00007.safetensors",
178
+ "transformer.h.26.attn.attention.k_proj.weight": "model-00005-of-00007.safetensors",
179
+ "transformer.h.26.attn.attention.out_proj.weight": "model-00005-of-00007.safetensors",
180
+ "transformer.h.26.attn.attention.q_proj.weight": "model-00005-of-00007.safetensors",
181
+ "transformer.h.26.attn.attention.v_proj.weight": "model-00005-of-00007.safetensors",
182
+ "transformer.h.26.ln_1.weight": "model-00005-of-00007.safetensors",
183
+ "transformer.h.26.ln_2.weight": "model-00005-of-00007.safetensors",
184
+ "transformer.h.26.mlp.c_fc_0.weight": "model-00005-of-00007.safetensors",
185
+ "transformer.h.26.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
186
+ "transformer.h.26.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
187
+ "transformer.h.27.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
188
+ "transformer.h.27.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
189
+ "transformer.h.27.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
190
+ "transformer.h.27.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
191
+ "transformer.h.27.ln_1.weight": "model-00006-of-00007.safetensors",
192
+ "transformer.h.27.ln_2.weight": "model-00006-of-00007.safetensors",
193
+ "transformer.h.27.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
194
+ "transformer.h.27.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
195
+ "transformer.h.27.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
196
+ "transformer.h.28.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
197
+ "transformer.h.28.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
198
+ "transformer.h.28.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
199
+ "transformer.h.28.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
200
+ "transformer.h.28.ln_1.weight": "model-00006-of-00007.safetensors",
201
+ "transformer.h.28.ln_2.weight": "model-00006-of-00007.safetensors",
202
+ "transformer.h.28.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
203
+ "transformer.h.28.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
204
+ "transformer.h.28.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
205
+ "transformer.h.29.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
206
+ "transformer.h.29.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
207
+ "transformer.h.29.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
208
+ "transformer.h.29.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
209
+ "transformer.h.29.ln_1.weight": "model-00006-of-00007.safetensors",
210
+ "transformer.h.29.ln_2.weight": "model-00006-of-00007.safetensors",
211
+ "transformer.h.29.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
212
+ "transformer.h.29.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
213
+ "transformer.h.29.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
214
+ "transformer.h.3.attn.attention.k_proj.weight": "model-00001-of-00007.safetensors",
215
+ "transformer.h.3.attn.attention.out_proj.weight": "model-00001-of-00007.safetensors",
216
+ "transformer.h.3.attn.attention.q_proj.weight": "model-00001-of-00007.safetensors",
217
+ "transformer.h.3.attn.attention.v_proj.weight": "model-00001-of-00007.safetensors",
218
+ "transformer.h.3.ln_1.weight": "model-00001-of-00007.safetensors",
219
+ "transformer.h.3.ln_2.weight": "model-00001-of-00007.safetensors",
220
+ "transformer.h.3.mlp.c_fc_0.weight": "model-00001-of-00007.safetensors",
221
+ "transformer.h.3.mlp.c_fc_1.weight": "model-00001-of-00007.safetensors",
222
+ "transformer.h.3.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
223
+ "transformer.h.30.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
224
+ "transformer.h.30.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
225
+ "transformer.h.30.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
226
+ "transformer.h.30.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
227
+ "transformer.h.30.ln_1.weight": "model-00006-of-00007.safetensors",
228
+ "transformer.h.30.ln_2.weight": "model-00006-of-00007.safetensors",
229
+ "transformer.h.30.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
230
+ "transformer.h.30.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
231
+ "transformer.h.30.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
232
+ "transformer.h.31.attn.attention.k_proj.weight": "model-00006-of-00007.safetensors",
233
+ "transformer.h.31.attn.attention.out_proj.weight": "model-00006-of-00007.safetensors",
234
+ "transformer.h.31.attn.attention.q_proj.weight": "model-00006-of-00007.safetensors",
235
+ "transformer.h.31.attn.attention.v_proj.weight": "model-00006-of-00007.safetensors",
236
+ "transformer.h.31.ln_1.weight": "model-00006-of-00007.safetensors",
237
+ "transformer.h.31.ln_2.weight": "model-00006-of-00007.safetensors",
238
+ "transformer.h.31.mlp.c_fc_0.weight": "model-00006-of-00007.safetensors",
+ "transformer.h.31.mlp.c_fc_1.weight": "model-00006-of-00007.safetensors",
+ "transformer.h.31.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
+ "transformer.h.4.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.4.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.4.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.4.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.4.ln_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.4.ln_2.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.4.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.4.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.4.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.ln_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.ln_2.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.5.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.ln_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.ln_2.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.6.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.ln_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.ln_2.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.7.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.ln_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.ln_2.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.mlp.c_fc_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.8.mlp.c_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.9.attn.attention.k_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.9.attn.attention.out_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.9.attn.attention.q_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.9.attn.attention.v_proj.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.9.ln_1.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.9.ln_2.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.9.mlp.c_fc_0.weight": "model-00002-of-00007.safetensors",
+ "transformer.h.9.mlp.c_fc_1.weight": "model-00003-of-00007.safetensors",
+ "transformer.h.9.mlp.c_proj.weight": "model-00003-of-00007.safetensors",
+ "transformer.ln_f.weight": "model-00006-of-00007.safetensors",
+ "transformer.wte.weight": "model-00001-of-00007.safetensors"
+ }
+ }
modeling_exaone.py ADDED
@@ -0,0 +1,1747 @@
1
+ # coding=utf-8
2
+ # Copyright 2021 The LG AI Research EXAONE Lab
3
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4
+ #
5
+ # This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
6
+ # and OPT implementations in this library. It has been modified from its
7
+ # original forms to accommodate minor architectural differences compared
8
+ # to GPT-NeoX and OPT used by the Meta AI team that trained the model.
9
+ #
10
+ # Licensed under the Apache License, Version 2.0 (the "License");
11
+ # you may not use this file except in compliance with the License.
12
+ # You may obtain a copy of the License at
13
+ #
14
+ # http://www.apache.org/licenses/LICENSE-2.0
15
+ #
16
+ # Unless required by applicable law or agreed to in writing, software
17
+ # distributed under the License is distributed on an "AS IS" BASIS,
18
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
19
+ # See the License for the specific language governing permissions and
20
+ # limitations under the License.
21
+ """ LG AI Research EXAONE Lab"""
22
+ import sys
23
+ import os
24
+ from typing import List, Optional, Tuple, Union
25
+ from packaging import version
26
+
27
+ import torch
28
+ import torch.utils.checkpoint
29
+ from torch import nn
30
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
31
+ import torch.nn.functional as F
32
+
33
+ from transformers.activations import ACT2FN
34
+ from transformers.cache_utils import Cache, DynamicCache, StaticCache
35
+ from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS
36
+ from transformers.configuration_utils import PretrainedConfig
37
+ from transformers.modeling_attn_mask_utils import AttentionMaskConverter
38
+
39
+ from transformers.modeling_outputs import (
40
+ BaseModelOutputWithPast,
41
+ BaseModelOutputWithPastAndCrossAttentions,
42
+ CausalLMOutputWithCrossAttentions,
43
+ CausalLMOutputWithPast,
44
+ SequenceClassifierOutputWithPast,
45
+ QuestionAnsweringModelOutput,
46
+ )
47
+ from transformers.modeling_utils import PreTrainedModel
48
+ from transformers.utils import (
49
+ add_code_sample_docstrings,
50
+ add_start_docstrings,
51
+ add_start_docstrings_to_model_forward,
52
+ is_flash_attn_2_available,
53
+ logging,
54
+ )
55
+ from .configuration_exaone import ExaoneConfig
56
+ from torch.nn.utils import skip_init
57
+ import math
58
+ import numpy as np
59
+ from typing import List, Optional, Tuple, Union
60
+
61
+
62
+ if is_flash_attn_2_available():
63
+ try:
64
+ import inspect
65
+ from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input
66
+ from flash_attn import flash_attn_func, flash_attn_varlen_func
67
+
68
+ _flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
69
+
70
+ import flash_attn
71
+ if version.parse(flash_attn.__version__) > version.parse('2.4.2'):
72
+ from flash_attn.ops.triton.layer_norm import rms_norm_fn
73
+ else:
74
+ from flash_attn.ops.triton.layernorm import rms_norm_fn
75
+ except:
76
+ pass
77
+
78
+
79
+ logger = logging.get_logger(__name__)
80
+
81
+ _CHECKPOINT_FOR_DOC = "exaone"
82
+ _CONFIG_FOR_DOC = "ExaoneConfig"
83
+
84
+ EXAONE_PRETRAINED_MODEL_ARCHIVE_LIST = [
85
+ "exaone",
86
+ ]
87
+
88
+
89
+ @torch.jit.script
90
+ def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
91
+ """
92
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
93
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
94
+ """
95
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
96
+ if n_rep == 1:
97
+ return hidden_states
98
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
99
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
100
+
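+ # Illustrative shape check (a sketch, not executed at import time): with 8 KV heads
+ # repeated 4 times, grouped-query attention expands keys/values to 32 heads, e.g.
+ # repeat_kv(torch.randn(2, 8, 16, 64), n_rep=4).shape == torch.Size([2, 32, 16, 64]).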
101
+
102
+ def apply_rotary_pos_emb(q, k, cos, sin, unsqueeze_dim=1):
103
+ """Applies Rotary Position Embedding to the query and key tensors.
104
+
105
+ Args:
106
+ q (`torch.Tensor`): The query tensor.
107
+ k (`torch.Tensor`): The key tensor.
108
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
109
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
110
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
111
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
112
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
113
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
114
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
115
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
116
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
117
+ Returns:
118
+ `tuple(torch.Tensor)` comprising the query and key tensors rotated using the Rotary Position Embedding.
119
+ """
120
+ cos = cos.unsqueeze(unsqueeze_dim)
121
+ sin = sin.unsqueeze(unsqueeze_dim)
122
+ q_embed = (q * cos) + (rotate_half(q) * sin)
123
+ k_embed = (k * cos) + (rotate_half(k) * sin)
124
+ return q_embed, k_embed
125
+
126
+
127
+ def rotate_half(x):
128
+ """ Rotates half the hidden dims of the input. """
129
+ x1 = x[..., : x.shape[-1] // 2]
130
+ x2 = x[..., x.shape[-1] // 2 :]
131
+ return torch.cat((-x2, x1), dim=-1)
132
+
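+ # Worked example for rotate_half (illustrative): for x = torch.tensor([1., 2., 3., 4.]),
+ # rotate_half(x) == tensor([-3., -4., 1., 2.]). In apply_rotary_pos_emb above, q and k of
+ # shape (batch, heads, seq_len, head_dim) are combined with cos/sin of shape
+ # (batch, seq_len, head_dim), which unsqueeze_dim=1 makes broadcastable over the head axis.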
133
+
134
+ # copied from llama
135
+ def _prepare_4d_causal_attention_mask_with_cache_position(
136
+ attention_mask: torch.Tensor,
137
+ sequence_length: int,
138
+ target_length: int,
139
+ dtype: torch.dtype,
140
+ device: torch.device,
141
+ min_dtype: float,
142
+ cache_position: torch.Tensor,
143
+ batch_size: int,
144
+ ):
145
+ """
146
+ Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
147
+ `(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
148
+
149
+ Args:
150
+ attention_mask (`torch.Tensor`):
151
+ A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape `(batch_size, 1, query_length, key_value_length)`.
152
+ sequence_length (`int`):
153
+ The sequence length being processed.
154
+ target_length (`int`):
155
+ The target length: when generating with static cache, the mask should be as long as the static cache, to account for the 0 padding, the part of the cache that is not filled yet.
156
+ dtype (`torch.dtype`):
157
+ The dtype to use for the 4D attention mask.
158
+ device (`torch.device`):
159
+ The device to place the 4D attention mask on.
160
+ min_dtype (`float`):
161
+ The minimum value representable with the dtype `dtype`.
162
+ cache_position (`torch.Tensor`):
163
+ Indices depicting the position of the input sequence tokens in the sequence.
164
+ batch_size (`int`):
165
+ Batch size.
166
+ """
167
+ if attention_mask is not None and attention_mask.dim() == 4:
168
+ # In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
169
+ causal_mask = attention_mask
170
+ else:
171
+ causal_mask = torch.full((sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=device)
172
+ if sequence_length != 1:
173
+ causal_mask = torch.triu(causal_mask, diagonal=1)
174
+ causal_mask *= torch.arange(target_length, device=device) > cache_position.reshape(-1, 1)
175
+ causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
176
+ if attention_mask is not None:
177
+ causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
178
+ mask_length = attention_mask.shape[-1]
179
+ padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :]
180
+ padding_mask = padding_mask == 0
181
+ causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
182
+ padding_mask, min_dtype
183
+ )
184
+
185
+ return causal_mask
186
+
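+ # Example of the expected shapes (a sketch): a 2D padding mask of shape (2, 6) with
+ # sequence_length=3 and target_length=6 becomes a (2, 1, 3, 6) mask in which allowed
+ # positions hold 0.0 and masked or causally-hidden positions hold `min_dtype`.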
187
+
188
+ class ExaoneRMSNorm(torch.nn.Module):
189
+ def __init__(self, hidden_size, eps=1e-6):
190
+ super().__init__()
191
+ self.eps = eps
192
+ self.weight = torch.nn.Parameter(torch.ones(hidden_size))
193
+
194
+ def forward(self, hidden_states):
195
+ input_dtype = hidden_states.dtype
196
+ hidden_states = hidden_states.to(torch.float32)
197
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
198
+ hidden_states = hidden_states * torch.rsqrt(variance + self.eps)
199
+ return self.weight * hidden_states.to(input_dtype)
200
+
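+ # RMSNorm as a formula (for reference): y = weight * x / sqrt(mean(x**2, dim=-1) + eps),
+ # with the mean/rsqrt computed in float32 and the result cast back to the input dtype.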
201
+
202
+ class ExaoneTritonRMSNorm(torch.nn.Module):
203
+ def __init__(
204
+ self,
205
+ hidden_size: int = 0,
206
+ eps: float = 1e-5,
207
+ ):
208
+ super().__init__()
209
+ self.eps = eps
210
+ self.drop = None
211
+ self.weight = torch.nn.Parameter(torch.empty(hidden_size))
212
+ self.register_parameter("bias", None)
213
+ self.reset_parameters()
214
+
215
+ def reset_parameters(self):
216
+ torch.nn.init.ones_(self.weight)
217
+
218
+ def forward(self, x, residual=None, prenorm=False, residual_in_fp32=False):
219
+ return rms_norm_fn(
220
+ x,
221
+ self.weight,
222
+ self.bias,
223
+ residual=residual,
224
+ eps=self.eps,
225
+ dropout_p=self.drop.p if self.drop is not None and self.training else 0.0,
226
+ prenorm=prenorm,
227
+ residual_in_fp32=residual_in_fp32,
228
+ )
229
+
230
+
231
+ ALL_LAYERNORM_LAYERS.append(ExaoneRMSNorm)
232
+ ALL_LAYERNORM_LAYERS.append(ExaoneTritonRMSNorm)
233
+
234
+
235
+ class ExaoneRotaryEmbedding(nn.Module):
236
+ """
237
+ Common description for the functions named `_compute_XXX_rope_parameters()`
238
+ - Copied from `transformers.modeling_rope_utils` in v4.43, with some modifications.
239
+
240
+ Computes the inverse frequencies for the selected RoPE variant.
241
+ The EXAONE model supports 'default', 'linear', 'dynamic', and 'yarn'.
242
+
243
+ Args:
244
+ config (:obj:`~transformers.PretrainedConfig`):
245
+ The model configuration.
246
+ device (:obj:`torch.device`):
247
+ The device to use for initialization of the inverse frequencies.
248
+ seq_len (:obj:`int`, `optional`):
249
+ The current sequence length. Unused for some RoPE variants.
250
+ Returns:
251
+ Tuple of (:obj:`torch.Tensor`, :obj:`float`), containing the inverse frequencies for the RoPE embeddings and the
252
+ post-processing scaling factor applied to the computed cos/sin (unused in some types of RoPE).
253
+ """
254
+
255
+ def _compute_default_rope_parameters(
256
+ self,
257
+ config: Optional[PretrainedConfig],
258
+ device: Optional["torch.device"] = None,
259
+ seq_len: Optional[int] = None,
260
+ ) -> Tuple["torch.Tensor", float]:
261
+ base = config.rope_theta
262
+ partial_rotary_factor = config.partial_rotary_factor if hasattr(config, "partial_rotary_factor") else 1.0
263
+ dim = int((config.hidden_size // config.num_attention_heads) * partial_rotary_factor)
264
+
265
+ attention_factor = 1.0 # Unused in this type of RoPE
266
+
267
+ inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float().to(device) / dim))
268
+ return inv_freq, attention_factor
269
+
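+ # Sketch of the resulting frequencies (assuming head_dim=128 and rope_theta=10000):
+ # inv_freq has head_dim // 2 = 64 entries with inv_freq[i] = 1 / 10000**(2*i/128),
+ # ranging from 1.0 at i=0 down to roughly 1.2e-4 at i=63.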
270
+ def _compute_linear_scaling_rope_parameters(
271
+ self,
272
+ config: Optional[PretrainedConfig],
273
+ device: Optional["torch.device"] = None,
274
+ seq_len: Optional[int] = None,
275
+ ) -> Tuple["torch.Tensor", float]:
276
+ factor = config.rope_scaling["factor"]
277
+ if factor < 1.0:
278
+ logger.warning_once(f"`rope_scaling`'s factor field must be a float >= 1, got {factor}")
279
+
280
+ inv_freq, attention_factor = self._compute_default_rope_parameters(config, device, seq_len)
281
+ inv_freq /= factor
282
+ return inv_freq, attention_factor
283
+
284
+ def _compute_dynamic_ntk_parameters(
285
+ self,
286
+ config: Optional[PretrainedConfig],
287
+ device: Optional["torch.device"] = None,
288
+ seq_len: Optional[int] = None,
289
+ ) -> Tuple["torch.Tensor", float]:
290
+ base = config.rope_theta
291
+ partial_rotary_factor = config.partial_rotary_factor if hasattr(config, "partial_rotary_factor") else 1.0
292
+ dim = int((config.hidden_size // config.num_attention_heads) * partial_rotary_factor)
293
+ max_position_embeddings = config.max_position_embeddings
294
+ factor = config.rope_scaling["factor"]
295
+ if factor < 1.0:
296
+ logger.warning_once(f"`rope_scaling`'s factor field must be a float >= 1, got {factor}")
297
+
298
+ attention_factor = 1.0 # Unused in this type of RoPE
299
+ seq_len = seq_len if seq_len is not None else max_position_embeddings
300
+
301
+ base = base * ((factor * seq_len / max_position_embeddings) - (factor - 1)) ** (dim / (dim - 2))
302
+ inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.int64).float().to(device) / dim))
303
+ return inv_freq, attention_factor
304
+
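+ # Sanity check on the dynamic-NTK rescaling above: when seq_len equals
+ # max_position_embeddings, the term (factor * seq_len / max_pos - (factor - 1)) is 1,
+ # so the adjusted base (and hence inv_freq) reduces to the unscaled default.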
305
+ def _compute_yarn_parameters(
306
+ self,
307
+ config: PretrainedConfig,
308
+ device: "torch.device",
309
+ seq_len: Optional[int] = None,
310
+ ) -> Tuple["torch.Tensor", float]:
311
+ base = config.rope_theta
312
+ partial_rotary_factor = config.partial_rotary_factor if hasattr(config, "partial_rotary_factor") else 1.0
313
+ dim = int((config.hidden_size // config.num_attention_heads) * partial_rotary_factor)
314
+ max_position_embeddings = config.max_position_embeddings
315
+ factor = config.rope_scaling["factor"]
316
+ if factor < 1.0:
317
+ logger.warning_once(f"`rope_scaling`'s factor field must be a float >= 1, got {factor}")
318
+
319
+ # Sets the attention factor as suggested in the paper
320
+ attention_factor = config.rope_scaling.get("attention_factor")
321
+ if attention_factor is None:
322
+ attention_factor = 0.1 * math.log(factor) + 1.0
323
+ if attention_factor < 0:
324
+ logger.warning_once(
325
+ f"`rope_scaling`'s attention_factor field must be a float greater than 0, got {attention_factor}"
326
+ )
327
+
328
+ # Optional config options
329
+ # beta_fast/beta_slow: as suggested in the paper, default to 32/1 (correspondingly)
330
+ beta_fast = config.rope_scaling.get("beta_fast") or 32
331
+ beta_slow = config.rope_scaling.get("beta_slow") or 1
332
+ if not isinstance(beta_fast, float):
333
+ logger.warning_once(f"`rope_scaling`'s beta_fast field must be a float, got {beta_fast}")
334
+ if not isinstance(beta_slow, float):
335
+ logger.warning_once(f"`rope_scaling`'s beta_slow field must be a float, got {beta_fast}")
336
+ if beta_fast < beta_slow:
337
+ logger.warning_once(
338
+ f"`rope_scaling`'s beta_fast field must be greater than beta_slow, got beta_fast={beta_fast} "
339
+ f"(defaults to 32 if None) and beta_slow={beta_slow} (defaults to 1 if None)"
340
+ )
341
+
342
+ # Compute the inverse frequencies
343
+ def find_correction_dim(num_rotations, dim, base, max_position_embeddings):
344
+ """Inverse dimension formula to find the dimension based on the number of rotations"""
345
+ return (dim * math.log(max_position_embeddings / (num_rotations * 2 * math.pi))) / (2 * math.log(base))
346
+
347
+ def find_correction_range(low_rot, high_rot, dim, base, max_position_embeddings):
348
+ """Find dimension range bounds based on rotations"""
349
+ low = math.floor(find_correction_dim(low_rot, dim, base, max_position_embeddings))
350
+ high = math.ceil(find_correction_dim(high_rot, dim, base, max_position_embeddings))
351
+ return max(low, 0), min(high, dim - 1)
352
+
353
+ def linear_ramp_mask(min, max, dim):
354
+ if min == max:
355
+ max += 0.001 # Prevent singularity
356
+
357
+ linear_func = (torch.arange(dim, dtype=torch.float32) - min) / (max - min)
358
+ ramp_func = torch.clamp(linear_func, 0, 1)
359
+ return ramp_func
360
+
361
+ pos_freqs = base ** (torch.arange(0, dim, 2).float().to(device) / dim)
362
+ inv_freq_extrapolation = 1.0 / pos_freqs
363
+ inv_freq_interpolation = 1.0 / (factor * pos_freqs)
364
+
365
+ low, high = find_correction_range(beta_fast, beta_slow, dim, base, max_position_embeddings)
366
+
367
+ # Get n-dimensional rotational scaling corrected for extrapolation
368
+ inv_freq_mask = 1 - linear_ramp_mask(low, high, dim // 2).float().to(device)
369
+ inv_freq = inv_freq_interpolation * (1 - inv_freq_mask) + inv_freq_extrapolation * inv_freq_mask
370
+
371
+ return inv_freq, attention_factor
372
+
373
+ def __init__(self, config: ExaoneConfig, device=None):
374
+ ROPE_INIT_FUNCTIONS = {
375
+ "default": self._compute_default_rope_parameters,
376
+ "linear": self._compute_linear_scaling_rope_parameters,
377
+ "dynamic": self._compute_dynamic_ntk_parameters,
378
+ "yarn": self._compute_yarn_parameters,
379
+ }
380
+
381
+ super().__init__()
382
+ if config.rope_scaling is not None:
383
+ self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type"))
384
+ else:
385
+ self.rope_type = "default"
386
+ self.max_seq_len = config.max_position_embeddings
387
+ self.original_max_seq_len = config.max_position_embeddings
388
+
389
+ self.config = config
390
+ if self.rope_type not in ROPE_INIT_FUNCTIONS:
391
+ raise KeyError(f"The EXAONE model does not support RoPE type: {self.rope_type}")
392
+ self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
393
+
394
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
395
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
396
+ self.original_inv_freq = self.inv_freq
397
+
398
+ def _update_freq(self, position_ids, device):
399
+ """
400
+ dynamic RoPE layers should recompute `inv_freq` in the following situations:
401
+ 1 - growing beyond the cached sequence length (allow scaling)
402
+ 2 - the current sequence length is in the original scale (avoid losing precision with small sequences)
403
+ """
404
+ seq_len = torch.max(position_ids) + 1
405
+ if seq_len > self.max_seq_len: # expand to seq_len
406
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device, seq_len=seq_len)
407
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
408
+ self.max_seq_len = seq_len
409
+
410
+ if seq_len < self.original_max_seq_len and self.max_seq_len > self.original_max_seq_len: # reset to original
411
+ self.register_buffer("inv_freq", self.original_inv_freq, persistent=False)
412
+ self.max_seq_len = self.original_max_seq_len
413
+
414
+ @torch.no_grad()
415
+ def forward(self, x, position_ids):
416
+ if "dynamic" in self.rope_type:
417
+ self._update_freq(position_ids, device=x.device)
418
+
419
+ inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
420
+ position_ids_expanded = position_ids[:, None, :].float()
421
+
422
+ device_type = x.device.type
423
+ device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"
424
+ with torch.autocast(device_type=device_type, enabled=False):
425
+ freqs = (inv_freq_expanded @ position_ids_expanded).transpose(1, 2)
426
+ emb = torch.cat((freqs, freqs), dim=-1)
427
+ cos, sin = emb.cos(), emb.sin()
428
+
429
+ cos, sin = cos * self.attention_scaling, sin * self.attention_scaling
430
+ return cos.to(x.dtype), sin.to(x.dtype)
431
+
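+ # Shape sketch for the forward above (assuming hidden_size=4096 and num_attention_heads=32,
+ # so head_dim=128): for position_ids of shape (1, 16) it returns cos and sin, each of
+ # shape (1, 16, 128), already scaled by `attention_scaling`.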
432
+
433
+ class ExaoneSelfAttention(nn.Module):
434
+ def __init__(self, config: ExaoneConfig, layer_idx: Optional[int] = None):
435
+ super().__init__()
436
+ self.config = config
437
+ self.layer_idx = layer_idx
438
+ self.embed_dim = config.hidden_size
439
+ self.num_heads = config.num_attention_heads
440
+ self.head_dim = self.embed_dim // self.num_heads
441
+ self.num_key_value_heads = config.num_key_value_heads
442
+ self.num_key_value_groups = self.num_heads // self.num_key_value_heads
443
+ self.attention_dropout_rate = config.attention_dropout
444
+
445
+ if self.head_dim * self.num_heads != self.embed_dim:
446
+ raise ValueError(
447
+ f"embed_dim must be divisible by num_heads (got `embed_dim`: {self.embed_dim} and `num_heads`: {self.num_heads})."
448
+ )
449
+
450
+ self.rotary = ExaoneRotaryEmbedding(config)
451
+
452
+ self.k_proj = nn.Linear(self.embed_dim, self.num_key_value_heads * self.head_dim, bias=False)
453
+ self.v_proj = nn.Linear(self.embed_dim, self.num_key_value_heads * self.head_dim, bias=False)
454
+ self.q_proj = nn.Linear(self.embed_dim, self.num_heads * self.head_dim, bias=False)
455
+ self.out_proj = nn.Linear(self.embed_dim, self.embed_dim, bias=False)
456
+
457
+ def forward(
458
+ self,
459
+ hidden_states: torch.Tensor,
460
+ attention_mask: Optional[torch.Tensor] = None,
461
+ position_ids: Optional[torch.LongTensor] = None,
462
+ past_key_value: Optional[Cache] = None,
463
+ output_attentions: Optional[bool] = False,
464
+ use_cache: Optional[bool] = False,
465
+ cache_position: Optional[torch.LongTensor] = None,
466
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
467
+ **kwargs,
468
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
469
+
470
+ bsz, q_len, _ = hidden_states.size()
471
+ query_states = self.q_proj(hidden_states)
472
+ key_states = self.k_proj(hidden_states)
473
+ value_states = self.v_proj(hidden_states)
474
+
475
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
476
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
477
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
478
+
479
+ if position_embeddings is None:
480
+ cos, sin = self.rotary(value_states, position_ids=position_ids)
481
+ else:
482
+ cos, sin = position_embeddings
483
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
484
+
485
+ if past_key_value is not None:
486
+ # sin and cos are specific to RoPE models; cache_position needed for the static cache
487
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
488
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
489
+
490
+ key_states = repeat_kv(key_states, self.num_key_value_groups)
491
+ value_states = repeat_kv(value_states, self.num_key_value_groups)
492
+
493
+ attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
494
+
495
+ if attention_mask is not None:
496
+ causal_mask = attention_mask[:, :, :, :key_states.shape[-2]]
497
+ attn_weights = attn_weights + causal_mask
498
+
499
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
500
+ attn_weights = nn.functional.dropout(attn_weights, p=self.attention_dropout_rate, training=self.training)
501
+ attn_output = torch.matmul(attn_weights, value_states)
502
+
503
+ if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
504
+ raise ValueError(
505
+ f"Attention outputs should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
506
+ f" {attn_output.size()}"
507
+ )
508
+
509
+ attn_output = attn_output.transpose(1, 2).contiguous()
510
+ attn_output = attn_output.reshape(bsz, q_len, self.embed_dim).contiguous()
511
+
512
+ attn_output = self.out_proj(attn_output)
513
+
514
+ if not output_attentions:
515
+ attn_weights = None
516
+
517
+ return attn_output, attn_weights, past_key_value
518
+
519
+
520
+ class ExaoneFlashAttention(ExaoneSelfAttention):
521
+ def __init__(self, *args, **kwargs):
522
+ super().__init__(*args, **kwargs)
523
+
524
+ def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int):
525
+ return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()
526
+
527
+ def forward(
528
+ self,
529
+ hidden_states: torch.Tensor,
530
+ attention_mask: Optional[torch.Tensor] = None,
531
+ position_ids: Optional[torch.LongTensor] = None,
532
+ past_key_value: Optional[Cache] = None,
533
+ output_attentions: Optional[bool] = False,
534
+ use_cache: Optional[bool] = False,
535
+ cache_position: Optional[torch.LongTensor] = None,
536
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
537
+ **kwargs,
538
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
539
+ if isinstance(past_key_value, StaticCache):
540
+ raise ValueError(
541
+ "`static` cache implementation is not compatible with `attn_implementation==flash_attention_2` "
542
+ "make sure to use `sdpa` in the mean time, and open an issue at https://github.com/huggingface/transformers"
543
+ )
544
+
545
+ output_attentions = False
546
+
547
+ bsz, q_len, h_size = hidden_states.size()
548
+
549
+ query_states = self.q_proj(hidden_states)
550
+ key_states = self.k_proj(hidden_states)
551
+ value_states = self.v_proj(hidden_states)
552
+
553
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
554
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
555
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
556
+
557
+ if position_embeddings is None:
558
+ cos, sin = self.rotary(value_states, position_ids=position_ids)
559
+ else:
560
+ cos, sin = position_embeddings
561
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
562
+
563
+ if past_key_value is not None:
564
+ # sin and cos are specific to RoPE models; cache_position needed for the static cache
565
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
566
+ # Only update cache as shape of [bsz, n_head, q_len, head_dim]
567
+ # TODO: need to be fixed when transformers' KV cache layout is changed
568
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
569
+
570
+ query_states = query_states.transpose(1, 2)
571
+ key_states = key_states.transpose(1, 2)
572
+ value_states = value_states.transpose(1, 2)
573
+
574
+ # In PEFT, usually we cast the layer norms in float32 for training stability reasons
575
+ # therefore the input hidden states get silently cast to float32. Hence, we need to
576
+ # cast them back to the correct dtype just to be sure everything works as expected.
577
+ input_dtype = query_states.dtype
578
+ if input_dtype == torch.float32:
579
+ if torch.is_autocast_enabled():
580
+ target_dtype = torch.get_autocast_gpu_dtype()
581
+ # Handle the case where the model is quantized
582
+ elif hasattr(self.config, "_pre_quantization_dtype"):
583
+ target_dtype = self.config._pre_quantization_dtype
584
+ else:
585
+ target_dtype = self.q_proj.weight.dtype
586
+
587
+ logger.warning_once(
588
+ f"The input hidden states seems to be silently casted in float32, this might be related to"
589
+ f" the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in"
590
+ f" {target_dtype}."
591
+ )
592
+
593
+ query_states = query_states.to(target_dtype)
594
+ key_states = key_states.to(target_dtype)
595
+ value_states = value_states.to(target_dtype)
596
+
597
+ dropout_rate = self.attention_dropout_rate if self.training else 0.0
598
+
599
+ attn_output = self._flash_attention_forward(
600
+ query_states, key_states, value_states, attention_mask, q_len, dropout=dropout_rate, is_causal=True
601
+ )
602
+
603
+ attn_output = attn_output.reshape(bsz, q_len, self.embed_dim).contiguous()
604
+ attn_output = self.out_proj(attn_output)
605
+
606
+ if not output_attentions:
607
+ attn_weights = None
608
+
609
+ return attn_output, attn_weights, past_key_value
610
+
611
+ @staticmethod
612
+ def _flash_attention_forward(
613
+ query_states: torch.Tensor,
614
+ key_states: torch.Tensor,
615
+ value_states: torch.Tensor,
616
+ attention_mask: torch.Tensor,
617
+ query_length: int,
618
+ is_causal: bool,
619
+ dropout: float = 0.0,
620
+ softmax_scale: Optional[float] = None,
621
+ sliding_window: Optional[int] = None,
622
+ use_top_left_mask: bool = False,
623
+ softcap: Optional[float] = None,
624
+ deterministic: bool = os.environ.get("FLASH_ATTENTION_DETERMINISTIC", "0") == "1",
625
+ ):
626
+ """
627
+ Calls the forward method of Flash Attention - if the input hidden states contain at least one padding token,
628
+ the input is first unpadded, attention scores are computed, and the output is padded back to the original shape.
629
+
630
+ Args:
631
+ query_states (`torch.Tensor`):
632
+ Input query states to be passed to Flash Attention API
633
+ key_states (`torch.Tensor`):
634
+ Input key states to be passed to Flash Attention API
635
+ value_states (`torch.Tensor`):
636
+ Input value states to be passed to Flash Attention API
637
+ attention_mask (`torch.Tensor`):
638
+ The padding mask - corresponds to a tensor of size `(batch_size, seq_len)` where 0 stands for the
639
+ position of padding tokens and 1 for the position of non-padding tokens.
640
+ dropout (`float`):
641
+ Attention dropout
642
+ softmax_scale (`float`, *optional*):
643
+ The scaling of QK^T before applying softmax. Default to 1 / sqrt(head_dim)
644
+ use_top_left_mask (`bool`, defaults to `False`):
645
+ flash_attn<2.1 generates top-left aligned causal mask, while what is needed here is bottom-right alignement, that was made default for flash_attn>=2.1. This attribute is used to handle this difference.
646
+ softcap (`float`, *optional*):
647
+ Softcap for the attention logits, used e.g. in gemma2.
648
+ deterministic (`bool`, *optional*):
649
+ Determines if the deterministic option introduced in flash_attn>=2.4.1 is enabled.
650
+ """
651
+ if not use_top_left_mask:
652
+ causal = is_causal
653
+ else:
654
+ # TODO: Remove the `query_length != 1` check once Flash Attention for RoCm is bumped to 2.1. For details, please see the comment in transformers.models.llama.modeling_llama.LlamaFlashAttention2.__init__.
655
+ causal = is_causal and query_length != 1
656
+
657
+ # Assuming 4D tensors, key_states.shape[1] is the key/value sequence length (source length).
658
+ use_sliding_windows = (
659
+ _flash_supports_window_size and sliding_window is not None and key_states.shape[1] > sliding_window
660
+ )
661
+ flash_kwargs = {"window_size": (sliding_window, sliding_window)} if use_sliding_windows else {}
662
+
663
+ if softcap is not None:
664
+ flash_kwargs["softcap"] = softcap
665
+
666
+ # Contains at least one padding token in the sequence
667
+ if attention_mask is not None:
668
+ batch_size = query_states.shape[0]
669
+ query_states, key_states, value_states, indices_q, cu_seq_lens, max_seq_lens = ExaoneFlashAttention._upad_input(
670
+ query_states, key_states, value_states, attention_mask, query_length
671
+ )
672
+ cu_seqlens_q, cu_seqlens_k = cu_seq_lens
673
+ max_seqlen_in_batch_q, max_seqlen_in_batch_k = max_seq_lens
674
+
675
+ attn_output_unpad = flash_attn_varlen_func(
676
+ query_states,
677
+ key_states,
678
+ value_states,
679
+ cu_seqlens_q=cu_seqlens_q,
680
+ cu_seqlens_k=cu_seqlens_k,
681
+ max_seqlen_q=max_seqlen_in_batch_q,
682
+ max_seqlen_k=max_seqlen_in_batch_k,
683
+ dropout_p=dropout,
684
+ softmax_scale=softmax_scale,
685
+ causal=causal,
686
+ **flash_kwargs,
687
+ )
688
+ attn_output = pad_input(attn_output_unpad, indices_q, batch_size, query_length)
689
+ else:
690
+ attn_output = flash_attn_func(
691
+ query_states, key_states, value_states, dropout, softmax_scale=softmax_scale, causal=causal, **flash_kwargs
692
+ )
693
+
694
+ return attn_output
695
+
696
+ @staticmethod
697
+ def _upad_input(
698
+ query_layer: torch.Tensor,
699
+ key_layer: torch.Tensor,
700
+ value_layer: torch.Tensor,
701
+ attention_mask: torch.Tensor,
702
+ query_length: int,
703
+ ):
704
+ """
705
+ Unpads query, key, and value tensors, using a single dimension for all tokens even though they belong to different batches.
706
+
707
+ This function is used instead of `flash_attn.bert_padding.unpad_input` in order to avoid the recomputation of the same intermediary
708
+ tensors for query, key, value tensors.
709
+
710
+ Arguments:
711
+ query_layer (`torch.Tensor`):
712
+ Query state with padding. Shape: (batch_size, query_length, num_heads, head_dim).
713
+ key_layer (`torch.Tensor`):
714
+ Key state with padding. Shape: (batch_size, kv_seq_len, num_key_value_heads, head_dim).
715
+ value_layer (`torch.Tensor`):
716
+ Value state with padding. Shape: (batch_size, kv_seq_len, num_key_value_heads, head_dim).
717
+ attention_mask (`torch.Tensor`):
718
+ Boolean or int tensor of shape (batch_size, sequence_length), 1 means valid and 0 means not valid.
719
+ query_length (`int`):
720
+ Target length.
721
+
722
+ Return:
723
+ query_layer (`torch.Tensor`):
724
+ Query state without padding. Shape: (total_target_length, num_heads, head_dim).
725
+ key_layer (`torch.Tensor`):
726
+ Key state without padding. Shape: (total_source_length, num_key_value_heads, head_dim).
727
+ value_layer (`torch.Tensor`):
728
+ Value state with padding. Shape: (total_source_length, num_key_value_heads, head_dim).
729
+ indices_q (`torch.Tensor`):
730
+ The indices of non-masked tokens from the flattened input target sequence.
731
+ (cu_seqlens_q, cu_seqlens_k) (`Tuple[int]`):
732
+ The cumulative sequence lengths for the target (query) and source (key, value), used to index into ragged (unpadded) tensors. `cu_seqlens` shape is (batch_size + 1,).
733
+ (max_seqlen_in_batch_q, max_seqlen_in_batch_k) (`Tuple[int]`):
734
+ Maximum sequence length in batch (`max_seqlen_in_batch_q` for the target sequence i.e. query, `max_seqlen_in_batch_k` for the source sequence i.e. key/value).
735
+ """
736
+ indices_k, cu_seqlens_k, max_seqlen_in_batch_k = ExaoneFlashAttention._get_unpad_data(attention_mask)
737
+ batch_size, kv_seq_len, num_key_value_heads, head_dim = key_layer.shape
738
+
739
+ key_layer = index_first_axis(key_layer.reshape(batch_size * kv_seq_len, num_key_value_heads, head_dim), indices_k)
740
+ value_layer = index_first_axis(
741
+ value_layer.reshape(batch_size * kv_seq_len, num_key_value_heads, head_dim), indices_k
742
+ )
743
+ if query_length == kv_seq_len:
744
+ query_layer = index_first_axis(query_layer.reshape(batch_size * kv_seq_len, -1, head_dim), indices_k)
745
+ cu_seqlens_q = cu_seqlens_k
746
+ max_seqlen_in_batch_q = max_seqlen_in_batch_k
747
+ indices_q = indices_k
748
+ elif query_length == 1:
749
+ max_seqlen_in_batch_q = 1
750
+ cu_seqlens_q = torch.arange(
751
+ batch_size + 1, dtype=torch.int32, device=query_layer.device
752
+ ) # There is a memcpy here, that is very bad.
753
+ indices_q = cu_seqlens_q[:-1]
754
+ query_layer = query_layer.squeeze(1)
755
+ else:
756
+ # The -q_len: slice assumes left padding.
757
+ attention_mask = attention_mask[:, -query_length:]
758
+ query_layer, indices_q, cu_seqlens_q, max_seqlen_in_batch_q = unpad_input(query_layer, attention_mask)
759
+
760
+ return (
761
+ query_layer,
762
+ key_layer,
763
+ value_layer,
764
+ indices_q,
765
+ (cu_seqlens_q, cu_seqlens_k),
766
+ (max_seqlen_in_batch_q, max_seqlen_in_batch_k),
767
+ )
768
+
769
+ @staticmethod
770
+ def _get_unpad_data(attention_mask: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, int]:
771
+ """
772
+ Retrieves indexing data required to repad unpadded (ragged) tensors.
773
+
774
+ Arguments:
775
+ attention_mask (`torch.Tensor`):
776
+ Boolean or int tensor of shape (batch_size, sequence_length), 1 means valid and 0 means not valid.
777
+
778
+ Return:
779
+ indices (`torch.Tensor`):
780
+ The indices of non-masked tokens from the flattened input sequence.
781
+ cu_seqlens (`torch.Tensor`):
782
+ The cumulative sequence lengths, used to index into ragged (unpadded) tensors. `cu_seqlens` shape is (batch_size + 1,).
783
+ max_seqlen_in_batch (`int`):
784
+ Maximum sequence length in batch.
785
+ """
786
+ seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32)
787
+ indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
788
+ max_seqlen_in_batch = seqlens_in_batch.max().item()
789
+ cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
790
+ return (
791
+ indices,
792
+ cu_seqlens,
793
+ max_seqlen_in_batch,
794
+ )
795
+
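+ # Worked example (illustrative): for attention_mask = [[1, 1, 0], [1, 1, 1]] the helper
+ # returns indices = [0, 1, 3, 4, 5] (flattened non-pad positions), cu_seqlens = [0, 2, 5],
+ # and max_seqlen_in_batch = 3, which is the layout flash_attn_varlen_func expects.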
796
+
797
+ class ExaoneSdpaAttention(ExaoneSelfAttention):
798
+ def __init__(self, *args, **kwargs):
799
+ super().__init__(*args, **kwargs)
800
+
801
+ def forward(
802
+ self,
803
+ hidden_states: torch.Tensor,
804
+ attention_mask: Optional[torch.Tensor] = None,
805
+ position_ids: Optional[torch.LongTensor] = None,
806
+ past_key_value: Optional[Cache] = None,
807
+ output_attentions: Optional[bool] = False,
808
+ use_cache: Optional[bool] = False,
809
+ cache_position: Optional[torch.LongTensor] = None,
810
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
811
+ **kwargs,
812
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
813
+
814
+ if output_attentions:
815
+ logger.warning_once(
816
+ "ExaoneModel is using ExaoneSdpaAttention, but `torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to the manual attention implementation, "
817
+ 'but specifying the manual implementation will be required from Transformers version v5.0.0 onwards. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
818
+ )
819
+ return super().forward(
820
+ hidden_states=hidden_states,
821
+ attention_mask=attention_mask,
822
+ position_ids=position_ids,
823
+ past_key_value=past_key_value,
824
+ output_attentions=output_attentions,
825
+ use_cache=use_cache,
826
+ cache_position=cache_position,
827
+ position_embeddings=position_embeddings,
828
+ **kwargs,
829
+ )
830
+
831
+ bsz, q_len, _ = hidden_states.size()
832
+
833
+ query_states = self.q_proj(hidden_states)
834
+ key_states = self.k_proj(hidden_states)
835
+ value_states = self.v_proj(hidden_states)
836
+
837
+ query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
838
+ key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
839
+ value_states = value_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
840
+
841
+ if position_embeddings is None:
842
+ cos, sin = self.rotary(value_states, position_ids=position_ids)
843
+ else:
844
+ cos, sin = position_embeddings
845
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
846
+
847
+ if past_key_value is not None:
848
+ # sin and cos are specific to RoPE models; cache_position needed for the static cache
849
+ cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
850
+ key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
851
+
852
+ key_states = repeat_kv(key_states, self.num_key_value_groups)
853
+ value_states = repeat_kv(value_states, self.num_key_value_groups)
854
+
855
+ causal_mask = attention_mask
856
+ if attention_mask is not None:
857
+ causal_mask = causal_mask[:, :, :, :key_states.shape[-2]]
858
+
859
+ # SDPA with memory-efficient backend is currently (torch==2.1.2) bugged with non-contiguous inputs with custom attn_mask,
860
+ # Reference: https://github.com/pytorch/pytorch/issues/112577.
861
+ if query_states.device.type == "cuda" and causal_mask is not None:
862
+ query_states = query_states.contiguous()
863
+ key_states = key_states.contiguous()
864
+ value_states = value_states.contiguous()
865
+
866
+ # We dispatch to SDPA's Flash Attention or Efficient kernels via this `is_causal` if statement instead of an inline conditional assignment
867
+ # in SDPA to support both torch.compile's dynamic shapes and full graph options. An inline conditional prevents dynamic shapes from compiling.
868
+ is_causal = True if causal_mask is None and q_len > 1 else False
869
+
870
+ attn_output = torch.nn.functional.scaled_dot_product_attention(
871
+ query_states,
872
+ key_states,
873
+ value_states,
874
+ attn_mask=causal_mask,
875
+ dropout_p=self.attention_dropout_rate if self.training else 0.0,
876
+ is_causal=is_causal,
877
+ )
878
+
879
+ attn_output = attn_output.transpose(1, 2).contiguous()
880
+ attn_output = attn_output.reshape(bsz, q_len, self.embed_dim).contiguous()
881
+
882
+ attn_output = self.out_proj(attn_output)
883
+
884
+ return attn_output, None, past_key_value
885
+
886
+
887
+ class ExaoneAttention(nn.Module):
888
+ def __init__(self, config, layer_id=0):
889
+ super().__init__()
890
+ self.layer_id = layer_id
891
+ if 'flash' in config._attn_implementation:
892
+ self.attention = ExaoneFlashAttention(config, self.layer_id)
893
+ elif 'sdpa' in config._attn_implementation:
894
+ self.attention = ExaoneSdpaAttention(config, self.layer_id)
895
+ else:
896
+ self.attention = ExaoneSelfAttention(config, self.layer_id)
897
+
898
+ def forward(
899
+ self,
900
+ hidden_states: torch.Tensor,
901
+ attention_mask: Optional[torch.Tensor] = None,
902
+ position_ids: Optional[torch.LongTensor] = None,
903
+ past_key_value: Optional[Cache] = None,
904
+ output_attentions: Optional[bool] = False,
905
+ use_cache: Optional[bool] = False,
906
+ cache_position: Optional[torch.LongTensor] = None,
907
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
908
+ **kwargs,
909
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
910
+
911
+ return self.attention(
912
+ hidden_states=hidden_states,
913
+ attention_mask=attention_mask,
914
+ position_ids=position_ids,
915
+ past_key_value=past_key_value,
916
+ output_attentions=output_attentions,
917
+ use_cache=use_cache,
918
+ cache_position=cache_position,
919
+ position_embeddings=position_embeddings,
920
+ **kwargs,
921
+ )
922
+
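+ # The backend is picked from config._attn_implementation, which callers usually set when
+ # loading the model; a hypothetical loading sketch (the model id is a placeholder):
+ #   AutoModelForCausalLM.from_pretrained("<exaone-checkpoint>", trust_remote_code=True,
+ #                                        attn_implementation="sdpa")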
923
+
924
+ class ExaoneGatedMLP(nn.Module):
925
+ def __init__(self, intermediate_size, config):
926
+ super().__init__()
927
+ self.config = config
928
+ embed_dim = config.hidden_size
929
+ self.c_fc_0 = nn.Linear(embed_dim, intermediate_size, bias=False)
930
+ self.c_fc_1 = nn.Linear(embed_dim, intermediate_size, bias=False)
931
+ self.c_proj = nn.Linear(intermediate_size, embed_dim, bias=False)
932
+ self.act = ACT2FN[config.activation_function]
933
+
934
+ def forward(self, hidden_states):
935
+ output_proj = self.c_proj(self.act(self.c_fc_0(hidden_states)) * self.c_fc_1(hidden_states))
936
+ return output_proj
937
+
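+ # The MLP above is a gated (SwiGLU-style) feed-forward: out = c_proj(act(c_fc_0(x)) * c_fc_1(x)),
+ # where `act` comes from config.activation_function (commonly "silu" for this family; the
+ # exact activation is configuration-dependent).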
938
+
939
+ class ExaoneBlock(nn.Module):
940
+ def __init__(self, config, layer_id):
941
+ super().__init__()
942
+ self.config = config
943
+ hidden_size = config.hidden_size
944
+ inner_dim = config.intermediate_size if config.intermediate_size is not None else 4 * hidden_size
945
+ self.ln_1 = ExaoneRMSNorm(hidden_size=hidden_size, eps=config.layer_norm_epsilon)
946
+ self.attn = ExaoneAttention(config, layer_id)
947
+ self.ln_2 = ExaoneRMSNorm(hidden_size=hidden_size, eps=config.layer_norm_epsilon)
948
+ self.mlp = ExaoneGatedMLP(inner_dim, config)
949
+
950
+ def forward(
951
+ self,
952
+ hidden_states: torch.Tensor,
953
+ attention_mask: Optional[torch.Tensor] = None,
954
+ position_ids: Optional[torch.LongTensor] = None,
955
+ past_key_value: Optional[Cache] = None,
956
+ output_attentions: Optional[bool] = False,
957
+ use_cache: Optional[bool] = False,
958
+ cache_position: Optional[torch.LongTensor] = None,
959
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None,
960
+ **kwargs,
961
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
962
+
963
+ residual = hidden_states
964
+ hidden_states = self.ln_1(hidden_states)
965
+
966
+ hidden_states, self_attn_weights, present_key_value = self.attn(
967
+ hidden_states=hidden_states,
968
+ attention_mask=attention_mask,
969
+ position_ids=position_ids,
970
+ past_key_value=past_key_value,
971
+ output_attentions=output_attentions,
972
+ use_cache=use_cache,
973
+ cache_position=cache_position,
974
+ position_embeddings=position_embeddings,
975
+ **kwargs,
976
+ )
977
+ # residual connection
978
+ hidden_states = residual + hidden_states
979
+
980
+ residual = hidden_states
981
+ hidden_states = self.ln_2(hidden_states)
982
+ hidden_states = self.mlp(hidden_states)
983
+
984
+ hidden_states = residual + hidden_states
985
+
986
+ outputs = (hidden_states,)
987
+
988
+ if output_attentions:
989
+ outputs += (self_attn_weights,)
990
+
991
+ if use_cache:
992
+ outputs += (present_key_value,)
993
+
994
+ return outputs
995
+
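+ # Data flow of the block above (pre-norm residual layout):
+ #   x = x + attn(ln_1(x));  x = x + mlp(ln_2(x))
+ # with optional attention weights and KV-cache entries appended to the outputs tuple.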
996
+
997
+ class ExaonePreTrainedModel(PreTrainedModel):
998
+ """
999
+ An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
1000
+ models.
1001
+ """
1002
+
1003
+ config_class = ExaoneConfig
1004
+ base_model_prefix = "transformer"
1005
+ supports_gradient_checkpointing = True
1006
+ _no_split_modules = ["ExaoneBlock"]
1007
+ _skip_keys_device_placement = "past_key_values"
1008
+ _supports_flash_attn_2 = True
1009
+ _supports_sdpa = True
1010
+ _supports_cache_class = True
1011
+
1012
+ def __init__(self, *inputs, **kwargs):
1013
+ super().__init__(*inputs, **kwargs)
1014
+
1015
+ def _init_weights(self, module):
1016
+ """Initialize the weights."""
1017
+ if isinstance(module, (nn.Linear,)):
1018
+ # Slightly different from the TF version which uses truncated_normal for initialization
1019
+ # cf https://github.com/pytorch/pytorch/pull/5617
1020
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
1021
+ if module.bias is not None:
1022
+ module.bias.data.zero_()
1023
+ elif isinstance(module, nn.Embedding):
1024
+ module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
1025
+ if module.padding_idx is not None:
1026
+ module.weight.data[module.padding_idx].zero_()
1027
+ elif isinstance(module, ExaoneRMSNorm):
1028
+ module.weight.data.fill_(1.0)
1029
+
1030
+
1031
+ EXAONE_START_DOCSTRING = r"""
1032
+
1033
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
1034
+ library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
1035
+ etc.)
1036
+
1037
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
1038
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
1039
+ and behavior.
1040
+
1041
+ Parameters:
1042
+ config (:class:`~transformers.ExaoneConfig`): Model configuration class with all the parameters of the model.
1043
+ Initializing with a config file does not load the weights associated with the model, only the
1044
+ configuration. Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model weights.
1045
+ """
1046
+
1047
+ EXAONE_INPUTS_DOCSTRING = r"""
1048
+ Args:
1049
+ input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, input_ids_length)`):
1050
+ :obj:`input_ids_length` = ``sequence_length`` if :obj:`past_key_values` is ``None`` else
1051
+ ``past_key_values.get_seq_length()`` (``sequence_length`` of input past key value states). Indices of input
1052
+ sequence tokens in the vocabulary.
1053
+
1054
+ If :obj:`past_key_values` is used, only ``input_ids`` that do not have their past calculated should be
1055
+ passed as ``input_ids``.
1056
+
1057
+ `What are input IDs? <../glossary.html#input-ids>`__
1058
+ attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
1059
+ Mask to avoid performing attention on padding token indices. Mask values selected in ``[0, 1]``:
1060
+
1061
+ - 1 for tokens that are **not masked**,
1062
+ - 0 for tokens that are **masked**.
1063
+
1064
+ `What are attention masks? <../glossary.html#attention-mask>`__
1065
+ position_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`):
1066
+ Indices of positions of each input sequence tokens in the position embeddings. Selected in the range ``[0,
1067
+ config.max_position_embeddings - 1]``.
1068
+
1069
+ `What are position IDs? <../glossary.html#position-ids>`_
1070
+ past_key_values (:obj:`Cache`, `optional`):
1071
+ Contains precomputed hidden-states (key and values in the attention blocks) as computed by the model (see
1072
+ :obj:`past_key_values` output below). Can be used to speed up sequential decoding. This typically consists
1073
+ in the `past_key_values` returned by the model at a previous stage of decoding, when `use_cache=True` or
1074
+ `config.use_cache=True`.
1075
+ inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`):
1076
+ Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation.
1077
+ This is useful if you want more control over how to convert :obj:`input_ids` indices into associated
1078
+ vectors than the model's internal embedding lookup matrix.
1079
+
1080
+ If :obj:`past_key_values` is used, optionally only the last :obj:`inputs_embeds` have to be input (see
1081
+ :obj:`past_key_values`).
1082
+ use_cache (:obj:`bool`, `optional`):
1083
+ If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up
1084
+ decoding (see :obj:`past_key_values`).
1085
+ output_attentions (:obj:`bool`, `optional`):
1086
+ Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned
1087
+ tensors for more detail.
1088
+ output_hidden_states (:obj:`bool`, `optional`):
1089
+ Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for
1090
+ more detail.
1091
+ return_dict (:obj:`bool`, `optional`):
1092
+ Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple.
1093
+ cache_position (:obj:`torch.LongTensor` of shape :obj:`(sequence_length)`, `optional`):
1094
+ Indices depicting the position of the input sequence tokens in the sequence. Contrarily to `position_ids`,
1095
+ this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
1096
+ the complete sequence length.
1097
+ """
1098
+
1099
+
1100
+ @add_start_docstrings(
1101
+ "The bare EXAONE Model transformer outputting raw hidden-states without any specific head on top.",
1102
+ EXAONE_START_DOCSTRING,
1103
+ )
1104
+ class ExaoneModel(ExaonePreTrainedModel):
1105
+ def __init__(self, config):
1106
+ super().__init__(config)
1107
+ self.config = config
1108
+ self.embed_dim = config.hidden_size
1109
+ self.wte = nn.Embedding(config.vocab_size, self.embed_dim, self.config.pad_token_id)
1110
+ self.drop = nn.Dropout(float(config.embed_dropout))
1111
+ self.h = nn.ModuleList([ExaoneBlock(config, layer_id=i) for i in range(config.num_layers)])
1112
+ self.ln_f = ExaoneRMSNorm(hidden_size=self.embed_dim, eps=config.layer_norm_epsilon)
1113
+ self.rotary = ExaoneRotaryEmbedding(config)
1114
+ self.gradient_checkpointing = False
1115
+ # Initialize weights and apply final processing
1116
+ self.post_init()
1117
+
1118
+ def get_input_embeddings(self):
1119
+ return self.wte
1120
+
1121
+ def set_input_embeddings(self, new_embeddings):
1122
+ self.wte = new_embeddings
1123
+
1124
+ @add_start_docstrings_to_model_forward(EXAONE_INPUTS_DOCSTRING)
1125
+ @add_code_sample_docstrings(
1126
+ checkpoint=_CHECKPOINT_FOR_DOC,
1127
+ output_type=BaseModelOutputWithPastAndCrossAttentions,
1128
+ config_class=_CONFIG_FOR_DOC,
1129
+ )
1130
+ def forward(
1131
+ self,
1132
+ input_ids: Optional[torch.Tensor] = None,
1133
+ attention_mask: Optional[torch.Tensor] = None,
1134
+ position_ids: Optional[torch.Tensor] = None,
1135
+ past_key_values: Optional[Cache] = None,
1136
+ inputs_embeds: Optional[torch.Tensor] = None,
1137
+ use_cache: Optional[bool] = None,
1138
+ output_attentions: Optional[bool] = None,
1139
+ output_hidden_states: Optional[bool] = None,
1140
+ return_dict: Optional[bool] = None,
1141
+ cache_position: Optional[torch.LongTensor] = None,
1142
+ ) -> Union[Tuple[torch.Tensor], BaseModelOutputWithPast]:
1143
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
1144
+ output_hidden_states = (
1145
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
1146
+ )
1147
+ use_cache = use_cache if use_cache is not None else self.config.use_cache
1148
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1149
+
1150
+ if self.gradient_checkpointing and self.training:
1151
+ if use_cache:
1152
+ logger.warning_once(
1153
+ "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
1154
+ )
1155
+ use_cache = False
1156
+
1157
+ if input_ids is not None and inputs_embeds is not None:
1158
+ raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
1159
+ elif input_ids is not None:
1160
+ batch_size, seq_length = input_ids.shape[:2]
1161
+ elif inputs_embeds is not None:
1162
+ batch_size, seq_length = inputs_embeds.shape[:2]
1163
+ else:
1164
+ raise ValueError("You have to specify either input_ids or inputs_embeds")
1165
+
1166
+ return_legacy_cache = False
1167
+ if (
1168
+ use_cache and not isinstance(past_key_values, Cache) and not self.training
1169
+ ): # kept for BC (non `Cache` `past_key_values` inputs)
1170
+ return_legacy_cache = True
1171
+ past_key_values = DynamicCache.from_legacy_cache(past_key_values)
1172
+ logger.warning_once(
1173
+ "We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. "
1174
+ "Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)"
1175
+ )
1176
+
1177
+ if inputs_embeds is None:
1178
+ inputs_embeds = self.wte(input_ids)
1179
+
1180
+ if cache_position is None:
1181
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
1182
+ cache_position = torch.arange(
1183
+ past_seen_tokens, past_seen_tokens + inputs_embeds.shape[1], device=inputs_embeds.device
1184
+ )
1185
+ if position_ids is None:
1186
+ position_ids = cache_position.unsqueeze(0)
1187
+
1188
+ causal_mask = self._update_causal_mask(
1189
+ attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
1190
+ )
1191
+
1192
+ hidden_states = inputs_embeds
1193
+ hidden_states = self.drop(hidden_states)
1194
+
1195
+ position_embeddings = self.rotary(hidden_states, position_ids)
1196
+
1197
+ all_hidden_states = () if output_hidden_states else None
1198
+ all_self_attns = () if output_attentions else None
1199
+ next_decoder_cache = None
1200
+
1201
+ for block in self.h:
1202
+ if output_hidden_states:
1203
+ all_hidden_states = all_hidden_states + (hidden_states,)
1204
+
1205
+ if self.gradient_checkpointing and self.training:
1206
+ outputs = self._gradient_checkpointing_func(
1207
+ block.__call__,
1208
+ hidden_states,
1209
+ causal_mask,
1210
+ position_ids,
1211
+ past_key_values,
1212
+ output_attentions,
1213
+ use_cache,
1214
+ cache_position,
1215
+ position_embeddings,
1216
+ )
1217
+ else:
1218
+ outputs = block(
1219
+ hidden_states,
1220
+ attention_mask=causal_mask,
1221
+ position_ids=position_ids,
1222
+ past_key_value=past_key_values,
1223
+ output_attentions=output_attentions,
1224
+ use_cache=use_cache,
1225
+ cache_position=cache_position,
1226
+ position_embeddings=position_embeddings,
1227
+ )
1228
+
1229
+ hidden_states = outputs[0]
1230
+ if use_cache:
1231
+ next_decoder_cache = outputs[2 if output_attentions else 1]
1232
+
1233
+ if output_attentions:
1234
+ all_self_attns += (outputs[1],)
1235
+
1236
+ hidden_states = self.ln_f(hidden_states)
1237
+ # Add last hidden state
1238
+ if output_hidden_states:
1239
+ all_hidden_states += (hidden_states,)
1240
+
1241
+ next_cache = None
1242
+ if use_cache:
1243
+ next_cache = next_decoder_cache.to_legacy_cache() if return_legacy_cache else next_decoder_cache
1244
+ if not return_dict:
1245
+ return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
1246
+
1247
+ return BaseModelOutputWithPast(
1248
+ last_hidden_state=hidden_states,
1249
+ past_key_values=next_cache,
1250
+ hidden_states=all_hidden_states,
1251
+ attentions=all_self_attns,
1252
+ )
1253
+
1254
+ # copied from llama
1255
+ def _update_causal_mask(
1256
+ self,
1257
+ attention_mask: torch.Tensor,
1258
+ input_tensor: torch.Tensor,
1259
+ cache_position: torch.Tensor,
1260
+ past_key_values: Cache,
1261
+ output_attentions: bool,
1262
+ ):
1263
+ # TODO: As of torch==2.2.0, the `attention_mask` passed to the model in `generate` is 2D and of dynamic length even when the static
1264
+ # KV cache is used. This is an issue for torch.compile which then recaptures cudagraphs at each decode steps due to the dynamic shapes.
1265
+ # (`recording cudagraph tree for symint key 13`, etc.), which is VERY slow. A workaround is `@torch.compiler.disable`, but this prevents using
1266
+ # `fullgraph=True`. See more context in https://github.com/huggingface/transformers/pull/29114
1267
+
1268
+ if self.config._attn_implementation == "flash_attention_2":
1269
+ if attention_mask is not None and 0.0 in attention_mask:
1270
+ return attention_mask
1271
+ return None
1272
+
1273
+ # For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
1274
+ # order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
1275
+ # to infer the attention mask.
1276
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
1277
+ using_static_cache = isinstance(past_key_values, StaticCache)
1278
+
1279
+ # When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
1280
+ if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
1281
+ if AttentionMaskConverter._ignore_causal_mask_sdpa(
1282
+ attention_mask,
1283
+ inputs_embeds=input_tensor,
1284
+ past_key_values_length=past_seen_tokens,
1285
+ is_training=self.training,
1286
+ ):
1287
+ return None
1288
+
1289
+ dtype, device = input_tensor.dtype, input_tensor.device
1290
+ min_dtype = torch.finfo(dtype).min
1291
+ sequence_length = input_tensor.shape[1]
1292
+ if using_static_cache:
1293
+ target_length = past_key_values.get_max_length()
1294
+ else:
1295
+ target_length = (
1296
+ attention_mask.shape[-1]
1297
+ if isinstance(attention_mask, torch.Tensor)
1298
+ else past_seen_tokens + sequence_length + 1
1299
+ )
1300
+
1301
+ # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
1302
+ causal_mask = _prepare_4d_causal_attention_mask_with_cache_position(
1303
+ attention_mask,
1304
+ sequence_length=sequence_length,
1305
+ target_length=target_length,
1306
+ dtype=dtype,
1307
+ device=device,
1308
+ min_dtype=min_dtype,
1309
+ cache_position=cache_position,
1310
+ batch_size=input_tensor.shape[0],
1311
+ )
1312
+
1313
+ if (
1314
+ self.config._attn_implementation == "sdpa"
1315
+ and attention_mask is not None
1316
+ and attention_mask.device.type == "cuda"
1317
+ and not output_attentions
1318
+ ):
1319
+ # Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
1320
+ # using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
1321
+ # Details: https://github.com/pytorch/pytorch/issues/110213
1322
+ causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
1323
+
1324
+ return causal_mask
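As a usage note (not part of the upload), the bare `ExaoneModel` above only produces raw hidden states; a minimal sketch of reading them out, assuming the repository's `AutoModel` mapping with `trust_remote_code=True`:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
model = AutoModel.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True)

inputs = tokenizer("The bare model returns raw hidden states.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True, output_attentions=True)

print(out.last_hidden_state.shape)   # (batch, seq_len, hidden_size), after the final RMSNorm
print(len(out.hidden_states))        # num_layers + 1: the embeddings plus every block output
print(out.attentions[0].shape)       # (batch, num_heads, seq_len, seq_len)
```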
1325
+
1326
+
1327
+ @add_start_docstrings(
1328
+ """
1329
+ The EXAONE Model transformer with a language modeling head on top (linear layer with weights tied to the input
1330
+ embeddings).
1331
+ """,
1332
+ EXAONE_START_DOCSTRING,
1333
+ )
1334
+ class ExaoneForCausalLM(ExaonePreTrainedModel):
1335
+
1336
+ def __init__(self, config):
1337
+ super().__init__(config)
1338
+ self.transformer = ExaoneModel(config)
1339
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
1340
+ self.config = config
1341
+ # Initialize weights and apply final processing
1342
+ self.post_init()
1343
+
1344
+ def get_output_embeddings(self):
1345
+ return self.lm_head
1346
+
1347
+ def set_output_embeddings(self, new_embeddings):
1348
+ self.lm_head = new_embeddings
1349
+
1350
+ @add_start_docstrings_to_model_forward(EXAONE_INPUTS_DOCSTRING)
1351
+ @add_code_sample_docstrings(
1352
+ checkpoint=_CHECKPOINT_FOR_DOC,
1353
+ output_type=CausalLMOutputWithPast,
1354
+ config_class=_CONFIG_FOR_DOC,
1355
+ )
1356
+ def forward(
1357
+ self,
1358
+ input_ids: Optional[torch.Tensor] = None,
1359
+ attention_mask: Optional[torch.Tensor] = None,
1360
+ position_ids: Optional[torch.Tensor] = None,
1361
+ past_key_values: Optional[Cache] = None,
1362
+ inputs_embeds: Optional[torch.Tensor] = None,
1363
+ labels: Optional[torch.Tensor] = None,
1364
+ use_cache: Optional[bool] = None,
1365
+ output_attentions: Optional[bool] = None,
1366
+ output_hidden_states: Optional[bool] = None,
1367
+ return_dict: Optional[bool] = None,
1368
+ cache_position: Optional[torch.LongTensor] = None,
1369
+ ) -> Union[Tuple[torch.Tensor], CausalLMOutputWithPast]:
1370
+ r"""
1371
+ Args:
1372
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
+ Labels for language modeling. Note that the labels **are shifted** inside the model, i.e. you can set
+ `labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100`
+ are ignored (masked); the loss is only computed for labels in `[0, ..., config.vocab_size]`.
1376
+
1377
+ Example:
1378
+
1379
+ ```python
1380
+ >>> from transformers import AutoModelForCausalLM, AutoTokenizer
1381
+
1382
+ >>> model = AutoModelForCausalLM.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct",
1383
+ trust_remote_code=True)
1384
+ >>> tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
1385
+
1386
+ >>> prompt = "Explain how wonderful you are"
1387
+ >>> messages = [
1388
+ {"role": "system", "content": "You are a helpful assistant."},
1389
+ {"role": "user", "content": prompt}
1390
+ ]
1391
+ >>> input_ids = tokenizer.apply_chat_template(
1392
+ messages,
1393
+ tokenize=True,
1394
+ add_generation_prompt=True,
1395
+ return_tensors="pt"
1396
+ )
1397
+
1398
+ >>> output = model.generate(input_ids, max_new_tokens=128)
1399
+ >>> tokenizer.decode(output[0], skip_special_tokens=True)
1400
+ "[|system|]You are a helpful assistant.\n[|user|]Explain how wonderful you are\n[|assistant|]Thank you for your kind words! I'm here to assist you with information, answer questions, and help you in any way I can. My goal is to provide accurate, helpful, and timely responses. Whether you need help with a specific task, want to learn something new, or just need someone to talk to, I'm here for you. How can I assist you today?"
1401
+ ```
1402
+ """
1403
+
1404
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
1405
+ output_hidden_states = output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
1406
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1407
+ transformer_outputs = self.transformer(
1408
+ input_ids,
1409
+ attention_mask=attention_mask,
1410
+ past_key_values=past_key_values,
1411
+ position_ids=position_ids,
1412
+ inputs_embeds=inputs_embeds,
1413
+ use_cache=use_cache,
1414
+ output_attentions=output_attentions,
1415
+ output_hidden_states=output_hidden_states,
1416
+ return_dict=return_dict,
1417
+ cache_position=cache_position,
1418
+ )
1419
+ hidden_states = transformer_outputs[0]
1420
+ lm_logits = self.lm_head(hidden_states)
1421
+ lm_logits = lm_logits.float()
1422
+ loss = None
1423
+ if labels is not None:
1424
+ lm_logits = lm_logits.to(torch.float32)
1425
+
1426
+ # Shift so that tokens < n predict n
1427
+ shift_logits = lm_logits[..., :-1, :].contiguous()
1428
+ shift_labels = labels[..., 1:].contiguous()
1429
+ # Flatten the tokens
1430
+ loss_fct = CrossEntropyLoss()
1431
+ loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
1432
+
1433
+ lm_logits = lm_logits.to(hidden_states.dtype)
1434
+ loss = loss.to(hidden_states.dtype)
1435
+
1436
+ if not return_dict:
1437
+ output = (lm_logits,) + transformer_outputs[1:]
1438
+ return ((loss,) + output) if loss is not None else output
1439
+
1440
+ return CausalLMOutputWithPast(
1441
+ loss=loss,
1442
+ logits=lm_logits,
1443
+ past_key_values=transformer_outputs.past_key_values,
1444
+ hidden_states=transformer_outputs.hidden_states,
1445
+ attentions=transformer_outputs.attentions,
1446
+ )
1447
+
1448
+ def prepare_inputs_for_generation(
1449
+ self,
1450
+ input_ids,
1451
+ past_key_values=None,
1452
+ attention_mask=None,
1453
+ inputs_embeds=None,
1454
+ cache_position=None,
1455
+ position_ids=None,
1456
+ use_cache=True,
1457
+ **kwargs,
1458
+ ):
1459
+ # If we have cache: let's slice `input_ids` through `cache_position`, to keep only the unprocessed tokens
1460
+ # Exception 1: when passing input_embeds, input_ids may be missing entries
1461
+ # Exception 2: some generation methods do special slicing of input_ids, so we don't need to do it here
1462
+ if past_key_values is not None:
1463
+ if inputs_embeds is not None: # Exception 1
1464
+ input_ids = input_ids[:, -cache_position.shape[0] :]
1465
+ elif input_ids.shape[1] != cache_position.shape[0]: # Default case (the "else", a no op, is Exception 2)
1466
+ input_ids = input_ids[:, cache_position]
1467
+
1468
+ if attention_mask is not None and position_ids is None:
1469
+ # create position_ids on the fly for batch generation
1470
+ position_ids = attention_mask.long().cumsum(-1) - 1
1471
+ position_ids.masked_fill_(attention_mask == 0, 1)
1472
+ if past_key_values:
1473
+ position_ids = position_ids[:, -input_ids.shape[1] :]
1474
+
1475
+ # This `clone` call is needed to avoid recapturing cuda graphs with `torch.compile`'s `mode="reduce-overhead"`, as otherwise the input `position_ids` would have a varying stride during decoding.
+ # Here, simply using `.contiguous()` is not sufficient as in the batch size = 1 case, `position_ids` is already contiguous but with varying stride which retriggers a capture.
1476
+ position_ids = position_ids.clone(memory_format=torch.contiguous_format)
1477
+
1478
+ # if `inputs_embeds` are passed, we only want to use them in the 1st generation step
1479
+ if inputs_embeds is not None and cache_position[0] == 0:
1480
+ model_inputs = {"inputs_embeds": inputs_embeds}
1481
+ else:
1482
+ model_inputs = {"input_ids": input_ids}
1483
+
1484
+ if isinstance(past_key_values, StaticCache) and attention_mask.ndim == 2:
1485
+ if inputs_embeds is not None:
1486
+ batch_size, sequence_length = inputs_embeds.shape[:2]
1487
+ device = inputs_embeds.device
1488
+ else:
1489
+ batch_size, sequence_length = input_ids.shape
1490
+ device = input_ids.device
1491
+
1492
+ dtype = self.lm_head.weight.dtype
1493
+ min_dtype = torch.finfo(dtype).min
1494
+
1495
+ attention_mask = _prepare_4d_causal_attention_mask_with_cache_position(
1496
+ attention_mask,
1497
+ sequence_length=sequence_length,
1498
+ target_length=past_key_values.get_max_length(),
1499
+ dtype=dtype,
1500
+ device=device,
1501
+ min_dtype=min_dtype,
1502
+ cache_position=cache_position,
1503
+ batch_size=batch_size,
1504
+ )
1505
+
1506
+ model_inputs.update(
1507
+ {
1508
+ "position_ids": position_ids,
1509
+ "cache_position": cache_position,
1510
+ "past_key_values": past_key_values,
1511
+ "use_cache": use_cache,
1512
+ "attention_mask": attention_mask,
1513
+ }
1514
+ )
1515
+ return model_inputs
1516
+
1517
+ @staticmethod
1518
+ def _reorder_cache(past_key_values, beam_idx):
1519
+ reordered_past = ()
1520
+ for layer_past in past_key_values:
1521
+ reordered_past += (
1522
+ tuple(past_state.index_select(0, beam_idx.to(past_state.device)) for past_state in layer_past),
1523
+ )
1524
+ return reordered_past
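Because the labels are shifted inside `forward`, a causal-LM loss can be computed directly from the inputs. A minimal sketch (not part of the uploaded file):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct", trust_remote_code=True
)

batch = tokenizer(["EXAONE computes a shifted LM loss."], return_tensors="pt", padding=True)
labels = batch.input_ids.clone()
labels[batch.attention_mask == 0] = -100   # padded positions are ignored by the loss

outputs = model(**batch, labels=labels)
print(outputs.loss)                        # cross-entropy over positions 1..seq_len-1
```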
1525
+
1526
+
1527
+ @add_start_docstrings(
1528
+ """
1529
+ The EXAONE Model transformer with a sequence classification head on top (linear layer).
1530
+
1531
+ :class:`~transformers.ExaoneForSequenceClassification` uses the last token in order to do the classification, as
1532
+ other causal models (e.g. GPT-1) do.
1533
+
1534
+ Since it does classification on the last token, it requires to know the position of the last token. If a
1535
+ :obj:`pad_token_id` is defined in the configuration, it finds the last token that is not a padding token in each
1536
+ row. If no :obj:`pad_token_id` is defined, it simply takes the last value in each row of the batch. Since it cannot
1537
+ guess the padding tokens when :obj:`inputs_embeds` are passed instead of :obj:`input_ids`, it does the same (take
1538
+ the last value in each row of the batch).
1539
+ """,
1540
+ EXAONE_START_DOCSTRING,
1541
+ )
1542
+ class ExaoneForSequenceClassification(ExaonePreTrainedModel):
1543
+ _keys_to_ignore_on_load_missing = ["lm_head.weight"]
1544
+ def __init__(self, config):
1545
+ super().__init__(config)
1546
+ self.num_labels = config.num_labels
1547
+ self.transformer = ExaoneModel(config)
1548
+ self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
1549
+
1550
+ # Initialize weights and apply final processing
1551
+ self.post_init()
1552
+
1553
+ @add_start_docstrings_to_model_forward(EXAONE_INPUTS_DOCSTRING)
1554
+ @add_code_sample_docstrings(
1555
+ checkpoint=_CHECKPOINT_FOR_DOC,
1556
+ output_type=SequenceClassifierOutputWithPast,
1557
+ config_class=_CONFIG_FOR_DOC,
1558
+ )
1559
+ def forward(
1560
+ self,
1561
+ input_ids: Optional[torch.Tensor] = None,
1562
+ attention_mask: Optional[torch.Tensor] = None,
1563
+ position_ids: Optional[torch.Tensor] = None,
1564
+ past_key_values: Optional[Cache] = None,
1565
+ inputs_embeds: Optional[torch.Tensor] = None,
1566
+ labels: Optional[torch.Tensor] = None,
1567
+ use_cache: Optional[bool] = None,
1568
+ output_attentions: Optional[bool] = None,
1569
+ output_hidden_states: Optional[bool] = None,
1570
+ return_dict: Optional[bool] = None,
1571
+ ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutputWithPast]:
1572
+ r"""
1573
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
1574
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
1575
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
1576
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
1577
+ """
1578
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1579
+
1580
+ transformer_outputs = self.transformer(
1581
+ input_ids,
1582
+ attention_mask=attention_mask,
1583
+ position_ids=position_ids,
1584
+ past_key_values=past_key_values,
1585
+ inputs_embeds=inputs_embeds,
1586
+ use_cache=use_cache,
1587
+ output_attentions=output_attentions,
1588
+ output_hidden_states=output_hidden_states,
1589
+ return_dict=return_dict,
1590
+ )
1591
+ hidden_states = transformer_outputs[0]
1592
+ logits = self.score(hidden_states)
1593
+
1594
+ if input_ids is not None:
1595
+ batch_size, sequence_length = input_ids.shape[:2]
1596
+ else:
1597
+ batch_size, sequence_length = inputs_embeds.shape[:2]
1598
+
1599
+ if self.config.pad_token_id is None and batch_size != 1:
1600
+ raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
1601
+ if self.config.pad_token_id is None:
1602
+ sequence_lengths = -1
1603
+ else:
1604
+ if input_ids is not None:
1605
+ # if no pad token found, use modulo instead of reverse indexing for ONNX compatibility
1606
+ sequence_lengths = torch.ne(input_ids, self.config.pad_token_id).sum(-1) - 1
1607
+ sequence_lengths = sequence_lengths % input_ids.shape[-1]
1608
+ sequence_lengths = sequence_lengths.to(logits.device)
1609
+ else:
1610
+ sequence_lengths = -1
1611
+ logger.warning(
1612
+ f"{self.__class__.__name__} will not detect padding tokens in `inputs_embeds`. Results may be "
1613
+ "unexpected if using padding tokens in conjunction with `inputs_embeds.`"
1614
+ )
1615
+
1616
+ pooled_logits = logits[torch.arange(batch_size, device=logits.device), sequence_lengths]
1617
+
1618
+ loss = None
1619
+ if labels is not None:
1620
+ labels = labels.to(logits.device)
1621
+ if self.config.problem_type is None:
1622
+ if self.num_labels == 1:
1623
+ self.config.problem_type = "regression"
1624
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
1625
+ self.config.problem_type = "single_label_classification"
1626
+ else:
1627
+ self.config.problem_type = "multi_label_classification"
1628
+
1629
+ if self.config.problem_type == "regression":
1630
+ loss_fct = MSELoss()
1631
+ if self.num_labels == 1:
1632
+ loss = loss_fct(pooled_logits.squeeze(), labels.squeeze())
1633
+ else:
1634
+ loss = loss_fct(pooled_logits, labels)
1635
+ elif self.config.problem_type == "single_label_classification":
1636
+ loss_fct = CrossEntropyLoss()
1637
+ loss = loss_fct(pooled_logits.view(-1, self.num_labels), labels.view(-1))
1638
+ elif self.config.problem_type == "multi_label_classification":
1639
+ loss_fct = BCEWithLogitsLoss()
1640
+ loss = loss_fct(pooled_logits, labels)
1641
+ if not return_dict:
1642
+ output = (pooled_logits,) + transformer_outputs[1:]
1643
+ return ((loss,) + output) if loss is not None else output
1644
+
1645
+ return SequenceClassifierOutputWithPast(
1646
+ loss=loss,
1647
+ logits=pooled_logits,
1648
+ past_key_values=transformer_outputs.past_key_values,
1649
+ hidden_states=transformer_outputs.hidden_states,
1650
+ attentions=transformer_outputs.attentions,
1651
+ )
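The last-non-padding-token pooling described in the class docstring can be reproduced in isolation; a minimal sketch with made-up tensors (the values and label count below are illustrative, not taken from the model):

```python
import torch

pad_token_id = 0                                  # stand-in value for illustration
input_ids = torch.tensor([[11, 12, 13, 0, 0],
                          [21, 22, 23, 24, 25]])
logits = torch.randn(2, 5, 3)                     # (batch, seq_len, num_labels)

# Index of the last non-padding token per row, as in the forward pass above.
sequence_lengths = torch.ne(input_ids, pad_token_id).sum(-1) - 1
pooled_logits = logits[torch.arange(input_ids.shape[0]), sequence_lengths]
print(pooled_logits.shape)                        # (batch, num_labels)
```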
1652
+
1653
+
1654
+ @add_start_docstrings(
1655
+ """
1656
+ The EXAONE Model transformer with a span classification head on top for extractive question-answering tasks like
1657
+ SQuAD (a linear layers on top of the hidden-states output to compute `span start logits` and `span end logits`).
1658
+ """,
1659
+ EXAONE_START_DOCSTRING,
1660
+ )
1661
+ class ExaoneForQuestionAnswering(ExaonePreTrainedModel):
1662
+ _keys_to_ignore_on_load_missing = ["lm_head.weight"]
1663
+
1664
+ def __init__(self, config):
1665
+ super().__init__(config)
1666
+ self.num_labels = config.num_labels
1667
+ self.transformer = ExaoneModel(config)
1668
+ self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)
1669
+
1670
+ # Model parallel
1671
+ self.model_parallel = False
1672
+ self.device_map = None
1673
+
1674
+ # Initialize weights and apply final processing
1675
+ self.post_init()
1676
+
1677
+ def forward(
1678
+ self,
1679
+ input_ids: Optional[torch.LongTensor] = None,
1680
+ attention_mask: Optional[torch.FloatTensor] = None,
1681
+ position_ids: Optional[torch.LongTensor] = None,
1682
+ past_key_values: Optional[Cache] = None,
1683
+ inputs_embeds: Optional[torch.FloatTensor] = None,
1684
+ start_positions: Optional[torch.LongTensor] = None,
1685
+ end_positions: Optional[torch.LongTensor] = None,
1686
+ output_attentions: Optional[bool] = None,
1687
+ output_hidden_states: Optional[bool] = None,
1688
+ return_dict: Optional[bool] = None,
1689
+ ) -> Union[Tuple[torch.Tensor], QuestionAnsweringModelOutput]:
1690
+ r"""
1691
+ start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
1692
+ Labels for position (index) of the start of the labelled span for computing the token classification loss.
1693
+ Positions are clamped to the length of the sequence (:obj:`sequence_length`). Position outside of the
1694
+ sequence are not taken into account for computing the loss.
1695
+ end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
1696
+ Labels for position (index) of the end of the labelled span for computing the token classification loss.
1697
+ Positions are clamped to the length of the sequence (:obj:`sequence_length`). Position outside of the
1698
+ sequence are not taken into account for computing the loss.
1699
+ """
1700
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
1701
+
1702
+ outputs = self.transformer(
1703
+ input_ids,
1704
+ attention_mask=attention_mask,
1705
+ position_ids=position_ids,
1706
+ past_key_values=past_key_values,
1707
+ inputs_embeds=inputs_embeds,
1708
+ output_attentions=output_attentions,
1709
+ output_hidden_states=output_hidden_states,
1710
+ return_dict=return_dict,
1711
+ )
1712
+
1713
+ sequence_output = outputs[0]
1714
+
1715
+ logits = self.qa_outputs(sequence_output)
1716
+ start_logits, end_logits = logits.split(1, dim=-1)
1717
+ start_logits = start_logits.squeeze(-1).contiguous()
1718
+ end_logits = end_logits.squeeze(-1).contiguous()
1719
+
1720
+ total_loss = None
1721
+ if start_positions is not None and end_positions is not None:
1722
+ # If we are on multi-GPU, split add a dimension
1723
+ if len(start_positions.size()) > 1:
1724
+ start_positions = start_positions.squeeze(-1).to(start_logits.device)
1725
+ if len(end_positions.size()) > 1:
1726
+ end_positions = end_positions.squeeze(-1).to(end_logits.device)
1727
+ # sometimes the start/end positions are outside our model inputs, we ignore these terms
1728
+ ignored_index = start_logits.size(1)
1729
+ start_positions = start_positions.clamp(0, ignored_index)
1730
+ end_positions = end_positions.clamp(0, ignored_index)
1731
+
1732
+ loss_fct = CrossEntropyLoss(ignore_index=ignored_index)
1733
+ start_loss = loss_fct(start_logits, start_positions)
1734
+ end_loss = loss_fct(end_logits, end_positions)
1735
+ total_loss = (start_loss + end_loss) / 2
1736
+
1737
+ if not return_dict:
1738
+ output = (start_logits, end_logits) + outputs[2:]
1739
+ return ((total_loss,) + output) if total_loss is not None else output
1740
+
1741
+ return QuestionAnsweringModelOutput(
1742
+ loss=total_loss,
1743
+ start_logits=start_logits,
1744
+ end_logits=end_logits,
1745
+ hidden_states=outputs.hidden_states,
1746
+ attentions=outputs.attentions,
1747
+ )
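Turning the span logits of the question-answering head into an answer is a small post-processing step; a minimal sketch with illustrative logits (in practice they come from `ExaoneForQuestionAnswering`):

```python
import torch

start_logits = torch.randn(1, 20)        # illustrative only
end_logits = torch.randn(1, 20)

start_idx = int(start_logits.argmax(-1))
end_idx = int(end_logits.argmax(-1))
if end_idx < start_idx:                  # naive fix-up for an inverted span
    start_idx, end_idx = end_idx, start_idx
print((start_idx, end_idx))
# tokenizer.decode(input_ids[0, start_idx : end_idx + 1]) would then recover the answer text.
```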
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+ "bos_token": {
+ "content": "[BOS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "[|endofturn|]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
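Once the files in this upload are loaded, the special tokens defined above are exposed directly on the tokenizer; a minimal sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct")
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token, tokenizer.unk_token)
# Expected from special_tokens_map.json: [BOS] [|endofturn|] [PAD] [UNK]
```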
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,3221 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "[PAD]",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "[BOS]",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "[EOS]",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "[UNK]",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "4": {
37
+ "content": " ",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": false
43
+ },
44
+ "5": {
45
+ "content": " ",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": false
51
+ },
52
+ "6": {
53
+ "content": " ",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": false
59
+ },
60
+ "7": {
61
+ "content": " ",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": false
67
+ },
68
+ "8": {
69
+ "content": " ",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": false
75
+ },
76
+ "9": {
77
+ "content": " ",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": false
83
+ },
84
+ "10": {
85
+ "content": " ",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": false
91
+ },
92
+ "11": {
93
+ "content": " ",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": false
99
+ },
100
+ "12": {
101
+ "content": " ",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": false
107
+ },
108
+ "13": {
109
+ "content": " ",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": false
115
+ },
116
+ "14": {
117
+ "content": " ",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": false
123
+ },
124
+ "15": {
125
+ "content": " ",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": false
131
+ },
132
+ "16": {
133
+ "content": " ",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": false
139
+ },
140
+ "17": {
141
+ "content": " ",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": false
147
+ },
148
+ "18": {
149
+ "content": " ",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": false
155
+ },
156
+ "19": {
157
+ "content": " ",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": false
163
+ },
164
+ "20": {
165
+ "content": " ",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": false
171
+ },
172
+ "21": {
173
+ "content": " ",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": false
179
+ },
180
+ "22": {
181
+ "content": " ",
182
+ "lstrip": false,
183
+ "normalized": false,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": false
187
+ },
188
+ "23": {
189
+ "content": " ",
190
+ "lstrip": false,
191
+ "normalized": false,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": false
195
+ },
196
+ "24": {
197
+ "content": " ",
198
+ "lstrip": false,
199
+ "normalized": false,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": false
203
+ },
204
+ "25": {
205
+ "content": " ",
206
+ "lstrip": false,
207
+ "normalized": false,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": false
211
+ },
212
+ "26": {
213
+ "content": " ",
214
+ "lstrip": false,
215
+ "normalized": false,
216
+ "rstrip": false,
217
+ "single_word": false,
218
+ "special": false
219
+ },
220
+ "27": {
221
+ "content": " ",
222
+ "lstrip": false,
223
+ "normalized": false,
224
+ "rstrip": false,
225
+ "single_word": false,
226
+ "special": false
227
+ },
228
+ "28": {
229
+ "content": " ",
230
+ "lstrip": false,
231
+ "normalized": false,
232
+ "rstrip": false,
233
+ "single_word": false,
234
+ "special": false
235
+ },
236
+ "29": {
237
+ "content": " ",
238
+ "lstrip": false,
239
+ "normalized": false,
240
+ "rstrip": false,
241
+ "single_word": false,
242
+ "special": false
243
+ },
244
+ "30": {
245
+ "content": " ",
246
+ "lstrip": false,
247
+ "normalized": false,
248
+ "rstrip": false,
249
+ "single_word": false,
250
+ "special": false
251
+ },
252
+ "31": {
253
+ "content": " ",
254
+ "lstrip": false,
255
+ "normalized": false,
256
+ "rstrip": false,
257
+ "single_word": false,
258
+ "special": false
259
+ },
260
+ "32": {
261
+ "content": " ",
262
+ "lstrip": false,
263
+ "normalized": false,
264
+ "rstrip": false,
265
+ "single_word": false,
266
+ "special": false
267
+ },
268
+ "33": {
269
+ "content": " ",
270
+ "lstrip": false,
271
+ "normalized": false,
272
+ "rstrip": false,
273
+ "single_word": false,
274
+ "special": false
275
+ },
276
+ "34": {
277
+ "content": "\t\t\t\t\t\t\t\t\t",
278
+ "lstrip": false,
279
+ "normalized": false,
280
+ "rstrip": false,
281
+ "single_word": false,
282
+ "special": false
283
+ },
284
+ "35": {
285
+ "content": "\t\t\t\t\t\t\t\t",
286
+ "lstrip": false,
287
+ "normalized": false,
288
+ "rstrip": false,
289
+ "single_word": false,
290
+ "special": false
291
+ },
292
+ "36": {
293
+ "content": "\t\t\t\t\t\t\t",
294
+ "lstrip": false,
295
+ "normalized": false,
296
+ "rstrip": false,
297
+ "single_word": false,
298
+ "special": false
299
+ },
300
+ "37": {
301
+ "content": "\t\t\t\t\t\t",
302
+ "lstrip": false,
303
+ "normalized": false,
304
+ "rstrip": false,
305
+ "single_word": false,
306
+ "special": false
307
+ },
308
+ "38": {
309
+ "content": "\t\t\t\t\t",
310
+ "lstrip": false,
311
+ "normalized": false,
312
+ "rstrip": false,
313
+ "single_word": false,
314
+ "special": false
315
+ },
316
+ "39": {
317
+ "content": "\t\t\t\t",
318
+ "lstrip": false,
319
+ "normalized": false,
320
+ "rstrip": false,
321
+ "single_word": false,
322
+ "special": false
323
+ },
324
+ "40": {
325
+ "content": "\t\t\t",
326
+ "lstrip": false,
327
+ "normalized": false,
328
+ "rstrip": false,
329
+ "single_word": false,
330
+ "special": false
331
+ },
332
+ "41": {
333
+ "content": "\t\t",
334
+ "lstrip": false,
335
+ "normalized": false,
336
+ "rstrip": false,
337
+ "single_word": false,
338
+ "special": false
339
+ },
340
+ "42": {
341
+ "content": "<|endoftext|>",
342
+ "lstrip": false,
343
+ "normalized": false,
344
+ "rstrip": false,
345
+ "single_word": false,
346
+ "special": true
347
+ },
348
+ "43": {
349
+ "content": "<|c|>",
350
+ "lstrip": false,
351
+ "normalized": false,
352
+ "rstrip": false,
353
+ "single_word": false,
354
+ "special": true
355
+ },
356
+ "44": {
357
+ "content": "<|c++|>",
358
+ "lstrip": false,
359
+ "normalized": false,
360
+ "rstrip": false,
361
+ "single_word": false,
362
+ "special": true
363
+ },
364
+ "45": {
365
+ "content": "<|python|>",
366
+ "lstrip": false,
367
+ "normalized": false,
368
+ "rstrip": false,
369
+ "single_word": false,
370
+ "special": true
371
+ },
372
+ "46": {
373
+ "content": "<|javascript|>",
374
+ "lstrip": false,
375
+ "normalized": false,
376
+ "rstrip": false,
377
+ "single_word": false,
378
+ "special": true
379
+ },
380
+ "47": {
381
+ "content": "<|markdown|>",
382
+ "lstrip": false,
383
+ "normalized": false,
384
+ "rstrip": false,
385
+ "single_word": false,
386
+ "special": true
387
+ },
388
+ "48": {
389
+ "content": "<|html|>",
390
+ "lstrip": false,
391
+ "normalized": false,
392
+ "rstrip": false,
393
+ "single_word": false,
394
+ "special": true
395
+ },
396
+ "49": {
397
+ "content": "<|css|>",
398
+ "lstrip": false,
399
+ "normalized": false,
400
+ "rstrip": false,
401
+ "single_word": false,
402
+ "special": true
403
+ },
404
+ "50": {
405
+ "content": "<|vue|>",
406
+ "lstrip": false,
407
+ "normalized": false,
408
+ "rstrip": false,
409
+ "single_word": false,
410
+ "special": true
411
+ },
412
+ "51": {
413
+ "content": "<|java|>",
414
+ "lstrip": false,
415
+ "normalized": false,
416
+ "rstrip": false,
417
+ "single_word": false,
418
+ "special": true
419
+ },
420
+ "52": {
421
+ "content": "PI:URL",
422
+ "lstrip": false,
423
+ "normalized": false,
424
+ "rstrip": false,
425
+ "single_word": false,
426
+ "special": true
427
+ },
428
+ "53": {
429
+ "content": "PI:EMAIL",
430
+ "lstrip": false,
431
+ "normalized": false,
432
+ "rstrip": false,
433
+ "single_word": false,
434
+ "special": true
435
+ },
436
+ "54": {
437
+ "content": "PI:ACCOUNT_NUM",
438
+ "lstrip": false,
439
+ "normalized": false,
440
+ "rstrip": false,
441
+ "single_word": false,
442
+ "special": true
443
+ },
444
+ "55": {
445
+ "content": "PI:PHONE_NUM",
446
+ "lstrip": false,
447
+ "normalized": false,
448
+ "rstrip": false,
449
+ "single_word": false,
450
+ "special": true
451
+ },
452
+ "56": {
453
+ "content": "PI:BUSINESS_NUM",
454
+ "lstrip": false,
455
+ "normalized": false,
456
+ "rstrip": false,
457
+ "single_word": false,
458
+ "special": true
459
+ },
460
+ "57": {
461
+ "content": "PI:ANNON",
462
+ "lstrip": false,
463
+ "normalized": false,
464
+ "rstrip": false,
465
+ "single_word": false,
466
+ "special": true
467
+ },
468
+ "58": {
469
+ "content": "PI:KEY",
470
+ "lstrip": false,
471
+ "normalized": false,
472
+ "rstrip": false,
473
+ "single_word": false,
474
+ "special": true
475
+ },
476
+ "59": {
477
+ "content": "PI:ID",
478
+ "lstrip": false,
479
+ "normalized": false,
480
+ "rstrip": false,
481
+ "single_word": false,
482
+ "special": true
483
+ },
484
+ "60": {
485
+ "content": "PI:IP_ADDRESS",
486
+ "lstrip": false,
487
+ "normalized": false,
488
+ "rstrip": false,
489
+ "single_word": false,
490
+ "special": true
491
+ },
492
+ "61": {
493
+ "content": "PI:USER",
494
+ "lstrip": false,
495
+ "normalized": false,
496
+ "rstrip": false,
497
+ "single_word": false,
498
+ "special": true
499
+ },
500
+ "62": {
501
+ "content": "[unused0]",
502
+ "lstrip": false,
503
+ "normalized": false,
504
+ "rstrip": false,
505
+ "single_word": false,
506
+ "special": true
507
+ },
508
+ "63": {
509
+ "content": "[unused1]",
510
+ "lstrip": false,
511
+ "normalized": false,
512
+ "rstrip": false,
513
+ "single_word": false,
514
+ "special": true
515
+ },
516
+ "64": {
517
+ "content": "[unused2]",
518
+ "lstrip": false,
519
+ "normalized": false,
520
+ "rstrip": false,
521
+ "single_word": false,
522
+ "special": true
523
+ },
524
+ "65": {
525
+ "content": "[unused3]",
526
+ "lstrip": false,
527
+ "normalized": false,
528
+ "rstrip": false,
529
+ "single_word": false,
530
+ "special": true
531
+ },
532
+ "66": {
533
+ "content": "[unused4]",
534
+ "lstrip": false,
535
+ "normalized": false,
536
+ "rstrip": false,
537
+ "single_word": false,
538
+ "special": true
539
+ },
540
+ "67": {
541
+ "content": "[unused5]",
542
+ "lstrip": false,
543
+ "normalized": false,
544
+ "rstrip": false,
545
+ "single_word": false,
546
+ "special": true
547
+ },
548
+ "68": {
549
+ "content": "[unused6]",
550
+ "lstrip": false,
551
+ "normalized": false,
552
+ "rstrip": false,
553
+ "single_word": false,
554
+ "special": true
555
+ },
556
+ "69": {
557
+ "content": "[unused7]",
558
+ "lstrip": false,
559
+ "normalized": false,
560
+ "rstrip": false,
561
+ "single_word": false,
562
+ "special": true
563
+ },
564
+ "70": {
565
+ "content": "[unused8]",
566
+ "lstrip": false,
567
+ "normalized": false,
568
+ "rstrip": false,
569
+ "single_word": false,
570
+ "special": true
571
+ },
572
+ "71": {
573
+ "content": "[unused9]",
574
+ "lstrip": false,
575
+ "normalized": false,
576
+ "rstrip": false,
577
+ "single_word": false,
578
+ "special": true
579
+ },
580
+ "72": {
581
+ "content": "[unused10]",
582
+ "lstrip": false,
583
+ "normalized": false,
584
+ "rstrip": false,
585
+ "single_word": false,
586
+ "special": true
587
+ },
588
+ "73": {
589
+ "content": "[unused11]",
590
+ "lstrip": false,
591
+ "normalized": false,
592
+ "rstrip": false,
593
+ "single_word": false,
594
+ "special": true
595
+ },
596
+ "74": {
597
+ "content": "[unused12]",
598
+ "lstrip": false,
599
+ "normalized": false,
600
+ "rstrip": false,
601
+ "single_word": false,
602
+ "special": true
603
+ },
604
+ "75": {
605
+ "content": "[unused13]",
606
+ "lstrip": false,
607
+ "normalized": false,
608
+ "rstrip": false,
609
+ "single_word": false,
610
+ "special": true
611
+ },
612
+ "76": {
613
+ "content": "[unused14]",
614
+ "lstrip": false,
615
+ "normalized": false,
616
+ "rstrip": false,
617
+ "single_word": false,
618
+ "special": true
619
+ },
620
+ "77": {
621
+ "content": "[unused15]",
622
+ "lstrip": false,
623
+ "normalized": false,
624
+ "rstrip": false,
625
+ "single_word": false,
626
+ "special": true
627
+ },
628
+ "78": {
629
+ "content": "[unused16]",
630
+ "lstrip": false,
631
+ "normalized": false,
632
+ "rstrip": false,
633
+ "single_word": false,
634
+ "special": true
635
+ },
636
+ "79": {
637
+ "content": "[unused17]",
638
+ "lstrip": false,
639
+ "normalized": false,
640
+ "rstrip": false,
641
+ "single_word": false,
642
+ "special": true
643
+ },
644
+ "80": {
645
+ "content": "[unused18]",
646
+ "lstrip": false,
647
+ "normalized": false,
648
+ "rstrip": false,
649
+ "single_word": false,
650
+ "special": true
651
+ },
652
+ "81": {
653
+ "content": "[unused19]",
654
+ "lstrip": false,
655
+ "normalized": false,
656
+ "rstrip": false,
657
+ "single_word": false,
658
+ "special": true
659
+ },
660
+ "82": {
661
+ "content": "[unused20]",
662
+ "lstrip": false,
663
+ "normalized": false,
664
+ "rstrip": false,
665
+ "single_word": false,
666
+ "special": true
667
+ },
668
+ "83": {
669
+ "content": "[unused21]",
670
+ "lstrip": false,
671
+ "normalized": false,
672
+ "rstrip": false,
673
+ "single_word": false,
674
+ "special": true
675
+ },
676
+ "84": {
677
+ "content": "[unused22]",
678
+ "lstrip": false,
679
+ "normalized": false,
680
+ "rstrip": false,
681
+ "single_word": false,
682
+ "special": true
683
+ },
684
+ "85": {
685
+ "content": "[unused23]",
686
+ "lstrip": false,
687
+ "normalized": false,
688
+ "rstrip": false,
689
+ "single_word": false,
690
+ "special": true
691
+ },
692
+ "86": {
693
+ "content": "[unused24]",
694
+ "lstrip": false,
695
+ "normalized": false,
696
+ "rstrip": false,
697
+ "single_word": false,
698
+ "special": true
699
+ },
700
+ "87": {
701
+ "content": "[unused25]",
702
+ "lstrip": false,
703
+ "normalized": false,
704
+ "rstrip": false,
705
+ "single_word": false,
706
+ "special": true
707
+ },
708
+ "88": {
709
+ "content": "[unused26]",
710
+ "lstrip": false,
711
+ "normalized": false,
712
+ "rstrip": false,
713
+ "single_word": false,
714
+ "special": true
715
+ },
716
+ "89": {
717
+ "content": "[unused27]",
718
+ "lstrip": false,
719
+ "normalized": false,
720
+ "rstrip": false,
721
+ "single_word": false,
722
+ "special": true
723
+ },
724
+ "90": {
725
+ "content": "[unused28]",
726
+ "lstrip": false,
727
+ "normalized": false,
728
+ "rstrip": false,
729
+ "single_word": false,
730
+ "special": true
731
+ },
732
+ "91": {
733
+ "content": "[unused29]",
734
+ "lstrip": false,
735
+ "normalized": false,
736
+ "rstrip": false,
737
+ "single_word": false,
738
+ "special": true
739
+ },
740
+ "92": {
741
+ "content": "[unused30]",
742
+ "lstrip": false,
743
+ "normalized": false,
744
+ "rstrip": false,
745
+ "single_word": false,
746
+ "special": true
747
+ },
748
+ "93": {
749
+ "content": "[unused31]",
750
+ "lstrip": false,
751
+ "normalized": false,
752
+ "rstrip": false,
753
+ "single_word": false,
754
+ "special": true
755
+ },
756
+ "94": {
757
+ "content": "[unused32]",
758
+ "lstrip": false,
759
+ "normalized": false,
760
+ "rstrip": false,
761
+ "single_word": false,
762
+ "special": true
763
+ },
764
+ "95": {
765
+ "content": "[unused33]",
766
+ "lstrip": false,
767
+ "normalized": false,
768
+ "rstrip": false,
769
+ "single_word": false,
770
+ "special": true
771
+ },
772
+ "96": {
773
+ "content": "[unused34]",
774
+ "lstrip": false,
775
+ "normalized": false,
776
+ "rstrip": false,
777
+ "single_word": false,
778
+ "special": true
779
+ },
780
+ "97": {
781
+ "content": "[unused35]",
782
+ "lstrip": false,
783
+ "normalized": false,
784
+ "rstrip": false,
785
+ "single_word": false,
786
+ "special": true
787
+ },
788
+ "98": {
789
+ "content": "[unused36]",
790
+ "lstrip": false,
791
+ "normalized": false,
792
+ "rstrip": false,
793
+ "single_word": false,
794
+ "special": true
795
+ },
796
+ "99": {
797
+ "content": "[unused37]",
798
+ "lstrip": false,
799
+ "normalized": false,
800
+ "rstrip": false,
801
+ "single_word": false,
802
+ "special": true
803
+ },
804
+ "100": {
805
+ "content": "[unused38]",
806
+ "lstrip": false,
807
+ "normalized": false,
808
+ "rstrip": false,
809
+ "single_word": false,
810
+ "special": true
811
+ },
812
+ "101": {
813
+ "content": "[unused39]",
814
+ "lstrip": false,
815
+ "normalized": false,
816
+ "rstrip": false,
817
+ "single_word": false,
818
+ "special": true
819
+ },
820
+ "102": {
821
+ "content": "[unused40]",
822
+ "lstrip": false,
823
+ "normalized": false,
824
+ "rstrip": false,
825
+ "single_word": false,
826
+ "special": true
827
+ },
828
+ "103": {
829
+ "content": "[unused41]",
830
+ "lstrip": false,
831
+ "normalized": false,
832
+ "rstrip": false,
833
+ "single_word": false,
834
+ "special": true
835
+ },
836
+ "104": {
837
+ "content": "[unused42]",
838
+ "lstrip": false,
839
+ "normalized": false,
840
+ "rstrip": false,
841
+ "single_word": false,
842
+ "special": true
843
+ },
844
+ "105": {
845
+ "content": "[unused43]",
846
+ "lstrip": false,
847
+ "normalized": false,
848
+ "rstrip": false,
849
+ "single_word": false,
850
+ "special": true
851
+ },
852
+ "106": {
853
+ "content": "[unused44]",
854
+ "lstrip": false,
855
+ "normalized": false,
856
+ "rstrip": false,
857
+ "single_word": false,
858
+ "special": true
859
+ },
860
+ "107": {
861
+ "content": "[unused45]",
862
+ "lstrip": false,
863
+ "normalized": false,
864
+ "rstrip": false,
865
+ "single_word": false,
866
+ "special": true
867
+ },
868
+ "108": {
869
+ "content": "[unused46]",
870
+ "lstrip": false,
871
+ "normalized": false,
872
+ "rstrip": false,
873
+ "single_word": false,
874
+ "special": true
875
+ },
876
+ "109": {
877
+ "content": "[unused47]",
878
+ "lstrip": false,
879
+ "normalized": false,
880
+ "rstrip": false,
881
+ "single_word": false,
882
+ "special": true
883
+ },
884
+ "110": {
885
+ "content": "[unused48]",
886
+ "lstrip": false,
887
+ "normalized": false,
888
+ "rstrip": false,
889
+ "single_word": false,
890
+ "special": true
891
+ },
892
+ "111": {
893
+ "content": "[unused49]",
894
+ "lstrip": false,
895
+ "normalized": false,
896
+ "rstrip": false,
897
+ "single_word": false,
898
+ "special": true
899
+ },
900
+ "112": {
901
+ "content": "[unused50]",
902
+ "lstrip": false,
903
+ "normalized": false,
904
+ "rstrip": false,
905
+ "single_word": false,
906
+ "special": true
907
+ },
908
+ "113": {
909
+ "content": "[unused51]",
910
+ "lstrip": false,
911
+ "normalized": false,
912
+ "rstrip": false,
913
+ "single_word": false,
914
+ "special": true
915
+ },
916
+ "114": {
917
+ "content": "[unused52]",
918
+ "lstrip": false,
919
+ "normalized": false,
920
+ "rstrip": false,
921
+ "single_word": false,
922
+ "special": true
923
+ },
924
+ "115": {
925
+ "content": "[unused53]",
926
+ "lstrip": false,
927
+ "normalized": false,
928
+ "rstrip": false,
929
+ "single_word": false,
930
+ "special": true
931
+ },
932
+ "116": {
933
+ "content": "[unused54]",
934
+ "lstrip": false,
935
+ "normalized": false,
936
+ "rstrip": false,
937
+ "single_word": false,
938
+ "special": true
939
+ },
940
+ "117": {
941
+ "content": "[unused55]",
942
+ "lstrip": false,
943
+ "normalized": false,
944
+ "rstrip": false,
945
+ "single_word": false,
946
+ "special": true
947
+ },
948
+ "118": {
949
+ "content": "[unused56]",
950
+ "lstrip": false,
951
+ "normalized": false,
952
+ "rstrip": false,
953
+ "single_word": false,
954
+ "special": true
955
+ },
956
+ "119": {
957
+ "content": "[unused57]",
958
+ "lstrip": false,
959
+ "normalized": false,
960
+ "rstrip": false,
961
+ "single_word": false,
962
+ "special": true
963
+ },
964
+ "120": {
965
+ "content": "[unused58]",
966
+ "lstrip": false,
967
+ "normalized": false,
968
+ "rstrip": false,
969
+ "single_word": false,
970
+ "special": true
971
+ },
972
+ "121": {
973
+ "content": "[unused59]",
974
+ "lstrip": false,
975
+ "normalized": false,
976
+ "rstrip": false,
977
+ "single_word": false,
978
+ "special": true
979
+ },
980
+ "122": {
981
+ "content": "[unused60]",
982
+ "lstrip": false,
983
+ "normalized": false,
984
+ "rstrip": false,
985
+ "single_word": false,
986
+ "special": true
987
+ },
988
+ "123": {
989
+ "content": "[unused61]",
990
+ "lstrip": false,
991
+ "normalized": false,
992
+ "rstrip": false,
993
+ "single_word": false,
994
+ "special": true
995
+ },
996
+ "124": {
997
+ "content": "[unused62]",
998
+ "lstrip": false,
999
+ "normalized": false,
1000
+ "rstrip": false,
1001
+ "single_word": false,
1002
+ "special": true
1003
+ },
1004
+ "125": {
1005
+ "content": "[unused63]",
1006
+ "lstrip": false,
1007
+ "normalized": false,
1008
+ "rstrip": false,
1009
+ "single_word": false,
1010
+ "special": true
1011
+ },
1012
+ "126": {
1013
+ "content": "[unused64]",
1014
+ "lstrip": false,
1015
+ "normalized": false,
1016
+ "rstrip": false,
1017
+ "single_word": false,
1018
+ "special": true
1019
+ },
1020
+ "127": {
1021
+ "content": "[unused65]",
1022
+ "lstrip": false,
1023
+ "normalized": false,
1024
+ "rstrip": false,
1025
+ "single_word": false,
1026
+ "special": true
1027
+ },
1028
+ "128": {
1029
+ "content": "[unused66]",
1030
+ "lstrip": false,
1031
+ "normalized": false,
1032
+ "rstrip": false,
1033
+ "single_word": false,
1034
+ "special": true
1035
+ },
1036
+ "129": {
1037
+ "content": "[unused67]",
1038
+ "lstrip": false,
1039
+ "normalized": false,
1040
+ "rstrip": false,
1041
+ "single_word": false,
1042
+ "special": true
1043
+ },
1044
+ "130": {
1045
+ "content": "[unused68]",
1046
+ "lstrip": false,
1047
+ "normalized": false,
1048
+ "rstrip": false,
1049
+ "single_word": false,
1050
+ "special": true
1051
+ },
1052
+ "131": {
1053
+ "content": "[unused69]",
1054
+ "lstrip": false,
1055
+ "normalized": false,
1056
+ "rstrip": false,
1057
+ "single_word": false,
1058
+ "special": true
1059
+ },
1060
+ "132": {
1061
+ "content": "[unused70]",
1062
+ "lstrip": false,
1063
+ "normalized": false,
1064
+ "rstrip": false,
1065
+ "single_word": false,
1066
+ "special": true
1067
+ },
1068
+ "133": {
1069
+ "content": "[unused71]",
1070
+ "lstrip": false,
1071
+ "normalized": false,
1072
+ "rstrip": false,
1073
+ "single_word": false,
1074
+ "special": true
1075
+ },
1076
+ "134": {
1077
+ "content": "[unused72]",
1078
+ "lstrip": false,
1079
+ "normalized": false,
1080
+ "rstrip": false,
1081
+ "single_word": false,
1082
+ "special": true
1083
+ },
1084
+ "135": {
1085
+ "content": "[unused73]",
1086
+ "lstrip": false,
1087
+ "normalized": false,
1088
+ "rstrip": false,
1089
+ "single_word": false,
1090
+ "special": true
1091
+ },
1092
+ "136": {
1093
+ "content": "[unused74]",
1094
+ "lstrip": false,
1095
+ "normalized": false,
1096
+ "rstrip": false,
1097
+ "single_word": false,
1098
+ "special": true
1099
+ },
1100
+ "137": {
1101
+ "content": "[unused75]",
1102
+ "lstrip": false,
1103
+ "normalized": false,
1104
+ "rstrip": false,
1105
+ "single_word": false,
1106
+ "special": true
1107
+ },
1108
+ "138": {
1109
+ "content": "[unused76]",
1110
+ "lstrip": false,
1111
+ "normalized": false,
1112
+ "rstrip": false,
1113
+ "single_word": false,
1114
+ "special": true
1115
+ },
1116
+ "139": {
1117
+ "content": "[unused77]",
1118
+ "lstrip": false,
1119
+ "normalized": false,
1120
+ "rstrip": false,
1121
+ "single_word": false,
1122
+ "special": true
1123
+ },
1124
+ "140": {
1125
+ "content": "[unused78]",
1126
+ "lstrip": false,
1127
+ "normalized": false,
1128
+ "rstrip": false,
1129
+ "single_word": false,
1130
+ "special": true
1131
+ },
1132
+ "141": {
1133
+ "content": "[unused79]",
1134
+ "lstrip": false,
1135
+ "normalized": false,
1136
+ "rstrip": false,
1137
+ "single_word": false,
1138
+ "special": true
1139
+ },
1140
+ "142": {
1141
+ "content": "[unused80]",
1142
+ "lstrip": false,
1143
+ "normalized": false,
1144
+ "rstrip": false,
1145
+ "single_word": false,
1146
+ "special": true
1147
+ },
1148
+ "143": {
1149
+ "content": "[unused81]",
1150
+ "lstrip": false,
1151
+ "normalized": false,
1152
+ "rstrip": false,
1153
+ "single_word": false,
1154
+ "special": true
1155
+ },
1156
+ "144": {
1157
+ "content": "[unused82]",
1158
+ "lstrip": false,
1159
+ "normalized": false,
1160
+ "rstrip": false,
1161
+ "single_word": false,
1162
+ "special": true
1163
+ },
1164
+ "145": {
1165
+ "content": "[unused83]",
1166
+ "lstrip": false,
1167
+ "normalized": false,
1168
+ "rstrip": false,
1169
+ "single_word": false,
1170
+ "special": true
1171
+ },
1172
+ "146": {
1173
+ "content": "[unused84]",
1174
+ "lstrip": false,
1175
+ "normalized": false,
1176
+ "rstrip": false,
1177
+ "single_word": false,
1178
+ "special": true
1179
+ },
1180
+ "147": {
1181
+ "content": "[unused85]",
1182
+ "lstrip": false,
1183
+ "normalized": false,
1184
+ "rstrip": false,
1185
+ "single_word": false,
1186
+ "special": true
1187
+ },
1188
+ "148": {
1189
+ "content": "[unused86]",
1190
+ "lstrip": false,
1191
+ "normalized": false,
1192
+ "rstrip": false,
1193
+ "single_word": false,
1194
+ "special": true
1195
+ },
1196
+ "149": {
1197
+ "content": "[unused87]",
1198
+ "lstrip": false,
1199
+ "normalized": false,
1200
+ "rstrip": false,
1201
+ "single_word": false,
1202
+ "special": true
1203
+ },
1204
+ "150": {
1205
+ "content": "[unused88]",
1206
+ "lstrip": false,
1207
+ "normalized": false,
1208
+ "rstrip": false,
1209
+ "single_word": false,
1210
+ "special": true
1211
+ },
1212
+ "151": {
1213
+ "content": "[unused89]",
1214
+ "lstrip": false,
1215
+ "normalized": false,
1216
+ "rstrip": false,
1217
+ "single_word": false,
1218
+ "special": true
1219
+ },
1220
+ "152": {
1221
+ "content": "[unused90]",
1222
+ "lstrip": false,
1223
+ "normalized": false,
1224
+ "rstrip": false,
1225
+ "single_word": false,
1226
+ "special": true
1227
+ },
1228
+ "153": {
1229
+ "content": "[unused91]",
1230
+ "lstrip": false,
1231
+ "normalized": false,
1232
+ "rstrip": false,
1233
+ "single_word": false,
1234
+ "special": true
1235
+ },
1236
+ "154": {
1237
+ "content": "[unused92]",
1238
+ "lstrip": false,
1239
+ "normalized": false,
1240
+ "rstrip": false,
1241
+ "single_word": false,
1242
+ "special": true
1243
+ },
1244
+ "155": {
1245
+ "content": "[unused93]",
1246
+ "lstrip": false,
1247
+ "normalized": false,
1248
+ "rstrip": false,
1249
+ "single_word": false,
1250
+ "special": true
1251
+ },
1252
+ "156": {
1253
+ "content": "[unused94]",
1254
+ "lstrip": false,
1255
+ "normalized": false,
1256
+ "rstrip": false,
1257
+ "single_word": false,
1258
+ "special": true
1259
+ },
1260
+ "157": {
1261
+ "content": "[unused95]",
1262
+ "lstrip": false,
1263
+ "normalized": false,
1264
+ "rstrip": false,
1265
+ "single_word": false,
1266
+ "special": true
1267
+ },
1268
+ "158": {
1269
+ "content": "[unused96]",
1270
+ "lstrip": false,
1271
+ "normalized": false,
1272
+ "rstrip": false,
1273
+ "single_word": false,
1274
+ "special": true
1275
+ },
1276
+ "159": {
1277
+ "content": "[unused97]",
1278
+ "lstrip": false,
1279
+ "normalized": false,
1280
+ "rstrip": false,
1281
+ "single_word": false,
1282
+ "special": true
1283
+ },
1284
+ "160": {
1285
+ "content": "[unused98]",
1286
+ "lstrip": false,
1287
+ "normalized": false,
1288
+ "rstrip": false,
1289
+ "single_word": false,
1290
+ "special": true
1291
+ },
1292
+ "161": {
1293
+ "content": "[unused99]",
1294
+ "lstrip": false,
1295
+ "normalized": false,
1296
+ "rstrip": false,
1297
+ "single_word": false,
1298
+ "special": true
1299
+ },
1300
+ "162": {
1301
+ "content": "[extra_id_0]",
1302
+ "lstrip": false,
1303
+ "normalized": false,
1304
+ "rstrip": false,
1305
+ "single_word": false,
1306
+ "special": true
1307
+ },
1308
+ "163": {
1309
+ "content": "[extra_id_1]",
1310
+ "lstrip": false,
1311
+ "normalized": false,
1312
+ "rstrip": false,
1313
+ "single_word": false,
1314
+ "special": true
1315
+ },
1316
+ "164": {
1317
+ "content": "[extra_id_2]",
1318
+ "lstrip": false,
1319
+ "normalized": false,
1320
+ "rstrip": false,
1321
+ "single_word": false,
1322
+ "special": true
1323
+ },
1324
+ "165": {
1325
+ "content": "[extra_id_3]",
1326
+ "lstrip": false,
1327
+ "normalized": false,
1328
+ "rstrip": false,
1329
+ "single_word": false,
1330
+ "special": true
1331
+ },
1332
+ "166": {
1333
+ "content": "[extra_id_4]",
1334
+ "lstrip": false,
1335
+ "normalized": false,
1336
+ "rstrip": false,
1337
+ "single_word": false,
1338
+ "special": true
1339
+ },
1340
+ "167": {
1341
+ "content": "[extra_id_5]",
1342
+ "lstrip": false,
1343
+ "normalized": false,
1344
+ "rstrip": false,
1345
+ "single_word": false,
1346
+ "special": true
1347
+ },
1348
+ "168": {
1349
+ "content": "[extra_id_6]",
1350
+ "lstrip": false,
1351
+ "normalized": false,
1352
+ "rstrip": false,
1353
+ "single_word": false,
1354
+ "special": true
1355
+ },
1356
+ "169": {
1357
+ "content": "[extra_id_7]",
1358
+ "lstrip": false,
1359
+ "normalized": false,
1360
+ "rstrip": false,
1361
+ "single_word": false,
1362
+ "special": true
1363
+ },
1364
+ "170": {
1365
+ "content": "[extra_id_8]",
1366
+ "lstrip": false,
1367
+ "normalized": false,
1368
+ "rstrip": false,
1369
+ "single_word": false,
1370
+ "special": true
1371
+ },
1372
+ "171": {
1373
+ "content": "[extra_id_9]",
1374
+ "lstrip": false,
1375
+ "normalized": false,
1376
+ "rstrip": false,
1377
+ "single_word": false,
1378
+ "special": true
1379
+ },
1380
+ "172": {
1381
+ "content": "[extra_id_10]",
1382
+ "lstrip": false,
1383
+ "normalized": false,
1384
+ "rstrip": false,
1385
+ "single_word": false,
1386
+ "special": true
1387
+ },
1388
+ "173": {
1389
+ "content": "[extra_id_11]",
1390
+ "lstrip": false,
1391
+ "normalized": false,
1392
+ "rstrip": false,
1393
+ "single_word": false,
1394
+ "special": true
1395
+ },
1396
+ "174": {
1397
+ "content": "[extra_id_12]",
1398
+ "lstrip": false,
1399
+ "normalized": false,
1400
+ "rstrip": false,
1401
+ "single_word": false,
1402
+ "special": true
1403
+ },
1404
+ "175": {
1405
+ "content": "[extra_id_13]",
1406
+ "lstrip": false,
1407
+ "normalized": false,
1408
+ "rstrip": false,
1409
+ "single_word": false,
1410
+ "special": true
1411
+ },
1412
+ "176": {
1413
+ "content": "[extra_id_14]",
1414
+ "lstrip": false,
1415
+ "normalized": false,
1416
+ "rstrip": false,
1417
+ "single_word": false,
1418
+ "special": true
1419
+ },
1420
+ "177": {
1421
+ "content": "[extra_id_15]",
1422
+ "lstrip": false,
1423
+ "normalized": false,
1424
+ "rstrip": false,
1425
+ "single_word": false,
1426
+ "special": true
1427
+ },
1428
+ "178": {
1429
+ "content": "[extra_id_16]",
1430
+ "lstrip": false,
1431
+ "normalized": false,
1432
+ "rstrip": false,
1433
+ "single_word": false,
1434
+ "special": true
1435
+ },
1436
+ "179": {
1437
+ "content": "[extra_id_17]",
1438
+ "lstrip": false,
1439
+ "normalized": false,
1440
+ "rstrip": false,
1441
+ "single_word": false,
1442
+ "special": true
1443
+ },
1444
+ "180": {
1445
+ "content": "[extra_id_18]",
1446
+ "lstrip": false,
1447
+ "normalized": false,
1448
+ "rstrip": false,
1449
+ "single_word": false,
1450
+ "special": true
1451
+ },
1452
+ "181": {
1453
+ "content": "[extra_id_19]",
1454
+ "lstrip": false,
1455
+ "normalized": false,
1456
+ "rstrip": false,
1457
+ "single_word": false,
1458
+ "special": true
1459
+ },
1460
+ "182": {
1461
+ "content": "[extra_id_20]",
1462
+ "lstrip": false,
1463
+ "normalized": false,
1464
+ "rstrip": false,
1465
+ "single_word": false,
1466
+ "special": true
1467
+ },
1468
+ "183": {
1469
+ "content": "[extra_id_21]",
1470
+ "lstrip": false,
1471
+ "normalized": false,
1472
+ "rstrip": false,
1473
+ "single_word": false,
1474
+ "special": true
1475
+ },
1476
+ "184": {
1477
+ "content": "[extra_id_22]",
1478
+ "lstrip": false,
1479
+ "normalized": false,
1480
+ "rstrip": false,
1481
+ "single_word": false,
1482
+ "special": true
1483
+ },
1484
+ "185": {
1485
+ "content": "[extra_id_23]",
1486
+ "lstrip": false,
1487
+ "normalized": false,
1488
+ "rstrip": false,
1489
+ "single_word": false,
1490
+ "special": true
1491
+ },
1492
+ "186": {
1493
+ "content": "[extra_id_24]",
1494
+ "lstrip": false,
1495
+ "normalized": false,
1496
+ "rstrip": false,
1497
+ "single_word": false,
1498
+ "special": true
1499
+ },
1500
+ "187": {
1501
+ "content": "[extra_id_25]",
1502
+ "lstrip": false,
1503
+ "normalized": false,
1504
+ "rstrip": false,
1505
+ "single_word": false,
1506
+ "special": true
1507
+ },
1508
+ "188": {
1509
+ "content": "[extra_id_26]",
1510
+ "lstrip": false,
1511
+ "normalized": false,
1512
+ "rstrip": false,
1513
+ "single_word": false,
1514
+ "special": true
1515
+ },
1516
+ "189": {
1517
+ "content": "[extra_id_27]",
1518
+ "lstrip": false,
1519
+ "normalized": false,
1520
+ "rstrip": false,
1521
+ "single_word": false,
1522
+ "special": true
1523
+ },
1524
+ "190": {
1525
+ "content": "[extra_id_28]",
1526
+ "lstrip": false,
1527
+ "normalized": false,
1528
+ "rstrip": false,
1529
+ "single_word": false,
1530
+ "special": true
1531
+ },
1532
+ "191": {
1533
+ "content": "[extra_id_29]",
1534
+ "lstrip": false,
1535
+ "normalized": false,
1536
+ "rstrip": false,
1537
+ "single_word": false,
1538
+ "special": true
1539
+ },
1540
+ "192": {
1541
+ "content": "[extra_id_30]",
1542
+ "lstrip": false,
1543
+ "normalized": false,
1544
+ "rstrip": false,
1545
+ "single_word": false,
1546
+ "special": true
1547
+ },
1548
+ "193": {
1549
+ "content": "[extra_id_31]",
1550
+ "lstrip": false,
1551
+ "normalized": false,
1552
+ "rstrip": false,
1553
+ "single_word": false,
1554
+ "special": true
1555
+ },
1556
+ "194": {
1557
+ "content": "[extra_id_32]",
1558
+ "lstrip": false,
1559
+ "normalized": false,
1560
+ "rstrip": false,
1561
+ "single_word": false,
1562
+ "special": true
1563
+ },
1564
+ "195": {
1565
+ "content": "[extra_id_33]",
1566
+ "lstrip": false,
1567
+ "normalized": false,
1568
+ "rstrip": false,
1569
+ "single_word": false,
1570
+ "special": true
1571
+ },
1572
+ "196": {
1573
+ "content": "[extra_id_34]",
1574
+ "lstrip": false,
1575
+ "normalized": false,
1576
+ "rstrip": false,
1577
+ "single_word": false,
1578
+ "special": true
1579
+ },
1580
+ "197": {
1581
+ "content": "[extra_id_35]",
1582
+ "lstrip": false,
1583
+ "normalized": false,
1584
+ "rstrip": false,
1585
+ "single_word": false,
1586
+ "special": true
1587
+ },
1588
+ "198": {
1589
+ "content": "[extra_id_36]",
1590
+ "lstrip": false,
1591
+ "normalized": false,
1592
+ "rstrip": false,
1593
+ "single_word": false,
1594
+ "special": true
1595
+ },
1596
+ "199": {
1597
+ "content": "[extra_id_37]",
1598
+ "lstrip": false,
1599
+ "normalized": false,
1600
+ "rstrip": false,
1601
+ "single_word": false,
1602
+ "special": true
1603
+ },
1604
+ "200": {
1605
+ "content": "[extra_id_38]",
1606
+ "lstrip": false,
1607
+ "normalized": false,
1608
+ "rstrip": false,
1609
+ "single_word": false,
1610
+ "special": true
1611
+ },
1612
+ "201": {
1613
+ "content": "[extra_id_39]",
1614
+ "lstrip": false,
1615
+ "normalized": false,
1616
+ "rstrip": false,
1617
+ "single_word": false,
1618
+ "special": true
1619
+ },
1620
+ "202": {
1621
+ "content": "[extra_id_40]",
1622
+ "lstrip": false,
1623
+ "normalized": false,
1624
+ "rstrip": false,
1625
+ "single_word": false,
1626
+ "special": true
1627
+ },
1628
+ "203": {
1629
+ "content": "[extra_id_41]",
1630
+ "lstrip": false,
1631
+ "normalized": false,
1632
+ "rstrip": false,
1633
+ "single_word": false,
1634
+ "special": true
1635
+ },
1636
+ "204": {
1637
+ "content": "[extra_id_42]",
1638
+ "lstrip": false,
1639
+ "normalized": false,
1640
+ "rstrip": false,
1641
+ "single_word": false,
1642
+ "special": true
1643
+ },
1644
+ "205": {
1645
+ "content": "[extra_id_43]",
1646
+ "lstrip": false,
1647
+ "normalized": false,
1648
+ "rstrip": false,
1649
+ "single_word": false,
1650
+ "special": true
1651
+ },
1652
+ "206": {
1653
+ "content": "[extra_id_44]",
1654
+ "lstrip": false,
1655
+ "normalized": false,
1656
+ "rstrip": false,
1657
+ "single_word": false,
1658
+ "special": true
1659
+ },
1660
+ "207": {
1661
+ "content": "[extra_id_45]",
1662
+ "lstrip": false,
1663
+ "normalized": false,
1664
+ "rstrip": false,
1665
+ "single_word": false,
1666
+ "special": true
1667
+ },
1668
+ "208": {
1669
+ "content": "[extra_id_46]",
1670
+ "lstrip": false,
1671
+ "normalized": false,
1672
+ "rstrip": false,
1673
+ "single_word": false,
1674
+ "special": true
1675
+ },
1676
+ "209": {
1677
+ "content": "[extra_id_47]",
1678
+ "lstrip": false,
1679
+ "normalized": false,
1680
+ "rstrip": false,
1681
+ "single_word": false,
1682
+ "special": true
1683
+ },
1684
+ "210": {
1685
+ "content": "[extra_id_48]",
1686
+ "lstrip": false,
1687
+ "normalized": false,
1688
+ "rstrip": false,
1689
+ "single_word": false,
1690
+ "special": true
1691
+ },
1692
+ "211": {
1693
+ "content": "[extra_id_49]",
1694
+ "lstrip": false,
1695
+ "normalized": false,
1696
+ "rstrip": false,
1697
+ "single_word": false,
1698
+ "special": true
1699
+ },
1700
+ "212": {
1701
+ "content": "[extra_id_50]",
1702
+ "lstrip": false,
1703
+ "normalized": false,
1704
+ "rstrip": false,
1705
+ "single_word": false,
1706
+ "special": true
1707
+ },
1708
+ "213": {
1709
+ "content": "[extra_id_51]",
1710
+ "lstrip": false,
1711
+ "normalized": false,
1712
+ "rstrip": false,
1713
+ "single_word": false,
1714
+ "special": true
1715
+ },
1716
+ "214": {
1717
+ "content": "[extra_id_52]",
1718
+ "lstrip": false,
1719
+ "normalized": false,
1720
+ "rstrip": false,
1721
+ "single_word": false,
1722
+ "special": true
1723
+ },
1724
+ "215": {
1725
+ "content": "[extra_id_53]",
1726
+ "lstrip": false,
1727
+ "normalized": false,
1728
+ "rstrip": false,
1729
+ "single_word": false,
1730
+ "special": true
1731
+ },
1732
+ "216": {
1733
+ "content": "[extra_id_54]",
1734
+ "lstrip": false,
1735
+ "normalized": false,
1736
+ "rstrip": false,
1737
+ "single_word": false,
1738
+ "special": true
1739
+ },
1740
+ "217": {
1741
+ "content": "[extra_id_55]",
1742
+ "lstrip": false,
1743
+ "normalized": false,
1744
+ "rstrip": false,
1745
+ "single_word": false,
1746
+ "special": true
1747
+ },
1748
+ "218": {
1749
+ "content": "[extra_id_56]",
1750
+ "lstrip": false,
1751
+ "normalized": false,
1752
+ "rstrip": false,
1753
+ "single_word": false,
1754
+ "special": true
1755
+ },
1756
+ "219": {
1757
+ "content": "[extra_id_57]",
1758
+ "lstrip": false,
1759
+ "normalized": false,
1760
+ "rstrip": false,
1761
+ "single_word": false,
1762
+ "special": true
1763
+ },
1764
+ "220": {
1765
+ "content": "[extra_id_58]",
1766
+ "lstrip": false,
1767
+ "normalized": false,
1768
+ "rstrip": false,
1769
+ "single_word": false,
1770
+ "special": true
1771
+ },
1772
+ "221": {
1773
+ "content": "[extra_id_59]",
1774
+ "lstrip": false,
1775
+ "normalized": false,
1776
+ "rstrip": false,
1777
+ "single_word": false,
1778
+ "special": true
1779
+ },
1780
+ "222": {
1781
+ "content": "[extra_id_60]",
1782
+ "lstrip": false,
1783
+ "normalized": false,
1784
+ "rstrip": false,
1785
+ "single_word": false,
1786
+ "special": true
1787
+ },
1788
+ "223": {
1789
+ "content": "[extra_id_61]",
1790
+ "lstrip": false,
1791
+ "normalized": false,
1792
+ "rstrip": false,
1793
+ "single_word": false,
1794
+ "special": true
1795
+ },
1796
+ "224": {
1797
+ "content": "[extra_id_62]",
1798
+ "lstrip": false,
1799
+ "normalized": false,
1800
+ "rstrip": false,
1801
+ "single_word": false,
1802
+ "special": true
1803
+ },
1804
+ "225": {
1805
+ "content": "[extra_id_63]",
1806
+ "lstrip": false,
1807
+ "normalized": false,
1808
+ "rstrip": false,
1809
+ "single_word": false,
1810
+ "special": true
1811
+ },
1812
+ "226": {
1813
+ "content": "[extra_id_64]",
1814
+ "lstrip": false,
1815
+ "normalized": false,
1816
+ "rstrip": false,
1817
+ "single_word": false,
1818
+ "special": true
1819
+ },
1820
+ "227": {
1821
+ "content": "[extra_id_65]",
1822
+ "lstrip": false,
1823
+ "normalized": false,
1824
+ "rstrip": false,
1825
+ "single_word": false,
1826
+ "special": true
1827
+ },
1828
+ "228": {
1829
+ "content": "[extra_id_66]",
1830
+ "lstrip": false,
1831
+ "normalized": false,
1832
+ "rstrip": false,
1833
+ "single_word": false,
1834
+ "special": true
1835
+ },
1836
+ "229": {
1837
+ "content": "[extra_id_67]",
1838
+ "lstrip": false,
1839
+ "normalized": false,
1840
+ "rstrip": false,
1841
+ "single_word": false,
1842
+ "special": true
1843
+ },
1844
+ "230": {
1845
+ "content": "[extra_id_68]",
1846
+ "lstrip": false,
1847
+ "normalized": false,
1848
+ "rstrip": false,
1849
+ "single_word": false,
1850
+ "special": true
1851
+ },
1852
+ "231": {
1853
+ "content": "[extra_id_69]",
1854
+ "lstrip": false,
1855
+ "normalized": false,
1856
+ "rstrip": false,
1857
+ "single_word": false,
1858
+ "special": true
1859
+ },
1860
+ "232": {
1861
+ "content": "[extra_id_70]",
1862
+ "lstrip": false,
1863
+ "normalized": false,
1864
+ "rstrip": false,
1865
+ "single_word": false,
1866
+ "special": true
1867
+ },
1868
+ "233": {
1869
+ "content": "[extra_id_71]",
1870
+ "lstrip": false,
1871
+ "normalized": false,
1872
+ "rstrip": false,
1873
+ "single_word": false,
1874
+ "special": true
1875
+ },
1876
+ "234": {
1877
+ "content": "[extra_id_72]",
1878
+ "lstrip": false,
1879
+ "normalized": false,
1880
+ "rstrip": false,
1881
+ "single_word": false,
1882
+ "special": true
1883
+ },
1884
+ "235": {
1885
+ "content": "[extra_id_73]",
1886
+ "lstrip": false,
1887
+ "normalized": false,
1888
+ "rstrip": false,
1889
+ "single_word": false,
1890
+ "special": true
1891
+ },
1892
+ "236": {
1893
+ "content": "[extra_id_74]",
1894
+ "lstrip": false,
1895
+ "normalized": false,
1896
+ "rstrip": false,
1897
+ "single_word": false,
1898
+ "special": true
1899
+ },
1900
+ "237": {
1901
+ "content": "[extra_id_75]",
1902
+ "lstrip": false,
1903
+ "normalized": false,
1904
+ "rstrip": false,
1905
+ "single_word": false,
1906
+ "special": true
1907
+ },
1908
+ "238": {
1909
+ "content": "[extra_id_76]",
1910
+ "lstrip": false,
1911
+ "normalized": false,
1912
+ "rstrip": false,
1913
+ "single_word": false,
1914
+ "special": true
1915
+ },
1916
+ "239": {
1917
+ "content": "[extra_id_77]",
1918
+ "lstrip": false,
1919
+ "normalized": false,
1920
+ "rstrip": false,
1921
+ "single_word": false,
1922
+ "special": true
1923
+ },
1924
+ "240": {
1925
+ "content": "[extra_id_78]",
1926
+ "lstrip": false,
1927
+ "normalized": false,
1928
+ "rstrip": false,
1929
+ "single_word": false,
1930
+ "special": true
1931
+ },
1932
+ "241": {
1933
+ "content": "[extra_id_79]",
1934
+ "lstrip": false,
1935
+ "normalized": false,
1936
+ "rstrip": false,
1937
+ "single_word": false,
1938
+ "special": true
1939
+ },
1940
+ "242": {
1941
+ "content": "[extra_id_80]",
1942
+ "lstrip": false,
1943
+ "normalized": false,
1944
+ "rstrip": false,
1945
+ "single_word": false,
1946
+ "special": true
1947
+ },
1948
+ "243": {
1949
+ "content": "[extra_id_81]",
1950
+ "lstrip": false,
1951
+ "normalized": false,
1952
+ "rstrip": false,
1953
+ "single_word": false,
1954
+ "special": true
1955
+ },
1956
+ "244": {
1957
+ "content": "[extra_id_82]",
1958
+ "lstrip": false,
1959
+ "normalized": false,
1960
+ "rstrip": false,
1961
+ "single_word": false,
1962
+ "special": true
1963
+ },
1964
+ "245": {
1965
+ "content": "[extra_id_83]",
1966
+ "lstrip": false,
1967
+ "normalized": false,
1968
+ "rstrip": false,
1969
+ "single_word": false,
1970
+ "special": true
1971
+ },
1972
+ "246": {
1973
+ "content": "[extra_id_84]",
1974
+ "lstrip": false,
1975
+ "normalized": false,
1976
+ "rstrip": false,
1977
+ "single_word": false,
1978
+ "special": true
1979
+ },
1980
+ "247": {
1981
+ "content": "[extra_id_85]",
1982
+ "lstrip": false,
1983
+ "normalized": false,
1984
+ "rstrip": false,
1985
+ "single_word": false,
1986
+ "special": true
1987
+ },
1988
+ "248": {
1989
+ "content": "[extra_id_86]",
1990
+ "lstrip": false,
1991
+ "normalized": false,
1992
+ "rstrip": false,
1993
+ "single_word": false,
1994
+ "special": true
1995
+ },
1996
+ "249": {
1997
+ "content": "[extra_id_87]",
1998
+ "lstrip": false,
1999
+ "normalized": false,
2000
+ "rstrip": false,
2001
+ "single_word": false,
2002
+ "special": true
2003
+ },
2004
+ "250": {
2005
+ "content": "[extra_id_88]",
2006
+ "lstrip": false,
2007
+ "normalized": false,
2008
+ "rstrip": false,
2009
+ "single_word": false,
2010
+ "special": true
2011
+ },
2012
+ "251": {
2013
+ "content": "[extra_id_89]",
2014
+ "lstrip": false,
2015
+ "normalized": false,
2016
+ "rstrip": false,
2017
+ "single_word": false,
2018
+ "special": true
2019
+ },
2020
+ "252": {
2021
+ "content": "[extra_id_90]",
2022
+ "lstrip": false,
2023
+ "normalized": false,
2024
+ "rstrip": false,
2025
+ "single_word": false,
2026
+ "special": true
2027
+ },
2028
+ "253": {
2029
+ "content": "[extra_id_91]",
2030
+ "lstrip": false,
2031
+ "normalized": false,
2032
+ "rstrip": false,
2033
+ "single_word": false,
2034
+ "special": true
2035
+ },
2036
+ "254": {
2037
+ "content": "[extra_id_92]",
2038
+ "lstrip": false,
2039
+ "normalized": false,
2040
+ "rstrip": false,
2041
+ "single_word": false,
2042
+ "special": true
2043
+ },
2044
+ "255": {
2045
+ "content": "[extra_id_93]",
2046
+ "lstrip": false,
2047
+ "normalized": false,
2048
+ "rstrip": false,
2049
+ "single_word": false,
2050
+ "special": true
2051
+ },
2052
+ "256": {
2053
+ "content": "[extra_id_94]",
2054
+ "lstrip": false,
2055
+ "normalized": false,
2056
+ "rstrip": false,
2057
+ "single_word": false,
2058
+ "special": true
2059
+ },
2060
+ "257": {
2061
+ "content": "[extra_id_95]",
2062
+ "lstrip": false,
2063
+ "normalized": false,
2064
+ "rstrip": false,
2065
+ "single_word": false,
2066
+ "special": true
2067
+ },
2068
+ "258": {
2069
+ "content": "[extra_id_96]",
2070
+ "lstrip": false,
2071
+ "normalized": false,
2072
+ "rstrip": false,
2073
+ "single_word": false,
2074
+ "special": true
2075
+ },
2076
+ "259": {
2077
+ "content": "[extra_id_97]",
2078
+ "lstrip": false,
2079
+ "normalized": false,
2080
+ "rstrip": false,
2081
+ "single_word": false,
2082
+ "special": true
2083
+ },
2084
+ "260": {
2085
+ "content": "[extra_id_98]",
2086
+ "lstrip": false,
2087
+ "normalized": false,
2088
+ "rstrip": false,
2089
+ "single_word": false,
2090
+ "special": true
2091
+ },
2092
+ "261": {
2093
+ "content": "[extra_id_99]",
2094
+ "lstrip": false,
2095
+ "normalized": false,
2096
+ "rstrip": false,
2097
+ "single_word": false,
2098
+ "special": true
2099
+ },
2100
+ "262": {
2101
+ "content": "[extra_id_100]",
2102
+ "lstrip": false,
2103
+ "normalized": false,
2104
+ "rstrip": false,
2105
+ "single_word": false,
2106
+ "special": true
2107
+ },
2108
+ "263": {
2109
+ "content": "[extra_id_101]",
2110
+ "lstrip": false,
2111
+ "normalized": false,
2112
+ "rstrip": false,
2113
+ "single_word": false,
2114
+ "special": true
2115
+ },
2116
+ "264": {
2117
+ "content": "[extra_id_102]",
2118
+ "lstrip": false,
2119
+ "normalized": false,
2120
+ "rstrip": false,
2121
+ "single_word": false,
2122
+ "special": true
2123
+ },
2124
+ "265": {
2125
+ "content": "[extra_id_103]",
2126
+ "lstrip": false,
2127
+ "normalized": false,
2128
+ "rstrip": false,
2129
+ "single_word": false,
2130
+ "special": true
2131
+ },
2132
+ "266": {
2133
+ "content": "[extra_id_104]",
2134
+ "lstrip": false,
2135
+ "normalized": false,
2136
+ "rstrip": false,
2137
+ "single_word": false,
2138
+ "special": true
2139
+ },
2140
+ "267": {
2141
+ "content": "[extra_id_105]",
2142
+ "lstrip": false,
2143
+ "normalized": false,
2144
+ "rstrip": false,
2145
+ "single_word": false,
2146
+ "special": true
2147
+ },
2148
+ "268": {
2149
+ "content": "[extra_id_106]",
2150
+ "lstrip": false,
2151
+ "normalized": false,
2152
+ "rstrip": false,
2153
+ "single_word": false,
2154
+ "special": true
2155
+ },
2156
+ "269": {
2157
+ "content": "[extra_id_107]",
2158
+ "lstrip": false,
2159
+ "normalized": false,
2160
+ "rstrip": false,
2161
+ "single_word": false,
2162
+ "special": true
2163
+ },
2164
+ "270": {
2165
+ "content": "[extra_id_108]",
2166
+ "lstrip": false,
2167
+ "normalized": false,
2168
+ "rstrip": false,
2169
+ "single_word": false,
2170
+ "special": true
2171
+ },
2172
+ "271": {
2173
+ "content": "[extra_id_109]",
2174
+ "lstrip": false,
2175
+ "normalized": false,
2176
+ "rstrip": false,
2177
+ "single_word": false,
2178
+ "special": true
2179
+ },
2180
+ "272": {
2181
+ "content": "[extra_id_110]",
2182
+ "lstrip": false,
2183
+ "normalized": false,
2184
+ "rstrip": false,
2185
+ "single_word": false,
2186
+ "special": true
2187
+ },
2188
+ "273": {
2189
+ "content": "[extra_id_111]",
2190
+ "lstrip": false,
2191
+ "normalized": false,
2192
+ "rstrip": false,
2193
+ "single_word": false,
2194
+ "special": true
2195
+ },
2196
+ "274": {
2197
+ "content": "[extra_id_112]",
2198
+ "lstrip": false,
2199
+ "normalized": false,
2200
+ "rstrip": false,
2201
+ "single_word": false,
2202
+ "special": true
2203
+ },
2204
+ "275": {
2205
+ "content": "[extra_id_113]",
2206
+ "lstrip": false,
2207
+ "normalized": false,
2208
+ "rstrip": false,
2209
+ "single_word": false,
2210
+ "special": true
2211
+ },
2212
+ "276": {
2213
+ "content": "[extra_id_114]",
2214
+ "lstrip": false,
2215
+ "normalized": false,
2216
+ "rstrip": false,
2217
+ "single_word": false,
2218
+ "special": true
2219
+ },
2220
+ "277": {
2221
+ "content": "[extra_id_115]",
2222
+ "lstrip": false,
2223
+ "normalized": false,
2224
+ "rstrip": false,
2225
+ "single_word": false,
2226
+ "special": true
2227
+ },
2228
+ "278": {
2229
+ "content": "[extra_id_116]",
2230
+ "lstrip": false,
2231
+ "normalized": false,
2232
+ "rstrip": false,
2233
+ "single_word": false,
2234
+ "special": true
2235
+ },
2236
+ "279": {
2237
+ "content": "[extra_id_117]",
2238
+ "lstrip": false,
2239
+ "normalized": false,
2240
+ "rstrip": false,
2241
+ "single_word": false,
2242
+ "special": true
2243
+ },
2244
+ "280": {
2245
+ "content": "[extra_id_118]",
2246
+ "lstrip": false,
2247
+ "normalized": false,
2248
+ "rstrip": false,
2249
+ "single_word": false,
2250
+ "special": true
2251
+ },
2252
+ "281": {
2253
+ "content": "[extra_id_119]",
2254
+ "lstrip": false,
2255
+ "normalized": false,
2256
+ "rstrip": false,
2257
+ "single_word": false,
2258
+ "special": true
2259
+ },
2260
+ "282": {
2261
+ "content": "[extra_id_120]",
2262
+ "lstrip": false,
2263
+ "normalized": false,
2264
+ "rstrip": false,
2265
+ "single_word": false,
2266
+ "special": true
2267
+ },
2268
+ "283": {
2269
+ "content": "[extra_id_121]",
2270
+ "lstrip": false,
2271
+ "normalized": false,
2272
+ "rstrip": false,
2273
+ "single_word": false,
2274
+ "special": true
2275
+ },
2276
+ "284": {
2277
+ "content": "[extra_id_122]",
2278
+ "lstrip": false,
2279
+ "normalized": false,
2280
+ "rstrip": false,
2281
+ "single_word": false,
2282
+ "special": true
2283
+ },
2284
+ "285": {
2285
+ "content": "[extra_id_123]",
2286
+ "lstrip": false,
2287
+ "normalized": false,
2288
+ "rstrip": false,
2289
+ "single_word": false,
2290
+ "special": true
2291
+ },
2292
+ "286": {
2293
+ "content": "[extra_id_124]",
2294
+ "lstrip": false,
2295
+ "normalized": false,
2296
+ "rstrip": false,
2297
+ "single_word": false,
2298
+ "special": true
2299
+ },
2300
+ "287": {
2301
+ "content": "[extra_id_125]",
2302
+ "lstrip": false,
2303
+ "normalized": false,
2304
+ "rstrip": false,
2305
+ "single_word": false,
2306
+ "special": true
2307
+ },
2308
+ "288": {
2309
+ "content": "[extra_id_126]",
2310
+ "lstrip": false,
2311
+ "normalized": false,
2312
+ "rstrip": false,
2313
+ "single_word": false,
2314
+ "special": true
2315
+ },
2316
+ "289": {
2317
+ "content": "[extra_id_127]",
2318
+ "lstrip": false,
2319
+ "normalized": false,
2320
+ "rstrip": false,
2321
+ "single_word": false,
2322
+ "special": true
2323
+ },
2324
+ "290": {
2325
+ "content": "[extra_id_128]",
2326
+ "lstrip": false,
2327
+ "normalized": false,
2328
+ "rstrip": false,
2329
+ "single_word": false,
2330
+ "special": true
2331
+ },
2332
+ "291": {
2333
+ "content": "[extra_id_129]",
2334
+ "lstrip": false,
2335
+ "normalized": false,
2336
+ "rstrip": false,
2337
+ "single_word": false,
2338
+ "special": true
2339
+ },
2340
+ "292": {
2341
+ "content": "[extra_id_130]",
2342
+ "lstrip": false,
2343
+ "normalized": false,
2344
+ "rstrip": false,
2345
+ "single_word": false,
2346
+ "special": true
2347
+ },
2348
+ "293": {
2349
+ "content": "[extra_id_131]",
2350
+ "lstrip": false,
2351
+ "normalized": false,
2352
+ "rstrip": false,
2353
+ "single_word": false,
2354
+ "special": true
2355
+ },
2356
+ "294": {
2357
+ "content": "[extra_id_132]",
2358
+ "lstrip": false,
2359
+ "normalized": false,
2360
+ "rstrip": false,
2361
+ "single_word": false,
2362
+ "special": true
2363
+ },
2364
+ "295": {
2365
+ "content": "[extra_id_133]",
2366
+ "lstrip": false,
2367
+ "normalized": false,
2368
+ "rstrip": false,
2369
+ "single_word": false,
2370
+ "special": true
2371
+ },
2372
+ "296": {
2373
+ "content": "[extra_id_134]",
2374
+ "lstrip": false,
2375
+ "normalized": false,
2376
+ "rstrip": false,
2377
+ "single_word": false,
2378
+ "special": true
2379
+ },
2380
+ "297": {
2381
+ "content": "[extra_id_135]",
2382
+ "lstrip": false,
2383
+ "normalized": false,
2384
+ "rstrip": false,
2385
+ "single_word": false,
2386
+ "special": true
2387
+ },
2388
+ "298": {
2389
+ "content": "[extra_id_136]",
2390
+ "lstrip": false,
2391
+ "normalized": false,
2392
+ "rstrip": false,
2393
+ "single_word": false,
2394
+ "special": true
2395
+ },
2396
+ "299": {
2397
+ "content": "[extra_id_137]",
2398
+ "lstrip": false,
2399
+ "normalized": false,
2400
+ "rstrip": false,
2401
+ "single_word": false,
2402
+ "special": true
2403
+ },
2404
+ "300": {
2405
+ "content": "[extra_id_138]",
2406
+ "lstrip": false,
2407
+ "normalized": false,
2408
+ "rstrip": false,
2409
+ "single_word": false,
2410
+ "special": true
2411
+ },
2412
+ "301": {
2413
+ "content": "[extra_id_139]",
2414
+ "lstrip": false,
2415
+ "normalized": false,
2416
+ "rstrip": false,
2417
+ "single_word": false,
2418
+ "special": true
2419
+ },
2420
+ "302": {
2421
+ "content": "[extra_id_140]",
2422
+ "lstrip": false,
2423
+ "normalized": false,
2424
+ "rstrip": false,
2425
+ "single_word": false,
2426
+ "special": true
2427
+ },
2428
+ "303": {
2429
+ "content": "[extra_id_141]",
2430
+ "lstrip": false,
2431
+ "normalized": false,
2432
+ "rstrip": false,
2433
+ "single_word": false,
2434
+ "special": true
2435
+ },
2436
+ "304": {
2437
+ "content": "[extra_id_142]",
2438
+ "lstrip": false,
2439
+ "normalized": false,
2440
+ "rstrip": false,
2441
+ "single_word": false,
2442
+ "special": true
2443
+ },
2444
+ "305": {
2445
+ "content": "[extra_id_143]",
2446
+ "lstrip": false,
2447
+ "normalized": false,
2448
+ "rstrip": false,
2449
+ "single_word": false,
2450
+ "special": true
2451
+ },
2452
+ "306": {
2453
+ "content": "[extra_id_144]",
2454
+ "lstrip": false,
2455
+ "normalized": false,
2456
+ "rstrip": false,
2457
+ "single_word": false,
2458
+ "special": true
2459
+ },
2460
+ "307": {
2461
+ "content": "[extra_id_145]",
2462
+ "lstrip": false,
2463
+ "normalized": false,
2464
+ "rstrip": false,
2465
+ "single_word": false,
2466
+ "special": true
2467
+ },
2468
+ "308": {
2469
+ "content": "[extra_id_146]",
2470
+ "lstrip": false,
2471
+ "normalized": false,
2472
+ "rstrip": false,
2473
+ "single_word": false,
2474
+ "special": true
2475
+ },
2476
+ "309": {
2477
+ "content": "[extra_id_147]",
2478
+ "lstrip": false,
2479
+ "normalized": false,
2480
+ "rstrip": false,
2481
+ "single_word": false,
2482
+ "special": true
2483
+ },
2484
+ "310": {
2485
+ "content": "[extra_id_148]",
2486
+ "lstrip": false,
2487
+ "normalized": false,
2488
+ "rstrip": false,
2489
+ "single_word": false,
2490
+ "special": true
2491
+ },
2492
+ "311": {
2493
+ "content": "[extra_id_149]",
2494
+ "lstrip": false,
2495
+ "normalized": false,
2496
+ "rstrip": false,
2497
+ "single_word": false,
2498
+ "special": true
2499
+ },
2500
+ "312": {
2501
+ "content": "[extra_id_150]",
2502
+ "lstrip": false,
2503
+ "normalized": false,
2504
+ "rstrip": false,
2505
+ "single_word": false,
2506
+ "special": true
2507
+ },
2508
+ "313": {
2509
+ "content": "[extra_id_151]",
2510
+ "lstrip": false,
2511
+ "normalized": false,
2512
+ "rstrip": false,
2513
+ "single_word": false,
2514
+ "special": true
2515
+ },
2516
+ "314": {
2517
+ "content": "[extra_id_152]",
2518
+ "lstrip": false,
2519
+ "normalized": false,
2520
+ "rstrip": false,
2521
+ "single_word": false,
2522
+ "special": true
2523
+ },
2524
+ "315": {
2525
+ "content": "[extra_id_153]",
2526
+ "lstrip": false,
2527
+ "normalized": false,
2528
+ "rstrip": false,
2529
+ "single_word": false,
2530
+ "special": true
2531
+ },
2532
+ "316": {
2533
+ "content": "[extra_id_154]",
2534
+ "lstrip": false,
2535
+ "normalized": false,
2536
+ "rstrip": false,
2537
+ "single_word": false,
2538
+ "special": true
2539
+ },
2540
+ "317": {
2541
+ "content": "[extra_id_155]",
2542
+ "lstrip": false,
2543
+ "normalized": false,
2544
+ "rstrip": false,
2545
+ "single_word": false,
2546
+ "special": true
2547
+ },
2548
+ "318": {
2549
+ "content": "[extra_id_156]",
2550
+ "lstrip": false,
2551
+ "normalized": false,
2552
+ "rstrip": false,
2553
+ "single_word": false,
2554
+ "special": true
2555
+ },
2556
+ "319": {
2557
+ "content": "[extra_id_157]",
2558
+ "lstrip": false,
2559
+ "normalized": false,
2560
+ "rstrip": false,
2561
+ "single_word": false,
2562
+ "special": true
2563
+ },
2564
+ "320": {
2565
+ "content": "[extra_id_158]",
2566
+ "lstrip": false,
2567
+ "normalized": false,
2568
+ "rstrip": false,
2569
+ "single_word": false,
2570
+ "special": true
2571
+ },
2572
+ "321": {
2573
+ "content": "[extra_id_159]",
2574
+ "lstrip": false,
2575
+ "normalized": false,
2576
+ "rstrip": false,
2577
+ "single_word": false,
2578
+ "special": true
2579
+ },
2580
+ "322": {
2581
+ "content": "[extra_id_160]",
2582
+ "lstrip": false,
2583
+ "normalized": false,
2584
+ "rstrip": false,
2585
+ "single_word": false,
2586
+ "special": true
2587
+ },
2588
+ "323": {
2589
+ "content": "[extra_id_161]",
2590
+ "lstrip": false,
2591
+ "normalized": false,
2592
+ "rstrip": false,
2593
+ "single_word": false,
2594
+ "special": true
2595
+ },
2596
+ "324": {
2597
+ "content": "[extra_id_162]",
2598
+ "lstrip": false,
2599
+ "normalized": false,
2600
+ "rstrip": false,
2601
+ "single_word": false,
2602
+ "special": true
2603
+ },
2604
+ "325": {
2605
+ "content": "[extra_id_163]",
2606
+ "lstrip": false,
2607
+ "normalized": false,
2608
+ "rstrip": false,
2609
+ "single_word": false,
2610
+ "special": true
2611
+ },
2612
+ "326": {
2613
+ "content": "[extra_id_164]",
2614
+ "lstrip": false,
2615
+ "normalized": false,
2616
+ "rstrip": false,
2617
+ "single_word": false,
2618
+ "special": true
2619
+ },
2620
+ "327": {
2621
+ "content": "[extra_id_165]",
2622
+ "lstrip": false,
2623
+ "normalized": false,
2624
+ "rstrip": false,
2625
+ "single_word": false,
2626
+ "special": true
2627
+ },
2628
+ "328": {
2629
+ "content": "[extra_id_166]",
2630
+ "lstrip": false,
2631
+ "normalized": false,
2632
+ "rstrip": false,
2633
+ "single_word": false,
2634
+ "special": true
2635
+ },
2636
+ "329": {
2637
+ "content": "[extra_id_167]",
2638
+ "lstrip": false,
2639
+ "normalized": false,
2640
+ "rstrip": false,
2641
+ "single_word": false,
2642
+ "special": true
2643
+ },
2644
+ "330": {
2645
+ "content": "[extra_id_168]",
2646
+ "lstrip": false,
2647
+ "normalized": false,
2648
+ "rstrip": false,
2649
+ "single_word": false,
2650
+ "special": true
2651
+ },
2652
+ "331": {
2653
+ "content": "[extra_id_169]",
2654
+ "lstrip": false,
2655
+ "normalized": false,
2656
+ "rstrip": false,
2657
+ "single_word": false,
2658
+ "special": true
2659
+ },
2660
+ "332": {
2661
+ "content": "[extra_id_170]",
2662
+ "lstrip": false,
2663
+ "normalized": false,
2664
+ "rstrip": false,
2665
+ "single_word": false,
2666
+ "special": true
2667
+ },
2668
+ "333": {
2669
+ "content": "[extra_id_171]",
2670
+ "lstrip": false,
2671
+ "normalized": false,
2672
+ "rstrip": false,
2673
+ "single_word": false,
2674
+ "special": true
2675
+ },
2676
+ "334": {
2677
+ "content": "[extra_id_172]",
2678
+ "lstrip": false,
2679
+ "normalized": false,
2680
+ "rstrip": false,
2681
+ "single_word": false,
2682
+ "special": true
2683
+ },
2684
+ "335": {
2685
+ "content": "[extra_id_173]",
2686
+ "lstrip": false,
2687
+ "normalized": false,
2688
+ "rstrip": false,
2689
+ "single_word": false,
2690
+ "special": true
2691
+ },
2692
+ "336": {
2693
+ "content": "[extra_id_174]",
2694
+ "lstrip": false,
2695
+ "normalized": false,
2696
+ "rstrip": false,
2697
+ "single_word": false,
2698
+ "special": true
2699
+ },
2700
+ "337": {
2701
+ "content": "[extra_id_175]",
2702
+ "lstrip": false,
2703
+ "normalized": false,
2704
+ "rstrip": false,
2705
+ "single_word": false,
2706
+ "special": true
2707
+ },
2708
+ "338": {
2709
+ "content": "[extra_id_176]",
2710
+ "lstrip": false,
2711
+ "normalized": false,
2712
+ "rstrip": false,
2713
+ "single_word": false,
2714
+ "special": true
2715
+ },
2716
+ "339": {
2717
+ "content": "[extra_id_177]",
2718
+ "lstrip": false,
2719
+ "normalized": false,
2720
+ "rstrip": false,
2721
+ "single_word": false,
2722
+ "special": true
2723
+ },
2724
+ "340": {
2725
+ "content": "[extra_id_178]",
2726
+ "lstrip": false,
2727
+ "normalized": false,
2728
+ "rstrip": false,
2729
+ "single_word": false,
2730
+ "special": true
2731
+ },
2732
+ "341": {
2733
+ "content": "[extra_id_179]",
2734
+ "lstrip": false,
2735
+ "normalized": false,
2736
+ "rstrip": false,
2737
+ "single_word": false,
2738
+ "special": true
2739
+ },
2740
+ "342": {
2741
+ "content": "[extra_id_180]",
2742
+ "lstrip": false,
2743
+ "normalized": false,
2744
+ "rstrip": false,
2745
+ "single_word": false,
2746
+ "special": true
2747
+ },
2748
+ "343": {
2749
+ "content": "[extra_id_181]",
2750
+ "lstrip": false,
2751
+ "normalized": false,
2752
+ "rstrip": false,
2753
+ "single_word": false,
2754
+ "special": true
2755
+ },
2756
+ "344": {
2757
+ "content": "[extra_id_182]",
2758
+ "lstrip": false,
2759
+ "normalized": false,
2760
+ "rstrip": false,
2761
+ "single_word": false,
2762
+ "special": true
2763
+ },
2764
+ "345": {
2765
+ "content": "[extra_id_183]",
2766
+ "lstrip": false,
2767
+ "normalized": false,
2768
+ "rstrip": false,
2769
+ "single_word": false,
2770
+ "special": true
2771
+ },
2772
+ "346": {
2773
+ "content": "[extra_id_184]",
2774
+ "lstrip": false,
2775
+ "normalized": false,
2776
+ "rstrip": false,
2777
+ "single_word": false,
2778
+ "special": true
2779
+ },
2780
+ "347": {
2781
+ "content": "[extra_id_185]",
2782
+ "lstrip": false,
2783
+ "normalized": false,
2784
+ "rstrip": false,
2785
+ "single_word": false,
2786
+ "special": true
2787
+ },
2788
+ "348": {
2789
+ "content": "[extra_id_186]",
2790
+ "lstrip": false,
2791
+ "normalized": false,
2792
+ "rstrip": false,
2793
+ "single_word": false,
2794
+ "special": true
2795
+ },
2796
+ "349": {
2797
+ "content": "[extra_id_187]",
2798
+ "lstrip": false,
2799
+ "normalized": false,
2800
+ "rstrip": false,
2801
+ "single_word": false,
2802
+ "special": true
2803
+ },
2804
+ "350": {
2805
+ "content": "[extra_id_188]",
2806
+ "lstrip": false,
2807
+ "normalized": false,
2808
+ "rstrip": false,
2809
+ "single_word": false,
2810
+ "special": true
2811
+ },
2812
+ "351": {
2813
+ "content": "[extra_id_189]",
2814
+ "lstrip": false,
2815
+ "normalized": false,
2816
+ "rstrip": false,
2817
+ "single_word": false,
2818
+ "special": true
2819
+ },
2820
+ "352": {
2821
+ "content": "[extra_id_190]",
2822
+ "lstrip": false,
2823
+ "normalized": false,
2824
+ "rstrip": false,
2825
+ "single_word": false,
2826
+ "special": true
2827
+ },
2828
+ "353": {
2829
+ "content": "[extra_id_191]",
2830
+ "lstrip": false,
2831
+ "normalized": false,
2832
+ "rstrip": false,
2833
+ "single_word": false,
2834
+ "special": true
2835
+ },
2836
+ "354": {
2837
+ "content": "[extra_id_192]",
2838
+ "lstrip": false,
2839
+ "normalized": false,
2840
+ "rstrip": false,
2841
+ "single_word": false,
2842
+ "special": true
2843
+ },
2844
+ "355": {
2845
+ "content": "[extra_id_193]",
2846
+ "lstrip": false,
2847
+ "normalized": false,
2848
+ "rstrip": false,
2849
+ "single_word": false,
2850
+ "special": true
2851
+ },
2852
+ "356": {
2853
+ "content": "[extra_id_194]",
2854
+ "lstrip": false,
2855
+ "normalized": false,
2856
+ "rstrip": false,
2857
+ "single_word": false,
2858
+ "special": true
2859
+ },
2860
+ "357": {
2861
+ "content": "[extra_id_195]",
2862
+ "lstrip": false,
2863
+ "normalized": false,
2864
+ "rstrip": false,
2865
+ "single_word": false,
2866
+ "special": true
2867
+ },
2868
+ "358": {
2869
+ "content": "[extra_id_196]",
2870
+ "lstrip": false,
2871
+ "normalized": false,
2872
+ "rstrip": false,
2873
+ "single_word": false,
2874
+ "special": true
2875
+ },
2876
+ "359": {
2877
+ "content": "[extra_id_197]",
2878
+ "lstrip": false,
2879
+ "normalized": false,
2880
+ "rstrip": false,
2881
+ "single_word": false,
2882
+ "special": true
2883
+ },
2884
+ "360": {
2885
+ "content": "[extra_id_198]",
2886
+ "lstrip": false,
2887
+ "normalized": false,
2888
+ "rstrip": false,
2889
+ "single_word": false,
2890
+ "special": true
2891
+ },
2892
+ "361": {
2893
+ "content": "[|endofturn|]",
2894
+ "lstrip": false,
2895
+ "normalized": false,
2896
+ "rstrip": false,
2897
+ "single_word": false,
2898
+ "special": true
2899
+ }
2900
+ },
2901
+ "additional_special_token": [
2902
+ "[unused0]",
2903
+ "[unused1]",
2904
+ "[unused2]",
2905
+ "[unused3]",
2906
+ "[unused4]",
2907
+ "[unused5]",
2908
+ "[unused6]",
2909
+ "[unused7]",
2910
+ "[unused8]",
2911
+ "[unused9]",
2912
+ "[unused10]",
2913
+ "[unused11]",
2914
+ "[unused12]",
2915
+ "[unused13]",
2916
+ "[unused14]",
2917
+ "[unused15]",
2918
+ "[unused16]",
2919
+ "[unused17]",
2920
+ "[unused18]",
2921
+ "[unused19]",
2922
+ "[unused20]",
2923
+ "[unused21]",
2924
+ "[unused22]",
2925
+ "[unused23]",
2926
+ "[unused24]",
2927
+ "[unused25]",
2928
+ "[unused26]",
2929
+ "[unused27]",
2930
+ "[unused28]",
2931
+ "[unused29]",
2932
+ "[unused30]",
2933
+ "[unused31]",
2934
+ "[unused32]",
2935
+ "[unused33]",
2936
+ "[unused34]",
2937
+ "[unused35]",
2938
+ "[unused36]",
2939
+ "[unused37]",
2940
+ "[unused38]",
2941
+ "[unused39]",
2942
+ "[unused40]",
2943
+ "[unused41]",
2944
+ "[unused42]",
2945
+ "[unused43]",
2946
+ "[unused44]",
2947
+ "[unused45]",
2948
+ "[unused46]",
2949
+ "[unused47]",
2950
+ "[unused48]",
2951
+ "[unused49]",
2952
+ "[unused50]",
2953
+ "[unused51]",
2954
+ "[unused52]",
2955
+ "[unused53]",
2956
+ "[unused54]",
2957
+ "[unused55]",
2958
+ "[unused56]",
2959
+ "[unused57]",
2960
+ "[unused58]",
2961
+ "[unused59]",
2962
+ "[unused60]",
2963
+ "[unused61]",
2964
+ "[unused62]",
2965
+ "[unused63]",
2966
+ "[unused64]",
2967
+ "[unused65]",
2968
+ "[unused66]",
2969
+ "[unused67]",
2970
+ "[unused68]",
2971
+ "[unused69]",
2972
+ "[unused70]",
2973
+ "[unused71]",
2974
+ "[unused72]",
2975
+ "[unused73]",
2976
+ "[unused74]",
2977
+ "[unused75]",
2978
+ "[unused76]",
2979
+ "[unused77]",
2980
+ "[unused78]",
2981
+ "[unused79]",
2982
+ "[unused80]",
2983
+ "[unused81]",
2984
+ "[unused82]",
2985
+ "[unused83]",
2986
+ "[unused84]",
2987
+ "[unused85]",
2988
+ "[unused86]",
2989
+ "[unused87]",
2990
+ "[unused88]",
2991
+ "[unused89]",
2992
+ "[unused90]",
2993
+ "[unused91]",
2994
+ "[unused92]",
2995
+ "[unused93]",
2996
+ "[unused94]",
2997
+ "[unused95]",
2998
+ "[unused96]",
2999
+ "[unused97]",
3000
+ "[unused98]",
3001
+ "[unused99]",
3002
+ "[extra_id_0]",
3003
+ "[extra_id_1]",
3004
+ "[extra_id_2]",
3005
+ "[extra_id_3]",
3006
+ "[extra_id_4]",
3007
+ "[extra_id_5]",
3008
+ "[extra_id_6]",
3009
+ "[extra_id_7]",
3010
+ "[extra_id_8]",
3011
+ "[extra_id_9]",
3012
+ "[extra_id_10]",
3013
+ "[extra_id_11]",
3014
+ "[extra_id_12]",
3015
+ "[extra_id_13]",
3016
+ "[extra_id_14]",
3017
+ "[extra_id_15]",
3018
+ "[extra_id_16]",
3019
+ "[extra_id_17]",
3020
+ "[extra_id_18]",
3021
+ "[extra_id_19]",
3022
+ "[extra_id_20]",
3023
+ "[extra_id_21]",
3024
+ "[extra_id_22]",
3025
+ "[extra_id_23]",
3026
+ "[extra_id_24]",
3027
+ "[extra_id_25]",
3028
+ "[extra_id_26]",
3029
+ "[extra_id_27]",
3030
+ "[extra_id_28]",
3031
+ "[extra_id_29]",
3032
+ "[extra_id_30]",
3033
+ "[extra_id_31]",
3034
+ "[extra_id_32]",
3035
+ "[extra_id_33]",
3036
+ "[extra_id_34]",
3037
+ "[extra_id_35]",
3038
+ "[extra_id_36]",
3039
+ "[extra_id_37]",
3040
+ "[extra_id_38]",
3041
+ "[extra_id_39]",
3042
+ "[extra_id_40]",
3043
+ "[extra_id_41]",
3044
+ "[extra_id_42]",
3045
+ "[extra_id_43]",
3046
+ "[extra_id_44]",
3047
+ "[extra_id_45]",
3048
+ "[extra_id_46]",
3049
+ "[extra_id_47]",
3050
+ "[extra_id_48]",
3051
+ "[extra_id_49]",
3052
+ "[extra_id_50]",
3053
+ "[extra_id_51]",
3054
+ "[extra_id_52]",
3055
+ "[extra_id_53]",
3056
+ "[extra_id_54]",
3057
+ "[extra_id_55]",
3058
+ "[extra_id_56]",
3059
+ "[extra_id_57]",
3060
+ "[extra_id_58]",
3061
+ "[extra_id_59]",
3062
+ "[extra_id_60]",
3063
+ "[extra_id_61]",
3064
+ "[extra_id_62]",
3065
+ "[extra_id_63]",
3066
+ "[extra_id_64]",
3067
+ "[extra_id_65]",
3068
+ "[extra_id_66]",
3069
+ "[extra_id_67]",
3070
+ "[extra_id_68]",
3071
+ "[extra_id_69]",
3072
+ "[extra_id_70]",
3073
+ "[extra_id_71]",
3074
+ "[extra_id_72]",
3075
+ "[extra_id_73]",
3076
+ "[extra_id_74]",
3077
+ "[extra_id_75]",
3078
+ "[extra_id_76]",
3079
+ "[extra_id_77]",
3080
+ "[extra_id_78]",
3081
+ "[extra_id_79]",
3082
+ "[extra_id_80]",
3083
+ "[extra_id_81]",
3084
+ "[extra_id_82]",
3085
+ "[extra_id_83]",
3086
+ "[extra_id_84]",
3087
+ "[extra_id_85]",
3088
+ "[extra_id_86]",
3089
+ "[extra_id_87]",
3090
+ "[extra_id_88]",
3091
+ "[extra_id_89]",
3092
+ "[extra_id_90]",
3093
+ "[extra_id_91]",
3094
+ "[extra_id_92]",
3095
+ "[extra_id_93]",
3096
+ "[extra_id_94]",
3097
+ "[extra_id_95]",
3098
+ "[extra_id_96]",
3099
+ "[extra_id_97]",
3100
+ "[extra_id_98]",
3101
+ "[extra_id_99]",
3102
+ "[extra_id_100]",
3103
+ "[extra_id_101]",
3104
+ "[extra_id_102]",
3105
+ "[extra_id_103]",
3106
+ "[extra_id_104]",
3107
+ "[extra_id_105]",
3108
+ "[extra_id_106]",
3109
+ "[extra_id_107]",
3110
+ "[extra_id_108]",
3111
+ "[extra_id_109]",
3112
+ "[extra_id_110]",
3113
+ "[extra_id_111]",
3114
+ "[extra_id_112]",
3115
+ "[extra_id_113]",
3116
+ "[extra_id_114]",
3117
+ "[extra_id_115]",
3118
+ "[extra_id_116]",
3119
+ "[extra_id_117]",
3120
+ "[extra_id_118]",
3121
+ "[extra_id_119]",
3122
+ "[extra_id_120]",
3123
+ "[extra_id_121]",
3124
+ "[extra_id_122]",
3125
+ "[extra_id_123]",
3126
+ "[extra_id_124]",
3127
+ "[extra_id_125]",
3128
+ "[extra_id_126]",
3129
+ "[extra_id_127]",
3130
+ "[extra_id_128]",
3131
+ "[extra_id_129]",
3132
+ "[extra_id_130]",
3133
+ "[extra_id_131]",
3134
+ "[extra_id_132]",
3135
+ "[extra_id_133]",
3136
+ "[extra_id_134]",
3137
+ "[extra_id_135]",
3138
+ "[extra_id_136]",
3139
+ "[extra_id_137]",
3140
+ "[extra_id_138]",
3141
+ "[extra_id_139]",
3142
+ "[extra_id_140]",
3143
+ "[extra_id_141]",
3144
+ "[extra_id_142]",
3145
+ "[extra_id_143]",
3146
+ "[extra_id_144]",
3147
+ "[extra_id_145]",
3148
+ "[extra_id_146]",
3149
+ "[extra_id_147]",
3150
+ "[extra_id_148]",
3151
+ "[extra_id_149]",
3152
+ "[extra_id_150]",
3153
+ "[extra_id_151]",
3154
+ "[extra_id_152]",
3155
+ "[extra_id_153]",
3156
+ "[extra_id_154]",
3157
+ "[extra_id_155]",
3158
+ "[extra_id_156]",
3159
+ "[extra_id_157]",
3160
+ "[extra_id_158]",
3161
+ "[extra_id_159]",
3162
+ "[extra_id_160]",
3163
+ "[extra_id_161]",
3164
+ "[extra_id_162]",
3165
+ "[extra_id_163]",
3166
+ "[extra_id_164]",
3167
+ "[extra_id_165]",
3168
+ "[extra_id_166]",
3169
+ "[extra_id_167]",
3170
+ "[extra_id_168]",
3171
+ "[extra_id_169]",
3172
+ "[extra_id_170]",
3173
+ "[extra_id_171]",
3174
+ "[extra_id_172]",
3175
+ "[extra_id_173]",
3176
+ "[extra_id_174]",
3177
+ "[extra_id_175]",
3178
+ "[extra_id_176]",
3179
+ "[extra_id_177]",
3180
+ "[extra_id_178]",
3181
+ "[extra_id_179]",
3182
+ "[extra_id_180]",
3183
+ "[extra_id_181]",
3184
+ "[extra_id_182]",
3185
+ "[extra_id_183]",
3186
+ "[extra_id_184]",
3187
+ "[extra_id_185]",
3188
+ "[extra_id_186]",
3189
+ "[extra_id_187]",
3190
+ "[extra_id_188]",
3191
+ "[extra_id_189]",
3192
+ "[extra_id_190]",
3193
+ "[extra_id_191]",
3194
+ "[extra_id_192]",
3195
+ "[extra_id_193]",
3196
+ "[extra_id_194]",
3197
+ "[extra_id_195]",
3198
+ "[extra_id_196]",
3199
+ "[extra_id_197]",
3200
+ "[extra_id_198]",
3201
+ "[|endofturn|]",
3202
+ "PI:URL",
3203
+ "PI:EMAIL",
3204
+ "PI:ACCOUNT_NUM",
3205
+ "PI:PHONE_NUM",
3206
+ "PI:BUSINESS_NUM",
3207
+ "PI:ANNON",
3208
+ "PI:KEY",
3209
+ "PI:ID",
3210
+ "PI:IP_ADDRESS",
3211
+ "PI:USER"
3212
+ ],
3213
+ "bos_token": "[BOS]",
3214
+ "chat_template": "{% for message in messages %}{% if loop.first and message['role'] != 'system' %}{{ '[|system|][|endofturn|]\n' }}{% endif %}{{ '[|' + message['role'] + '|]' + message['content'] }}{% if message['role'] == 'user' %}{{ '\n' }}{% else %}{{ '[|endofturn|]\n' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '[|assistant|]' }}{% endif %}",
3215
+ "clean_up_tokenization_spaces": true,
3216
+ "eos_token": "[|endofturn|]",
3217
+ "model_max_length": 1000000000000000019884624838656,
3218
+ "pad_token": "[PAD]",
3219
+ "tokenizer_class": "GPT2Tokenizer",
3220
+ "unk_token": "[UNK]"
3221
+ }
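For reference, the "chat_template" entry above encodes the prompt format: [|system|], [|user|] and [|assistant|] turns, with system and assistant turns closed by [|endofturn|]. A minimal usage sketch with the Hugging Face transformers AutoTokenizer follows; the model path is a placeholder, and the snippet assumes the tokenizer files from this commit are available at that location.

from transformers import AutoTokenizer

# Placeholder path; substitute the actual repository id or a local checkout of these files.
tokenizer = AutoTokenizer.from_pretrained("path/to/this-repository")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]

# apply_chat_template renders the Jinja template stored in tokenizer_config.json, producing:
# "[|system|]You are a helpful assistant.[|endofturn|]\n[|user|]Hello!\n[|assistant|]"
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)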
vocab.json ADDED
The diff for this file is too large to render.