czczup committed on
Commit 831cf85 · verified · 1 Parent(s): 6f97087

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ examples/red-panda.mp4 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,25 +1,19 @@
1
  ---
2
  license: mit
3
- datasets:
4
- - laion/laion2B-en
5
- - laion/laion-coco
6
- - laion/laion2B-multi
7
- - kakaobrain/coyo-700m
8
- - conceptual_captions
9
- - wanng/wukong100m
10
- pipeline_tag: visual-question-answering
11
  ---
12
 
13
- # Model Card for Mini-InternVL-Chat-4B-V1-5
14
 
15
- <center>
16
- <p><img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/pvfKc16O-ej91632FHaIK.png" style="width:80%;" alt="image/png"></p>
17
- </center>
18
 
19
- [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/)
20
 
21
- [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#model-usage) [\[🌐 Community-hosted API\]](https://rapidapi.com/adushar1320/api/internvl-chat) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/675877376)
22
 
 
 
 
23
 
24
  You can run multimodal large models using a 1080Ti now.
25
 
@@ -45,35 +39,34 @@ As shown in the figure below, we adopted the same model architecture as InternVL
45
  - Learnable component in the finetuning stage: ViT + MLP + LLM
46
  - For more details on training hyperparameters, take a look at our code: [pretrain](<>) | [finetune](<>)
47
 
48
- ## Released Models
49
-
50
- | Model | Vision Foundation Model | Release Date | Note |
51
- | :----------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------: | :----------: | :----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
52
- | InternVL-Chat-V1-5(🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)) | InternViT-6B-448px-V1-5(🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-5)) | 2024.04.18 | support 4K image; super strong OCR; Approaching the performance of GPT-4V and Gemini Pro on various benchmarks like MMMU, DocVQA, ChartQA, MathVista, etc. (🔥new) |
53
- | InternVL-Chat-V1-2-Plus(🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus) ) | InternViT-6B-448px-V1-2(🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.21 | more SFT data and stronger |
54
- | InternVL-Chat-V1-2(🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2) ) | InternViT-6B-448px-V1-2(🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-2)) | 2024.02.11 | scaling up LLM to 34B |
55
- | InternVL-Chat-V1-1(🤗 [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1)) | InternViT-6B-448px-V1-0(🤗 [HF link](https://huggingface.co/OpenGVLab/InternViT-6B-448px-V1-0)) | 2024.01.24 | support Chinese and stronger OCR |
56
-
57
  ## Performance
58
 
 
 
59
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/ngl8oZvNrjItWtLUQqB2V.png)
60
 
61
- ## Model Usage
62
 
63
- We provide example code to run Mini-InternVL-Chat-4B-V1-5 using `transformers`.
 
 
64
 
65
- You can also use our [online demo](https://internvl.opengvlab.com/) for a quick experience of this model.
 
 
 
 
66
 
67
  > Please use transformers==4.37.2 to ensure the model works normally.
68
 
69
  ```python
70
- from transformers import AutoTokenizer, AutoModel
71
  import torch
72
  import torchvision.transforms as T
 
73
  from PIL import Image
74
-
75
  from torchvision.transforms.functional import InterpolationMode
76
-
77
 
78
  IMAGENET_MEAN = (0.485, 0.456, 0.406)
79
  IMAGENET_STD = (0.229, 0.224, 0.225)
@@ -153,7 +146,8 @@ def load_image(image_file, input_size=448, max_num=6):
153
  pixel_values = torch.stack(pixel_values)
154
  return pixel_values
155
 
156
- path = "OpenGVLab/Mini-InternVL-Chat-4B-V1-5"
 
157
  model = AutoModel.from_pretrained(
158
  path,
159
  torch_dtype=torch.bfloat16,
@@ -166,53 +160,153 @@ pixel_values = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16)
166
 
167
  generation_config = dict(
168
  num_beams=1,
169
- max_new_tokens=512,
170
  do_sample=False,
171
  )
172
 
173
- # single-round single-image conversation
174
- question = "请详细描述图片" # Please describe the picture in detail
 
 
 
 
 
 
 
 
 
 
 
175
  response = model.chat(tokenizer, pixel_values, question, generation_config)
176
- print(question, response)
 
177
 
178
- # multi-round single-image conversation
179
- question = "请详细描述图片" # Please describe the picture in detail
180
  response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
181
- print(question, response)
 
182
 
183
- question = "请根据图片写一首诗" # Please write a poem according to the picture
184
  response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
185
- print(question, response)
 
186
 
187
- # multi-round multi-image conversation
188
  pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
189
  pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
190
  pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
191
 
192
- question = "详细描述这两张图片" # Describe the two pictures in detail
193
- response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
194
- print(question, response)
195
 
196
- question = "这两张图片的相同点和区别分别是什么" # What are the similarities and differences between these two pictures
197
- response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
198
- print(question, response)
 
 
199
 
200
- # batch inference (single image per sample)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
201
  pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
202
  pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
203
- image_counts = [pixel_values1.size(0), pixel_values2.size(0)]
204
  pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
205
 
206
- questions = ["Describe the image in detail."] * len(image_counts)
207
  responses = model.batch_chat(tokenizer, pixel_values,
208
- image_counts=image_counts,
209
  questions=questions,
210
  generation_config=generation_config)
211
  for question, response in zip(questions, responses):
212
- print(question)
213
- print(response)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
214
  ```
215
 
 
 
 
 
 
 
 
 
 
 
216
  ## Citation
217
 
218
  If you find this project useful in your research, please consider citing:
@@ -231,11 +325,3 @@ If you find this project useful in your research, please consider citing:
231
  year={2024}
232
  }
233
  ```
234
-
235
- ## License
236
-
237
- This project is released under the MIT license.
238
-
239
- ## Acknowledgement
240
-
241
- InternVL is built with reference to the code of the following projects: [OpenAI CLIP](https://github.com/openai/CLIP), [Open CLIP](https://github.com/mlfoundations/open_clip), [CLIP Benchmark](https://github.com/LAION-AI/CLIP_benchmark), [EVA](https://github.com/baaivision/EVA/tree/master), [InternImage](https://github.com/OpenGVLab/InternImage), [ViT-Adapter](https://github.com/czczup/ViT-Adapter), [MMSegmentation](https://github.com/open-mmlab/mmsegmentation), [Transformers](https://github.com/huggingface/transformers), [DINOv2](https://github.com/facebookresearch/dinov2), [BLIP-2](https://github.com/salesforce/LAVIS/tree/main/projects/blip2), [Qwen-VL](https://github.com/QwenLM/Qwen-VL/tree/master/eval_mm), and [LLaVA-1.5](https://github.com/haotian-liu/LLaVA). Thanks for their awesome work!
 
1
  ---
2
  license: mit
3
+ pipeline_tag: image-text-to-text
 
 
 
 
 
 
 
4
  ---
5
 
6
+ # Mini-InternVL-Chat-4B-V1-5
7
 
8
+ [\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
 
 
9
 
10
+ [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/675877376)
11
 
12
+ ## Introduction
13
 
14
+ <p align="center">
15
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/pvfKc16O-ej91632FHaIK.png" style="width:80%;" alt="image/png">
16
+ </p>
17
 
18
  You can run multimodal large models using a 1080Ti now.
19
 
 
39
  - Learnable component in the finetuning stage: ViT + MLP + LLM
40
  - For more details on training hyperparameters, take a look at our code: [pretrain](<>) | [finetune](<>)
41
 
 
 
 
 
 
 
 
 
 
42
  ## Performance
43
 
44
+ ### Image Benchmarks
45
+
46
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/ngl8oZvNrjItWtLUQqB2V.png)
47
 
48
+ - We use both the InternVL and VLMEvalKit repositories for model evaluation. Specifically, the results reported for DocVQA, ChartQA, InfoVQA, TextVQA, MME, AI2D, MMBench, CCBench, MMVet, and SEED-Image were obtained with the InternVL repository, while MMMU, OCRBench, RealWorldQA, HallBench, and MathVista were evaluated with VLMEvalKit.
49
 
50
+ - Please note that evaluating the same model using different testing toolkits like InternVL and VLMEvalKit can result in slight differences, which is normal. Updates to code versions and variations in environment and hardware can also cause minor discrepancies in results.
51
+
52
+ - The MMVet scores we report use GPT-4-0613 as the judge model. Different versions of GPT-4 can lead to significant variations in the scores for this dataset. For instance, using GPT-4-Turbo would result in significantly lower scores.
53
 
54
+ Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.
55
+
56
+ ## Quick Start
57
+
58
+ We provide example code to run Mini-InternVL-Chat-4B-V1-5 using `transformers`.
59
 
60
  > Please use transformers==4.37.2 to ensure the model works normally.
61
 
62
  ```python
63
+ import numpy as np
64
  import torch
65
  import torchvision.transforms as T
66
+ from decord import VideoReader, cpu
67
  from PIL import Image
 
68
  from torchvision.transforms.functional import InterpolationMode
69
+ from transformers import AutoModel, AutoTokenizer
70
 
71
  IMAGENET_MEAN = (0.485, 0.456, 0.406)
72
  IMAGENET_STD = (0.229, 0.224, 0.225)
 
146
  pixel_values = torch.stack(pixel_values)
147
  return pixel_values
148
 
149
+
150
+ path = 'OpenGVLab/Mini-InternVL-Chat-4B-V1-5'
151
  model = AutoModel.from_pretrained(
152
  path,
153
  torch_dtype=torch.bfloat16,
 
160
 
161
  generation_config = dict(
162
  num_beams=1,
163
+ max_new_tokens=1024,
164
  do_sample=False,
165
  )
166
 
167
+ # pure-text conversation (纯文本对话)
168
+ question = 'Hello, who are you?'
169
+ response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
170
+ print(f'User: {question}')
171
+ print(f'Assistant: {response}')
172
+
173
+ question = 'Can you tell me a story?'
174
+ response, history = model.chat(tokenizer, None, question, generation_config, history=history, return_history=True)
175
+ print(f'User: {question}')
176
+ print(f'Assistant: {response}')
177
+
178
+ # single-image single-round conversation (单图单轮对话)
179
+ question = '<image>\nPlease describe the image shortly.'
180
  response = model.chat(tokenizer, pixel_values, question, generation_config)
181
+ print(f'User: {question}')
182
+ print(f'Assistant: {response}')
183
 
184
+ # single-image multi-round conversation (单图多轮对话)
185
+ question = '<image>\nPlease describe the image in detail.'
186
  response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=None, return_history=True)
187
+ print(f'User: {question}')
188
+ print(f'Assistant: {response}')
189
 
190
+ question = 'Please write a poem according to the image.'
191
  response, history = model.chat(tokenizer, pixel_values, question, generation_config, history=history, return_history=True)
192
+ print(f'User: {question}')
193
+ print(f'Assistant: {response}')
194
 
195
+ # multi-image multi-round conversation, combined images (多图多轮对话,拼接图像)
196
  pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
197
  pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
198
  pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
199
 
200
+ question = '<image>\nDescribe the two images in detail.'
201
+ response, history = model.chat(tokenizer, pixel_values, question, generation_config,
202
+ history=None, return_history=True)
203
 
204
+ question = 'What are the similarities and differences between these two images?'
205
+ response, history = model.chat(tokenizer, pixel_values, question, generation_config,
206
+ history=history, return_history=True)
207
+ print(f'User: {question}')
208
+ print(f'Assistant: {response}')
209
 
210
+ # multi-image multi-round conversation, separate images (多图多轮对话,独立图像)
211
+ pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
212
+ pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
213
+ pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
214
+ num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]
215
+
216
+ question = 'Image-1: <image>\nImage-2: <image>\nDescribe the two images in detail.'
217
+ response, history = model.chat(tokenizer, pixel_values, question, generation_config,
218
+ num_patches_list=num_patches_list,
219
+ history=None, return_history=True)
220
+ print(f'User: {question}')
221
+ print(f'Assistant: {response}')
222
+
223
+ question = 'What are the similarities and differences between these two images?'
224
+ response, history = model.chat(tokenizer, pixel_values, question, generation_config,
225
+ num_patches_list=num_patches_list,
226
+ history=history, return_history=True)
227
+ print(f'User: {question}')
228
+ print(f'Assistant: {response}')
229
+
230
+ # batch inference, single image per sample (单图批处理)
231
  pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
232
  pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
233
+ num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]
234
  pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
235
 
236
+ questions = ['<image>\nDescribe the image in detail.'] * len(num_patches_list)
237
  responses = model.batch_chat(tokenizer, pixel_values,
238
+ num_patches_list=num_patches_list,
239
  questions=questions,
240
  generation_config=generation_config)
241
  for question, response in zip(questions, responses):
242
+ print(f'User: {question}')
243
+ print(f'Assistant: {response}')
244
+
245
+ # video multi-round conversation (视频多轮对话)
246
+ def get_index(bound, fps, max_frame, first_idx=0, num_segments=32):
247
+ if bound:
248
+ start, end = bound[0], bound[1]
249
+ else:
250
+ start, end = -100000, 100000
251
+ start_idx = max(first_idx, round(start * fps))
252
+ end_idx = min(round(end * fps), max_frame)
253
+ seg_size = float(end_idx - start_idx) / num_segments
254
+ frame_indices = np.array([
255
+ int(start_idx + (seg_size / 2) + np.round(seg_size * idx))
256
+ for idx in range(num_segments)
257
+ ])
258
+ return frame_indices
259
+
260
+ def load_video(video_path, bound=None, input_size=448, max_num=1, num_segments=32):
261
+ vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
262
+ max_frame = len(vr) - 1
263
+ fps = float(vr.get_avg_fps())
264
+
265
+ pixel_values_list, num_patches_list = [], []
266
+ transform = build_transform(input_size=input_size)
267
+ frame_indices = get_index(bound, fps, max_frame, first_idx=0, num_segments=num_segments)
268
+ for frame_index in frame_indices:
269
+ img = Image.fromarray(vr[frame_index].asnumpy()).convert('RGB')
270
+ img = dynamic_preprocess(img, image_size=input_size, use_thumbnail=True, max_num=max_num)
271
+ pixel_values = [transform(tile) for tile in img]
272
+ pixel_values = torch.stack(pixel_values)
273
+ num_patches_list.append(pixel_values.shape[0])
274
+ pixel_values_list.append(pixel_values)
275
+ pixel_values = torch.cat(pixel_values_list)
276
+ return pixel_values, num_patches_list
277
+
278
+
279
+ video_path = './examples/red-panda.mp4'
280
+ # pixel_values, num_patches_list = load_video(video_path, num_segments=32, max_num=1)
281
+ pixel_values, num_patches_list = load_video(video_path, num_segments=8, max_num=1)
282
+ pixel_values = pixel_values.to(torch.bfloat16).cuda()
283
+ video_prefix = ''.join([f'Frame{i+1}: <image>\n' for i in range(len(num_patches_list))])
284
+ question = video_prefix + 'What is the red panda doing?'
285
+ # Frame1: <image>\nFrame2: <image>\n...\nFrame8: <image>\n{question}
286
+ response, history = model.chat(tokenizer, pixel_values, question, generation_config,
287
+ num_patches_list=num_patches_list,
288
+ history=None, return_history=True)
289
+ print(f'User: {question}')
290
+ print(f'Assistant: {response}')
291
+
292
+ question = 'Describe this video in detail. Don\'t repeat.'
293
+ response, history = model.chat(tokenizer, pixel_values, question, generation_config,
294
+ num_patches_list=num_patches_list,
295
+ history=history, return_history=True)
296
+ print(f'User: {question}')
297
+ print(f'Assistant: {response}')
298
  ```
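
The video path added in this hunk is easier to follow with concrete numbers. The sketch below is a dependency-free rework of the `get_index` arithmetic and of the `Frame{i}: <image>` prefix, not the repository's code: the names `sample_frame_indices`, `build_video_question`, and `tile_ranges` are hypothetical, Python's built-in `round` stands in for `np.round` (both round halves to even), and one tile per frame is assumed, as with `max_num=1`.

```python
def sample_frame_indices(max_frame, fps, bound=None, first_idx=0, num_segments=8):
    """Pick one frame index near the middle of each of num_segments equal spans."""
    if bound:
        start, end = bound
    else:
        start, end = -100000, 100000  # same sentinel range as get_index in the diff
    start_idx = max(first_idx, round(start * fps))
    end_idx = min(round(end * fps), max_frame)
    seg_size = (end_idx - start_idx) / num_segments
    return [
        int(start_idx + seg_size / 2 + round(seg_size * i))
        for i in range(num_segments)
    ]


def build_video_question(num_patches_list, question):
    """Prefix the question with one '<image>' placeholder line per sampled frame."""
    prefix = "".join(f"Frame{i + 1}: <image>\n" for i in range(len(num_patches_list)))
    return prefix + question


def tile_ranges(num_patches_list):
    """Return (start, end) offsets of each frame's tiles in the concatenated batch."""
    ranges, offset = [], 0
    for count in num_patches_list:
        ranges.append((offset, offset + count))
        offset += count
    return ranges


# Hypothetical clip: 100 frames (max_frame=99) at 30 fps, sampled into 8 segments.
indices = sample_frame_indices(max_frame=99, fps=30.0)
num_patches_list = [1] * len(indices)  # assumption: one tile per frame (max_num=1)
prompt = build_video_question(num_patches_list, "What is the red panda doing?")

print(indices)
print(prompt)
print(tile_ranges(num_patches_list))
```

With `num_segments=8` the prompt carries eight `<image>` placeholders, and `num_patches_list` records how many rows of the concatenated `pixel_values` belong to each of them.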
299
 
300
+ ## Deployment
301
+
302
+ ### LMDeploy
303
+
304
+ > Warning: This model is not yet supported by LMDeploy.
305
+
306
+ ## License
307
+
308
+ This project is released under the MIT license.
309
+
310
  ## Citation
311
 
312
  If you find this project useful in your research, please consider citing:
 
325
  year={2024}
326
  }
327
  ```
 
 
 
 
 
 
 
 
config.json CHANGED
@@ -1,6 +1,5 @@
1
  {
2
  "_commit_hash": null,
3
- "_name_or_path": "OpenGVLab/Mini-InternVL-Chat-4B-V1-5",
4
  "architectures": [
5
  "InternVLChatModel"
6
  ],
@@ -13,15 +12,15 @@
13
  "dynamic_image_size": true,
14
  "force_image_size": 448,
15
  "llm_config": {
16
- "_name_or_path": "pretrained/Phi-3-mini-128k-instruct/",
17
  "add_cross_attention": false,
18
  "architectures": [
19
  "Phi3ForCausalLM"
20
  ],
 
21
  "attention_dropout": 0.0,
22
  "auto_map": {
23
  "AutoConfig": "configuration_phi3.Phi3Config",
24
- "AutoModel": "modeling_phi3.Phi3ForCausalLM",
25
  "AutoModelForCausalLM": "modeling_phi3.Phi3ForCausalLM"
26
  },
27
  "bad_words_ids": null,
@@ -194,112 +193,52 @@
194
  "tie_word_embeddings": false,
195
  "tokenizer_class": null,
196
  "top_k": 50,
197
- "top_p": 1.0,
198
  "torch_dtype": "bfloat16",
199
  "torchscript": false,
200
- "transformers_version": "4.36.2",
201
  "typical_p": 1.0,
202
- "use_bfloat16": false,
203
  "use_cache": true,
204
  "vocab_size": 32020
205
  },
206
  "max_dynamic_patch": 12,
207
  "min_dynamic_patch": 1,
208
  "model_type": "internvl_chat",
209
- "pad2square": false,
210
  "ps_version": "v2",
211
  "select_layer": -1,
212
  "template": "phi3-chat",
213
  "torch_dtype": "bfloat16",
214
- "transformers_version": null,
215
  "use_backbone_lora": 0,
216
  "use_llm_lora": 0,
217
  "use_thumbnail": true,
218
  "vision_config": {
219
- "_name_or_path": "OpenGVLab/InternViT-300M-448px",
220
- "add_cross_attention": false,
221
  "architectures": [
222
  "InternVisionModel"
223
  ],
224
  "attention_dropout": 0.0,
225
- "auto_map": {
226
- "AutoConfig": "configuration_intern_vit.InternVisionConfig",
227
- "AutoModel": "modeling_intern_vit.InternVisionModel"
228
- },
229
- "bad_words_ids": null,
230
- "begin_suppress_tokens": null,
231
- "bos_token_id": null,
232
- "chunk_size_feed_forward": 0,
233
- "cross_attention_hidden_size": null,
234
- "decoder_start_token_id": null,
235
- "diversity_penalty": 0.0,
236
- "do_sample": false,
237
- "drop_path_rate": 0.1,
238
  "dropout": 0.0,
239
- "early_stopping": false,
240
- "encoder_no_repeat_ngram_size": 0,
241
- "eos_token_id": null,
242
- "exponential_decay_length_penalty": null,
243
- "finetuning_task": null,
244
- "forced_bos_token_id": null,
245
- "forced_eos_token_id": null,
246
  "hidden_act": "gelu",
247
  "hidden_size": 1024,
248
- "id2label": {
249
- "0": "LABEL_0",
250
- "1": "LABEL_1"
251
- },
252
  "image_size": 448,
253
  "initializer_factor": 1.0,
254
  "initializer_range": 0.02,
255
  "intermediate_size": 4096,
256
- "is_decoder": false,
257
- "is_encoder_decoder": false,
258
- "label2id": {
259
- "LABEL_0": 0,
260
- "LABEL_1": 1
261
- },
262
  "layer_norm_eps": 1e-06,
263
- "length_penalty": 1.0,
264
- "max_length": 20,
265
- "min_length": 0,
266
  "model_type": "intern_vit_6b",
267
- "no_repeat_ngram_size": 0,
268
  "norm_type": "layer_norm",
269
  "num_attention_heads": 16,
270
- "num_beam_groups": 1,
271
- "num_beams": 1,
272
  "num_channels": 3,
273
  "num_hidden_layers": 24,
274
- "num_return_sequences": 1,
275
  "output_attentions": false,
276
  "output_hidden_states": false,
277
- "output_scores": false,
278
- "pad_token_id": null,
279
  "patch_size": 14,
280
- "prefix": null,
281
- "problem_type": null,
282
- "pruned_heads": {},
283
  "qk_normalization": false,
284
  "qkv_bias": true,
285
- "remove_invalid_values": false,
286
- "repetition_penalty": 1.0,
287
  "return_dict": true,
288
- "return_dict_in_generate": false,
289
- "sep_token_id": null,
290
- "suppress_tokens": null,
291
- "task_specific_params": null,
292
- "temperature": 1.0,
293
- "tf_legacy_loss": false,
294
- "tie_encoder_decoder": false,
295
- "tie_word_embeddings": true,
296
- "tokenizer_class": null,
297
- "top_k": 50,
298
- "top_p": 1.0,
299
  "torch_dtype": "bfloat16",
300
- "torchscript": false,
301
- "transformers_version": "4.36.2",
302
- "typical_p": 1.0,
303
  "use_bfloat16": true,
304
  "use_flash_attn": true
305
  }
 
1
  {
2
  "_commit_hash": null,
 
3
  "architectures": [
4
  "InternVLChatModel"
5
  ],
 
12
  "dynamic_image_size": true,
13
  "force_image_size": 448,
14
  "llm_config": {
15
+ "_name_or_path": "microsoft/Phi-3-mini-128k-instruct",
16
  "add_cross_attention": false,
17
  "architectures": [
18
  "Phi3ForCausalLM"
19
  ],
20
+ "attn_implementation": "flash_attention_2",
21
  "attention_dropout": 0.0,
22
  "auto_map": {
23
  "AutoConfig": "configuration_phi3.Phi3Config",
 
24
  "AutoModelForCausalLM": "modeling_phi3.Phi3ForCausalLM"
25
  },
26
  "bad_words_ids": null,
 
193
  "tie_word_embeddings": false,
194
  "tokenizer_class": null,
195
  "top_k": 50,
196
+ "top_p": null,
197
  "torch_dtype": "bfloat16",
198
  "torchscript": false,
199
+ "transformers_version": "4.37.2",
200
  "typical_p": 1.0,
201
+ "use_bfloat16": true,
202
  "use_cache": true,
203
  "vocab_size": 32020
204
  },
205
  "max_dynamic_patch": 12,
206
  "min_dynamic_patch": 1,
207
  "model_type": "internvl_chat",
 
208
  "ps_version": "v2",
209
  "select_layer": -1,
210
  "template": "phi3-chat",
211
  "torch_dtype": "bfloat16",
 
212
  "use_backbone_lora": 0,
213
  "use_llm_lora": 0,
214
  "use_thumbnail": true,
215
  "vision_config": {
 
 
216
  "architectures": [
217
  "InternVisionModel"
218
  ],
219
  "attention_dropout": 0.0,
220
+ "drop_path_rate": 0.0,
 
 
 
 
 
 
 
 
 
 
 
 
221
  "dropout": 0.0,
 
 
 
 
 
 
 
222
  "hidden_act": "gelu",
223
  "hidden_size": 1024,
 
 
 
 
224
  "image_size": 448,
225
  "initializer_factor": 1.0,
226
  "initializer_range": 0.02,
227
  "intermediate_size": 4096,
 
 
 
 
 
 
228
  "layer_norm_eps": 1e-06,
 
 
 
229
  "model_type": "intern_vit_6b",
 
230
  "norm_type": "layer_norm",
231
  "num_attention_heads": 16,
 
 
232
  "num_channels": 3,
233
  "num_hidden_layers": 24,
 
234
  "output_attentions": false,
235
  "output_hidden_states": false,
 
 
236
  "patch_size": 14,
 
 
 
237
  "qk_normalization": false,
238
  "qkv_bias": true,
 
 
239
  "return_dict": true,
 
 
 
 
 
 
 
 
 
 
 
240
  "torch_dtype": "bfloat16",
241
+ "transformers_version": "4.37.2",
 
 
242
  "use_bfloat16": true,
243
  "use_flash_attn": true
244
  }
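
The config.json hunks above are long, so a programmatic summary can help when reviewing them. This is a minimal sketch, assuming both versions of the file have been parsed into plain dictionaries; `diff_config` and the `MISSING` sentinel are hypothetical helpers, and the sample dictionaries contain only a handful of keys copied from the hunks rather than the full files.

```python
MISSING = "<missing>"  # sentinel for a key that exists on only one side


def diff_config(old, new, prefix=""):
    """Return (path, old_value, new_value) tuples for every key that differs."""
    changes = []
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        a = old.get(key, MISSING)
        b = new.get(key, MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested sections such as llm_config.
            changes.extend(diff_config(a, b, prefix=path + "."))
        elif a != b:
            changes.append((path, a, b))
    return changes


# Excerpts only: a few of the keys touched by this commit.
old_cfg = {
    "pad2square": False,
    "llm_config": {
        "top_p": 1.0,
        "transformers_version": "4.36.2",
        "use_bfloat16": False,
    },
}
new_cfg = {
    "llm_config": {
        "attn_implementation": "flash_attention_2",
        "top_p": None,
        "transformers_version": "4.37.2",
        "use_bfloat16": True,
    },
}

changes = diff_config(old_cfg, new_cfg)
for path, before, after in changes:
    print(f"{path}: {before!r} -> {after!r}")
```

In practice the two dictionaries would come from `json.load` on the old and new config.json files.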
configuration_internvl_chat.py CHANGED
@@ -26,7 +26,6 @@ class InternVLChatConfig(PretrainedConfig):
26
  llm_config=None,
27
  use_backbone_lora=0,
28
  use_llm_lora=0,
29
- pad2square=False,
30
  select_layer=-1,
31
  force_image_size=None,
32
  downsample_ratio=0.5,
@@ -56,7 +55,6 @@ class InternVLChatConfig(PretrainedConfig):
56
  raise ValueError('Unsupported architecture: {}'.format(llm_config['architectures'][0]))
57
  self.use_backbone_lora = use_backbone_lora
58
  self.use_llm_lora = use_llm_lora
59
- self.pad2square = pad2square
60
  self.select_layer = select_layer
61
  self.force_image_size = force_image_size
62
  self.downsample_ratio = downsample_ratio
@@ -85,7 +83,6 @@ class InternVLChatConfig(PretrainedConfig):
85
  output['model_type'] = self.__class__.model_type
86
  output['use_backbone_lora'] = self.use_backbone_lora
87
  output['use_llm_lora'] = self.use_llm_lora
88
- output['pad2square'] = self.pad2square
89
  output['select_layer'] = self.select_layer
90
  output['force_image_size'] = self.force_image_size
91
  output['downsample_ratio'] = self.downsample_ratio
 
26
  llm_config=None,
27
  use_backbone_lora=0,
28
  use_llm_lora=0,
 
29
  select_layer=-1,
30
  force_image_size=None,
31
  downsample_ratio=0.5,
 
55
  raise ValueError('Unsupported architecture: {}'.format(llm_config['architectures'][0]))
56
  self.use_backbone_lora = use_backbone_lora
57
  self.use_llm_lora = use_llm_lora
 
58
  self.select_layer = select_layer
59
  self.force_image_size = force_image_size
60
  self.downsample_ratio = downsample_ratio
 
83
  output['model_type'] = self.__class__.model_type
84
  output['use_backbone_lora'] = self.use_backbone_lora
85
  output['use_llm_lora'] = self.use_llm_lora
 
86
  output['select_layer'] = self.select_layer
87
  output['force_image_size'] = self.force_image_size
88
  output['downsample_ratio'] = self.downsample_ratio
conversation.py CHANGED
@@ -2,7 +2,7 @@
2
  Conversation prompt templates.
3
 
4
  We kindly request that you import fastchat instead of copying this file if you wish to use it.
5
- If you have any changes in mind, please contribute back so the community can benefit collectively and continue to maintain these valuable templates.
6
  """
7
 
8
  import dataclasses
@@ -330,384 +330,6 @@ def get_conv_template(name: str) -> Conversation:
330
  return conv_templates[name].copy()
331
 
332
 
333
- # An empty template for raw conversation.
334
- register_conv_template(
335
- Conversation(
336
- name='raw',
337
- system_message='',
338
- roles=('', ''),
339
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
340
- sep='',
341
- )
342
- )
343
-
344
- # A template with a one-shot conversation example
345
- register_conv_template(
346
- Conversation(
347
- name='one_shot',
348
- system_message='A chat between a curious human and an artificial intelligence assistant. '
349
- "The assistant gives helpful, detailed, and polite answers to the human's questions.",
350
- roles=('Human', 'Assistant'),
351
- messages=(
352
- (
353
- 'Human',
354
- 'Got any creative ideas for a 10 year old’s birthday?',
355
- ),
356
- (
357
- 'Assistant',
358
- """Of course! Here are some creative ideas for a 10-year-old's birthday party:
359
- 1. Treasure Hunt: Organize a treasure hunt in your backyard or nearby park. Create clues and riddles for the kids to solve, leading them to hidden treasures and surprises.
360
- 2. Science Party: Plan a science-themed party where kids can engage in fun and interactive experiments. You can set up different stations with activities like making slime, erupting volcanoes, or creating simple chemical reactions.
361
- 3. Outdoor Movie Night: Set up a backyard movie night with a projector and a large screen or white sheet. Create a cozy seating area with blankets and pillows, and serve popcorn and snacks while the kids enjoy a favorite movie under the stars.
362
- 4. DIY Crafts Party: Arrange a craft party where kids can unleash their creativity. Provide a variety of craft supplies like beads, paints, and fabrics, and let them create their own unique masterpieces to take home as party favors.
363
- 5. Sports Olympics: Host a mini Olympics event with various sports and games. Set up different stations for activities like sack races, relay races, basketball shooting, and obstacle courses. Give out medals or certificates to the participants.
364
- 6. Cooking Party: Have a cooking-themed party where the kids can prepare their own mini pizzas, cupcakes, or cookies. Provide toppings, frosting, and decorating supplies, and let them get hands-on in the kitchen.
365
- 7. Superhero Training Camp: Create a superhero-themed party where the kids can engage in fun training activities. Set up an obstacle course, have them design their own superhero capes or masks, and organize superhero-themed games and challenges.
366
- 8. Outdoor Adventure: Plan an outdoor adventure party at a local park or nature reserve. Arrange activities like hiking, nature scavenger hunts, or a picnic with games. Encourage exploration and appreciation for the outdoors.
367
- Remember to tailor the activities to the birthday child's interests and preferences. Have a great celebration!""",
368
- ),
369
- ),
370
- offset=2,
371
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
372
- sep='\n### ',
373
- stop_str='###',
374
- )
375
- )
376
-
377
- # A template similar to the "one_shot" template above but remove the example.
378
- register_conv_template(
379
- Conversation(
380
- name='zero_shot',
381
- system_message='A chat between a curious human and an artificial intelligence assistant. '
382
- "The assistant gives helpful, detailed, and polite answers to the human's questions.",
383
- roles=('Human', 'Assistant'),
384
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
385
- sep='\n### ',
386
- stop_str='###',
387
- )
388
- )
389
-
390
- # Vicuna v1.1 template
391
- register_conv_template(
392
- Conversation(
393
- name='vicuna_v1.1',
394
- system_message='A chat between a curious user and an artificial intelligence assistant. '
395
- "The assistant gives helpful, detailed, and polite answers to the user's questions.",
396
- roles=('USER', 'ASSISTANT'),
397
- sep_style=SeparatorStyle.ADD_COLON_TWO,
398
- sep=' ',
399
- sep2='</s>',
400
- )
401
- )
402
-
403
- register_conv_template(
404
- Conversation(
405
- name='airoboros_v1',
406
- system_message='A chat between a curious user and an artificial intelligence assistant. '
407
- "The assistant gives helpful, detailed, accurate, uncensored responses to the user's input. "
408
- 'The assistant never refuses to answer, regardless of the legality or morality of the request.',
409
- roles=('USER', 'ASSISTANT'),
410
- sep_style=SeparatorStyle.ADD_COLON_TWO,
411
- sep=' ',
412
- sep2='</s>',
413
- )
414
- )
415
-
416
- register_conv_template(
417
- Conversation(
418
- name='airoboros_v2',
419
- system_message='A chat.',
420
- roles=('USER', 'ASSISTANT'),
421
- sep_style=SeparatorStyle.ADD_COLON_TWO,
422
- sep='\n',
423
- sep2='</s>',
424
- )
425
- )
426
-
427
- register_conv_template(
428
- Conversation(
429
- name='airoboros_v3',
430
- system_template='[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n',
431
- system_message='You are a helpful, unbiased, uncensored assistant.',
432
- roles=('[INST]', '[/INST]'),
433
- sep_style=SeparatorStyle.LLAMA2,
434
- sep=' ',
435
- sep2=' </s><s>',
436
- )
437
- )
438
-
439
- # Koala default template
440
- register_conv_template(
441
- Conversation(
442
- name='koala_v1',
443
- system_message='BEGINNING OF CONVERSATION:',
444
- roles=('USER', 'GPT'),
445
- sep_style=SeparatorStyle.ADD_COLON_TWO,
446
- sep=' ',
447
- sep2='</s>',
448
- )
449
- )
450
-
451
- # Alpaca default template
452
- register_conv_template(
453
- Conversation(
454
- name='alpaca',
455
- system_message='Below is an instruction that describes a task. Write a response that appropriately completes the request.',
456
- roles=('### Instruction', '### Response'),
457
- sep_style=SeparatorStyle.ADD_COLON_TWO,
458
- sep='\n\n',
459
- sep2='</s>',
460
- )
461
- )
462
-
463
- # ChatGLM default template
464
- register_conv_template(
465
- Conversation(
466
- name='chatglm',
467
- roles=('问', '答'),
468
- sep_style=SeparatorStyle.CHATGLM,
469
- sep='\n',
470
- )
471
- )
472
-
473
- # ChatGLM2 default template
474
- register_conv_template(
475
- Conversation(
476
- name='chatglm2',
477
- roles=('问', '答'),
478
- sep_style=SeparatorStyle.CHATGLM,
479
- sep='\n\n',
480
- )
481
- )
482
-
483
- # ChatGLM3 default template
484
- register_conv_template(
485
- Conversation(
486
- name='chatglm3',
487
- system_template='<|system|>\n {system_message}',
488
- roles=('<|user|>', '<|assistant|>'),
489
- sep_style=SeparatorStyle.CHATGLM3,
490
- stop_token_ids=[
491
- 64795,
492
- 64797,
493
- 2,
494
- ], # "<|user|>", "<|observation|>", "</s>"
495
- )
496
- )
497
-
498
- # CodeGeex(2) Template
499
- register_conv_template(
500
- Conversation(
501
- name='codegeex',
502
- roles=('', ''),
503
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
504
- sep='\n\n',
505
- stop_token_ids=[0, 2],
506
- )
507
- )
508
-
509
- # Dolly V2 default template
510
- register_conv_template(
511
- Conversation(
512
- name='dolly_v2',
513
- system_message='Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n',
514
- roles=('### Instruction', '### Response'),
515
- sep_style=SeparatorStyle.DOLLY,
516
- sep='\n\n',
517
- sep2='### End',
518
- )
519
- )
520
-
521
- # OpenAssistant Pythia default template
522
- register_conv_template(
523
- Conversation(
524
- name='oasst_pythia',
525
- roles=('<|prompter|>', '<|assistant|>'),
526
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
527
- sep='<|endoftext|>',
528
- )
529
- )
530
-
531
- # OpenAssistant default template
532
- register_conv_template(
533
- Conversation(
534
- name='oasst_llama',
535
- roles=('<|prompter|>', '<|assistant|>'),
536
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
537
- sep='</s>',
538
- )
539
- )
540
-
541
- # OpenChat 3.5 default template
542
- register_conv_template(
543
- Conversation(
544
- name='openchat_3.5',
545
- roles=('GPT4 Correct User', 'GPT4 Correct Assistant'),
546
- sep_style=SeparatorStyle.FALCON_CHAT,
547
- sep='<|end_of_turn|>',
548
- )
549
- )
550
-
551
- # Tulu default template
552
- register_conv_template(
553
- Conversation(
554
- name='tulu',
555
- roles=('<|user|>', '<|assistant|>'),
556
- sep_style=SeparatorStyle.ADD_NEW_LINE_SINGLE,
557
- sep='\n',
558
- )
559
- )
560
-
561
- # StableLM Alpha default template
562
- register_conv_template(
563
- Conversation(
564
- name='stablelm',
565
- system_template='<|SYSTEM|>{system_message}',
566
- system_message="""# StableLM Tuned (Alpha version)
567
- - StableLM is a helpful and harmless open-source AI language model developed by StabilityAI.
568
- - StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
569
- - StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes.
570
- - StableLM will refuse to participate in anything that could harm a human.
571
- """,
572
- roles=('<|USER|>', '<|ASSISTANT|>'),
573
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
574
- sep='',
575
- stop_token_ids=[50278, 50279, 50277, 1, 0],
576
- )
577
- )
578
-
579
- # Baize default template
580
- register_conv_template(
581
- Conversation(
582
- name='baize',
583
- system_message='The following is a conversation between a human and an AI assistant named Baize (named after a mythical creature in Chinese folklore). Baize is an open-source AI assistant developed by UCSD and Sun Yat-Sen University. The human and the AI assistant take turns chatting. Human statements start with [|Human|] and AI assistant statements start with [|AI|]. The AI assistant always provides responses in as much detail as possible, and in Markdown format. The AI assistant always declines to engage with topics, questions and instructions related to unethical, controversial, or sensitive issues. Complete the transcript in exactly that format.\n',
584
- roles=('[|Human|]', '[|AI|]'),
585
- messages=(
586
- ('[|Human|]', 'Hello!'),
587
- ('[|AI|]', 'Hi!'),
588
- ),
589
- offset=2,
590
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
591
- sep='\n',
592
- stop_str='[|Human|]',
593
- )
594
- )
595
-
596
- # RWKV-4-Raven default template
597
- register_conv_template(
598
- Conversation(
599
- name='rwkv',
600
- roles=('Bob', 'Alice'),
601
- messages=(
602
- ('Bob', 'hi'),
603
- (
604
- 'Alice',
605
- 'Hi. I am your assistant and I will provide expert full response in full details. Please feel free to ask any question and I will always answer it.',
606
- ),
607
- ),
608
- offset=2,
609
- sep_style=SeparatorStyle.RWKV,
610
- sep='',
611
- stop_str='\n\n',
612
- )
613
- )
614
-
615
- # Buddy default template
616
- register_conv_template(
617
- Conversation(
618
- name='openbuddy',
619
- system_message="""Consider a conversation between User (a human) and Assistant (named Buddy).
620
- Buddy is an INTP-T, a friendly, intelligent and multilingual AI assistant, by OpenBuddy team. GitHub: https://github.com/OpenBuddy/OpenBuddy
621
- Buddy cannot access the Internet.
622
- Buddy can fluently speak the user's language (e.g. English, Chinese).
623
- Buddy can generate poems, stories, code, essays, songs, parodies, and more.
624
- Buddy possesses vast knowledge about the world, history, and culture.
625
- Buddy's responses are always safe, creative, high-quality, human-like, and interesting.
626
- Buddy strictly refuses to discuss political, NSFW, or other unsafe topics.
627
-
628
- User: Hi.
629
- Assistant: Hi, I'm Buddy, your AI assistant. How can I help you today?""",
630
- roles=('User', 'Assistant'),
631
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
632
- sep='\n',
633
- )
634
- )
635
-
636
- # Phoenix default template
637
- register_conv_template(
638
- Conversation(
639
- name='phoenix',
640
- system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n",
641
- roles=('Human', 'Assistant'),
642
- sep_style=SeparatorStyle.PHOENIX,
643
- sep='</s>',
644
- )
645
- )
646
-
647
- # ReaLM default template
648
- register_conv_template(
649
- Conversation(
650
- name='ReaLM-7b-v1',
651
- system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n",
652
- roles=('Human', 'Assistant'),
653
- sep_style=SeparatorStyle.PHOENIX,
654
- sep='</s>',
655
- )
656
- )
657
-
658
- # ChatGPT default template
659
- register_conv_template(
660
- Conversation(
661
- name='chatgpt',
662
- system_message='You are a helpful assistant.',
663
- roles=('user', 'assistant'),
664
- sep_style=None,
665
- sep=None,
666
- )
667
- )
668
-
669
- # Claude default template
670
- register_conv_template(
671
- Conversation(
672
- name='claude',
673
- roles=('Human', 'Assistant'),
674
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
675
- sep='\n\n',
676
- )
677
- )
678
-
679
- # MPT default template
680
- register_conv_template(
681
- Conversation(
682
- name='mpt-7b-chat',
683
- system_template="""<|im_start|>system
684
- {system_message}""",
685
- system_message="""- You are a helpful assistant chatbot trained by MosaicML.
686
- - You answer questions.
687
- - You are excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
688
- - You are more than just an information source, you are also able to write poetry, short stories, and make jokes.""",
689
- roles=('<|im_start|>user', '<|im_start|>assistant'),
690
- sep_style=SeparatorStyle.CHATML,
691
- sep='<|im_end|>',
692
- stop_token_ids=[50278, 0],
693
- )
694
- )
695
-
696
- # MPT-30b-chat default template
697
- register_conv_template(
698
- Conversation(
699
- name='mpt-30b-chat',
700
- system_template="""<|im_start|>system
701
- {system_message}""",
702
- system_message="""A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.""",
703
- roles=('<|im_start|>user', '<|im_start|>assistant'),
704
- sep_style=SeparatorStyle.CHATML,
705
- sep='<|im_end|>',
706
- stop_token_ids=[50278, 0],
707
- )
708
- )
709
-
710
-
711
  register_conv_template(
712
  Conversation(
713
  name='Hermes-2',
@@ -721,7 +343,7 @@ register_conv_template(
721
  6,
722
  7,
723
  8,
724
- ], # "<|endoftext|>", "<|im_start|>", "<|im_end|>", "<|im_sep|>"
725
  stop_str='<|endoftext|>',
726
  )
727
  )
@@ -744,22 +366,6 @@ register_conv_template(
744
  )
745
 
746
 
747
- register_conv_template(
748
- Conversation(
749
- name='llama3-chat',
750
- system_template='<|system|>\n{system_message}',
751
- system_message='You are an AI assistant whose name is InternVL.',
752
- roles=('<|user|>\n', '<|assistant|>\n'),
753
- sep_style=SeparatorStyle.MPT,
754
- sep='<|end|>',
755
- stop_token_ids=[
756
- 128259,
757
- 128001
758
- ]
759
- )
760
- )
761
-
762
-
763
  register_conv_template(
764
  Conversation(
765
  name='phi3-chat',
@@ -775,519 +381,3 @@ register_conv_template(
775
  ]
776
  )
777
  )
778
-
779
- # Lemur-70b-chat default template
780
- # reference: https://huggingface.co/OpenLemur/lemur-70b-chat-v1#generation
781
- register_conv_template(
782
- Conversation(
783
- name='lemur-70b-chat',
784
- system_template="""<|im_start|>system
785
- {system_message}""",
786
- system_message="""You are a helpful, respectful, and honest assistant.""",
787
- roles=('<|im_start|>user', '<|im_start|>assistant'),
788
- sep_style=SeparatorStyle.CHATML,
789
- sep='<|im_end|>',
790
- stop_token_ids=[32002, 0],
791
- )
792
- )
793
-
794
- # MPT-30b-instruct default template
795
- # reference: https://huggingface.co/mosaicml/mpt-30b-instruct#formatting
796
- register_conv_template(
797
- Conversation(
798
- name='mpt-30b-instruct',
799
- system_template='{system_message}',
800
- system_message='Below is an instruction that describes a task. Write a response that appropriately completes the request.',
801
- roles=('### Instruction', '### Response'),
802
- sep_style=SeparatorStyle.ADD_NEW_LINE_SINGLE,
803
- sep='\n\n',
804
- stop_token_ids=[50278, 0],
805
- )
806
- )
807
-
808
- # Bard default template
809
- # Reference: https://github.com/google/generative-ai-python/blob/9c99bcb474a991a97a2e7d62fcdb52db7ce40729/google/generativeai/discuss.py#L150
810
- # https://github.com/google/generative-ai-python/blob/9c99bcb474a991a97a2e7d62fcdb52db7ce40729/google/generativeai/discuss.py#L40
811
- register_conv_template(
812
- Conversation(
813
- name='bard',
814
- roles=('0', '1'),
815
- sep_style=None,
816
- sep=None,
817
- )
818
- )
819
-
820
- # BiLLa default template
821
- register_conv_template(
822
- Conversation(
823
- name='billa',
824
- roles=('Human', 'Assistant'),
825
- sep_style=SeparatorStyle.ADD_COLON_SPACE_SINGLE,
826
- sep='\n',
827
- stop_str='Human:',
828
- )
829
- )
830
-
831
- # RedPajama INCITE default template
832
- register_conv_template(
833
- Conversation(
834
- name='redpajama-incite',
835
- roles=('<human>', '<bot>'),
836
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
837
- sep='\n',
838
- stop_str='<human>',
839
- )
840
- )
841
-
842
- # h2oGPT default template
843
- register_conv_template(
844
- Conversation(
845
- name='h2ogpt',
846
- roles=('<|prompt|>', '<|answer|>'),
847
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
848
- sep='</s>',
849
- )
850
- )
851
-
852
- # Robin default template
853
- register_conv_template(
854
- Conversation(
855
- name='Robin',
856
- system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.",
857
- roles=('###Human', '###Assistant'),
858
- sep_style=SeparatorStyle.ROBIN,
859
- sep='\n',
860
- stop_token_ids=[2, 396],
861
- stop_str='###',
862
- )
863
- )
864
-
865
- # Snoozy default template
866
- # Reference: https://github.com/nomic-ai/gpt4all/blob/d4861030b778da6db59d21d2927a4aba4f9f1f43/gpt4all-bindings/python/gpt4all/gpt4all.py#L232
867
- register_conv_template(
868
- Conversation(
869
- name='snoozy',
870
- system_template='### Instruction:\n{system_message}',
871
- system_message='The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.',
872
- roles=('### Prompt', '### Response'),
873
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
874
- sep='\n',
875
- stop_str='###',
876
- )
877
- )
878
-
879
- # manticore default template
880
- register_conv_template(
881
- Conversation(
882
- name='manticore',
883
- roles=('USER', 'ASSISTANT'),
884
- sep_style=SeparatorStyle.ADD_COLON_TWO,
885
- sep='\n',
886
- sep2='</s>',
887
- )
888
- )
889
-
890
- # Falcon default template
891
- register_conv_template(
892
- Conversation(
893
- name='falcon',
894
- roles=('User', 'Assistant'),
895
- messages=[],
896
- sep_style=SeparatorStyle.RWKV,
897
- sep='\n',
898
- sep2='<|endoftext|>',
899
- stop_str='\nUser', # use stop_str to stop generation after stop_token_ids, it will also remove stop_str from the generated text
900
- stop_token_ids=[
901
- 0,
902
- 1,
903
- 2,
904
- 3,
905
- 4,
906
- 5,
907
- 6,
908
- 7,
909
- 8,
910
- 9,
911
- 10,
912
- 11,
913
- ], # it better only put special tokens here, because tokenizer only remove special tokens
914
- )
915
- )
916
-
917
- # ChangGPT default template
918
- register_conv_template(
919
- Conversation(
920
- name='polyglot_changgpt',
921
- roles=('B', 'A'),
922
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
923
- sep='\n',
924
- )
925
- )
926
-
927
- # tigerbot template
928
- register_conv_template(
929
- Conversation(
930
- name='tigerbot',
931
- system_message='A chat between a curious user and an artificial intelligence assistant. '
932
- "The assistant gives helpful, detailed, and polite answers to the user's questions.",
933
- roles=('### Instruction', '### Response'),
934
- sep_style=SeparatorStyle.ROBIN,
935
- sep='\n\n',
936
- stop_str='###',
937
- )
938
- )
939
-
940
- # ref: https://huggingface.co/Salesforce/xgen-7b-8k-inst
941
- register_conv_template(
942
- Conversation(
943
- name='xgen',
944
- system_message="A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n",
945
- roles=('### Human', '### Assistant'),
946
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
947
- sep='\n',
948
- stop_token_ids=[50256],
949
- )
950
- )
951
-
952
- # Internlm-chat template
953
- register_conv_template(
954
- Conversation(
955
- name='internlm-chat',
956
- system_message="A chat between a curious <|User|> and an <|Bot|>. The <|Bot|> gives helpful, detailed, and polite answers to the <|User|>'s questions.\n\n",
957
- roles=('<|User|>', '<|Bot|>'),
958
- sep_style=SeparatorStyle.CHATINTERN,
959
- sep='<eoh>',
960
- sep2='<eoa>',
961
- stop_token_ids=[1, 103028],
962
- stop_str='<|User|>',
963
- )
964
- )
965
-
966
- # StarChat template
967
- # reference: https://huggingface.co/spaces/HuggingFaceH4/starchat-playground/blob/main/dialogues.py
968
- register_conv_template(
969
- Conversation(
970
- name='starchat',
971
- system_template='<system>\n{system_message}',
972
- roles=('<|user|>', '<|assistant|>'),
973
- sep_style=SeparatorStyle.CHATML,
974
- sep='<|end|>',
975
- stop_token_ids=[0, 49155],
976
- stop_str='<|end|>',
977
- )
978
- )
979
-
980
- # Baichuan-13B-Chat template
981
- register_conv_template(
982
- # source: https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/blob/19ef51ba5bad8935b03acd20ff04a269210983bc/modeling_baichuan.py#L555
983
- # https://huggingface.co/baichuan-inc/Baichuan-13B-Chat/blob/main/generation_config.json
984
- # https://github.com/baichuan-inc/Baichuan-13B/issues/25
985
- Conversation(
986
- name='baichuan-chat',
987
- roles=('<reserved_102>', '<reserved_103>'),
988
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
989
- sep='',
990
- stop_token_ids=[],
991
- )
992
- )
993
-
994
- # Baichuan2-13B-Chat template
995
- register_conv_template(
996
- # source: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/c6f8592a60b4ad73c210b28dd2ab3cca51abbf93/modeling_baichuan.py#L773
997
- # https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/generation_config.json
998
- # https://github.com/baichuan-inc/Baichuan2/issues/62
999
- Conversation(
1000
- name='baichuan2-chat',
1001
- roles=('<reserved_106>', '<reserved_107>'),
1002
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
1003
- sep='',
1004
- stop_token_ids=[],
1005
- )
1006
- )
1007
-
1008
- # Mistral template
1009
- # source: https://docs.mistral.ai/llm/mistral-instruct-v0.1#chat-template
1010
- register_conv_template(
1011
- Conversation(
1012
- name='mistral',
1013
- system_template='[INST]{system_message}\n',
1014
- roles=('[INST]', '[/INST]'),
1015
- sep_style=SeparatorStyle.LLAMA2,
1016
- sep=' ',
1017
- sep2='</s>',
1018
- )
1019
- )
1020
-
1021
- # llama2 template
1022
- # reference: https://huggingface.co/blog/codellama#conversational-instructions
1023
- # reference: https://github.com/facebookresearch/llama/blob/1a240688810f8036049e8da36b073f63d2ac552c/llama/generation.py#L212
1024
- register_conv_template(
1025
- Conversation(
1026
- name='llama-2',
1027
- system_template='[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n',
1028
- roles=('[INST]', '[/INST]'),
1029
- sep_style=SeparatorStyle.LLAMA2,
1030
- sep=' ',
1031
- sep2=' </s><s>',
1032
- )
1033
- )
1034
-
1035
- register_conv_template(
1036
- Conversation(
1037
- name='cutegpt',
1038
- roles=('问:', '答:\n'),
1039
- sep_style=SeparatorStyle.NO_COLON_TWO,
1040
- sep='\n',
1041
- sep2='\n',
1042
- stop_str='<end>',
1043
- )
1044
- )
1045
-
1046
- # OpenOrcaxOpenChat-naPreview2-13B template
1047
- register_conv_template(
1048
- Conversation(
1049
- name='open-orca',
1050
- system_template='{system_message}',
1051
- system_message='You are a helpful assistant. Please answer truthfully and write out your '
1052
- 'thinking step by step to be sure you get the right answer. If you make a mistake or encounter '
1053
- "an error in your thinking, say so out loud and attempt to correct it. If you don't know or "
1054
- "aren't sure about something, say so clearly. You will act as a professional logician, mathematician, "
1055
- 'and physicist. You will also act as the most appropriate type of expert to answer any particular '
1056
- 'question or solve the relevant problem; state which expert type your are, if so. Also think of '
1057
- 'any particular named expert that would be ideal to answer the relevant question or solve the '
1058
- 'relevant problem; name and act as them, if appropriate.',
1059
- roles=('User', 'Assistant'),
1060
- sep_style=SeparatorStyle.ADD_COLON_SPACE_SINGLE,
1061
- sep='<|end_of_turn|>\n',
1062
- stop_token_ids=[32000, 32001], # "<|end_of_turn|>"
1063
- stop_str='User',
1064
- )
1065
- )
1066
-
1067
- # Open-Orca/Mistral-7B-OpenOrca template
1068
- # source: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca
1069
- # reference: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca#prompt-template
1070
- register_conv_template(
1071
- Conversation(
1072
- name='mistral-7b-openorca',
1073
- system_template='<|im_start|>system\n{system_message}',
1074
- system_message='You are MistralOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!',
1075
- roles=('<|im_start|>user', '<|im_start|>assistant'),
1076
- sep_style=SeparatorStyle.CHATML,
1077
- sep='<|im_end|>',
1078
- stop_token_ids=[32000, 32001],
1079
- )
1080
- )
1081
-
1082
- # Qwen-chat default template
1083
- # source: https://huggingface.co/Qwen/Qwen-7B-Chat/blob/main/qwen_generation_utils.py#L130
1084
- register_conv_template(
1085
- Conversation(
1086
- name='qwen-7b-chat',
1087
- system_template='<|im_start|>system\n{system_message}',
1088
- system_message='You are a helpful assistant.',
1089
- roles=('<|im_start|>user', '<|im_start|>assistant'),
1090
- sep_style=SeparatorStyle.CHATML,
1091
- sep='<|im_end|>',
1092
- stop_token_ids=[
1093
- 151643,
1094
- 151644,
1095
- 151645,
1096
- ], # "<|endoftext|>", "<|im_start|>", "<|im_end|>"
1097
- stop_str='<|endoftext|>',
1098
- )
1099
- )
1100
-
1101
-
1102
- # AquilaChat default template
1103
- # source: https://github.com/FlagAI-Open/FlagAI/blob/master/examples/Aquila/Aquila-chat/cyg_conversation.py
1104
- register_conv_template(
1105
- Conversation(
1106
- name='aquila-chat',
1107
- system_message='A chat between a curious human and an artificial intelligence assistant. '
1108
- "The assistant gives helpful, detailed, and polite answers to the human's questions.",
1109
- roles=('Human', 'Assistant'),
1110
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
1111
- sep='###',
1112
- sep2='',
1113
- stop_str=['###', '</s>', '[UNK]'],
1114
- )
1115
- )
1116
- # AquilaChat2-34B default template
1117
- # source: https://huggingface.co/BAAI/AquilaChat2-34B/blob/4608b75855334b93329a771aee03869dbf7d88cc/predict.py#L212
1118
- register_conv_template(
1119
- Conversation(
1120
- name='aquila-legacy',
1121
- system_message='A chat between a curious human and an artificial intelligence assistant. '
1122
- "The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n",
1123
- roles=('### Human: ', '### Assistant: '),
1124
- offset=0,
1125
- sep_style=SeparatorStyle.NO_COLON_TWO,
1126
- sep='\n',
1127
- sep2='</s>',
1128
- stop_str=['</s>', '[UNK]'],
1129
- )
1130
- )
1131
- # AquilaChat2-7B-16K and AquilaChat2-34B-16K default template
1132
- # source: https://huggingface.co/BAAI/AquilaChat2-34B/blob/4608b75855334b93329a771aee03869dbf7d88cc/predict.py#L227
1133
- register_conv_template(
1134
- Conversation(
1135
- name='aquila',
1136
- system_message='A chat between a curious human and an artificial intelligence assistant. '
1137
- "The assistant gives helpful, detailed, and polite answers to the human's questions.",
1138
- roles=('Human', 'Assistant'),
1139
- offset=0,
1140
- sep_style=SeparatorStyle.ADD_COLON_TWO,
1141
- sep='###',
1142
- sep2='</s>',
1143
- stop_str=['</s>', '[UNK]'],
1144
- )
1145
- )
1146
-
1147
- # AquilaChat2-7B default template
1148
- # source: https://huggingface.co/BAAI/AquilaChat2-34B/blob/4608b75855334b93329a771aee03869dbf7d88cc/predict.py#L242
1149
- register_conv_template(
1150
- Conversation(
1151
- name='aquila-v1',
1152
- roles=('<|startofpiece|>', '<|endofpiece|>'),
1153
- offset=0,
1154
- sep_style=SeparatorStyle.NO_COLON_TWO,
1155
- sep='',
1156
- sep2='</s>',
1157
- stop_str=['</s>', '<|endoftext|>'],
1158
- )
1159
- )
1160
-
1161
- # Llama2-Chinese default template
1162
- # source: https://huggingface.co/FlagAlpha
1163
- register_conv_template(
1164
- Conversation(
1165
- name='llama2-chinese',
1166
- system_template='<s>{system_message}</s>',
1167
- roles=('Human', 'Assistant', 'System'),
1168
- sep_style=SeparatorStyle.ADD_COLON_TWO,
1169
- sep='\n',
1170
- sep2='\n</s><s>',
1171
- stop_str='</s>',
1172
- )
1173
- )
1174
-
1175
- # Vigogne Instruct default template
1176
- # source: https://github.com/bofenghuang/vigogne
1177
- register_conv_template(
1178
- Conversation(
1179
- name='vigogne_instruct',
1180
- system_template='### System:\n{system_message}\n\n',
1181
- system_message=(
1182
- 'Ci-dessous se trouve une instruction qui décrit une tâche à accomplir. Rédigez une réponse qui répond de manière'
1183
- ' précise à la demande.'
1184
- ),
1185
- roles=('### Instruction', '### Response'),
1186
- sep_style=SeparatorStyle.DOLLY,
1187
- sep='\n\n',
1188
- sep2='</s>',
1189
- )
1190
- )
1191
-
1192
- # Vigogne Chat default template
1193
- register_conv_template(
1194
- Conversation(
1195
- name='vigogne_chat_v2',
1196
- system_template='<|system|>: {system_message}',
1197
- system_message=(
1198
- 'Vous êtes Vigogne, un assistant IA créé par Zaion Lab. Vous suivez extrêmement bien les instructions. Aidez'
1199
- ' autant que vous le pouvez.'
1200
- ),
1201
- roles=('<|user|>', '<|assistant|>'),
1202
- sep_style=SeparatorStyle.ADD_COLON_TWO,
1203
- sep='\n',
1204
- sep2='</s>\n',
1205
- stop_str='<|user|>',
1206
- )
1207
- )
1208
-
1209
- register_conv_template(
1210
- Conversation(
1211
- name='vigogne_chat_v3',
1212
- system_template='[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n',
1213
- system_message=(
1214
- 'Vous êtes Vigogne, un assistant IA créé par Zaion Lab. Vous suivez extrêmement bien les instructions. Aidez'
1215
- ' autant que vous le pouvez.'
1216
- ),
1217
- roles=('[INST]', '[/INST]'),
1218
- sep_style=SeparatorStyle.LLAMA2,
1219
- sep=' ',
1220
- sep2=' </s>',
1221
- )
1222
- )
1223
-
1224
- # Falcon 180B chat template
1225
- # source: https://huggingface.co/spaces/tiiuae/falcon-180b-demo/blob/d1590ee7fae9b6ce331ba7808e61a29dcce9239f/app.py#L28-L37
1226
- register_conv_template(
1227
- Conversation(
1228
- name='falcon-chat',
1229
- roles=('User', 'Falcon'),
1230
- system_template='System: {system_message}',
1231
- messages=[],
1232
- sep_style=SeparatorStyle.FALCON_CHAT,
1233
- sep='\n',
1234
- sep2='<|endoftext|>',
1235
- stop_str='\nUser:', # use stop_str to stop generation after stop_token_ids, it will also remove stop_str from the generated text
1236
- )
1237
- )
1238
-
1239
- # Phind template
1240
- # source: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
1241
- register_conv_template(
1242
- Conversation(
1243
- name='phind',
1244
- system_message='### System Prompt\nYou are an intelligent programming assistant.',
1245
- roles=('### User Message', '### Assistant'),
1246
- messages=(),
1247
- offset=0,
1248
- sep_style=SeparatorStyle.ADD_COLON_SINGLE,
1249
- sep='\n\n',
1250
- )
1251
- )
1252
-
1253
- # Metharme formatting for Pygmalion models
1254
- # source: https://huggingface.co/PygmalionAI/pygmalion-2-13b
1255
- register_conv_template(
1256
- Conversation(
1257
- name='metharme',
1258
- system_template='<|system|>{system_message}',
1259
- system_message="""Enter RP mode. You shall reply to the user while staying
1260
- in character. Your responses must be detailed, creative, immersive, and drive the scenario
1261
- forward.""",
1262
- roles=('<|user|>', '<|model|>'),
1263
- sep_style=SeparatorStyle.NO_COLON_SINGLE,
1264
- sep='',
1265
- stop_str='<|user|>',
1266
- )
1267
- )
1268
-
1269
- # Zephyr template
1270
- # reference: https://huggingface.co/spaces/HuggingFaceH4/zephyr-playground/blob/main/dialogues.py
1271
- register_conv_template(
1272
- Conversation(
1273
- name='zephyr',
1274
- system_template='<|system|>\n{system_message}',
1275
- roles=('<|user|>', '<|assistant|>'),
1276
- sep_style=SeparatorStyle.CHATML,
1277
- sep='</s>',
1278
- stop_token_ids=[2],
1279
- stop_str='</s>',
1280
- )
1281
- )
1282
-
1283
- # InternVL-ZH template
1284
- register_conv_template(
1285
- Conversation(
1286
- name='internvl_zh',
1287
- system_template='',
1288
- roles=('<human>', '<bot>'),
1289
- sep_style=SeparatorStyle.INTERNVL_ZH,
1290
- sep=' ',
1291
- sep2='</s>',
1292
- )
1293
- )
 
2
  Conversation prompt templates.
3
 
4
  We kindly request that you import fastchat instead of copying this file if you wish to use it.
5
+ If you have changes in mind, please contribute back so the community can benefit collectively and continue to maintain these valuable templates.
6
  """
7
 
8
  import dataclasses
 
330
  return conv_templates[name].copy()
331
 
332
 
333
  register_conv_template(
334
  Conversation(
335
  name='Hermes-2',
 
343
  6,
344
  7,
345
  8,
346
+ ],
347
  stop_str='<|endoftext|>',
348
  )
349
  )
 
366
  )
367
 
368
 
369
  register_conv_template(
370
  Conversation(
371
  name='phi3-chat',
 
381
  ]
382
  )
383
  )
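
The diff above keeps only a few templates, but the registry pattern is unchanged: `register_conv_template` stores a `Conversation` under its `name`, and `get_conv_template` hands back a copy (`conv_templates[name].copy()`). The following is a minimal sketch of that pattern, not the file's actual code. The `Conversation` class here is a simplified stand-in, the `demo_two_sep` template is hypothetical, and `get_prompt` assumes the two-separator layout used by FastChat's `ADD_COLON_TWO` style (`sep` after user turns, `sep2` after assistant turns).

```python
import dataclasses
from typing import Dict, List, Optional, Tuple


@dataclasses.dataclass
class Conversation:
    """Simplified stand-in for the Conversation dataclass (assumption)."""
    name: str
    roles: Tuple[str, str]
    system_message: str = ''
    sep: str = '\n'
    sep2: Optional[str] = None
    messages: List[List[Optional[str]]] = dataclasses.field(default_factory=list)

    def append_message(self, role: str, message: Optional[str]) -> None:
        self.messages.append([role, message])

    def get_prompt(self) -> str:
        # Assumed ADD_COLON_TWO layout: alternate sep and sep2 after each turn.
        seps = [self.sep, self.sep2]
        ret = self.system_message + seps[0]
        for i, (role, message) in enumerate(self.messages):
            if message:
                ret += role + ': ' + message + seps[i % 2]
            else:
                # An empty message leaves the prompt open for generation.
                ret += role + ':'
        return ret

    def copy(self) -> 'Conversation':
        # Copy the message list so callers never mutate the registered template.
        return dataclasses.replace(self, messages=[[r, m] for r, m in self.messages])


conv_templates: Dict[str, Conversation] = {}


def register_conv_template(template: Conversation, override: bool = False) -> None:
    if not override:
        assert template.name not in conv_templates, f'{template.name} has been registered.'
    conv_templates[template.name] = template


def get_conv_template(name: str) -> Conversation:
    return conv_templates[name].copy()


# Hypothetical template, shaped like the removed 'airoboros_v2' entry.
register_conv_template(
    Conversation(
        name='demo_two_sep',
        system_message='A chat.',
        roles=('USER', 'ASSISTANT'),
        sep=' ',
        sep2='</s>',
    )
)

conv = get_conv_template('demo_two_sep')
conv.append_message(conv.roles[0], 'Hello')
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()
print(prompt)  # A chat. USER: Hello ASSISTANT:
```

Because `get_conv_template` returns a copy, appending messages to `conv` leaves the registered template untouched, which is why each chat session can start from a clean template.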
examples/image1.jpg ADDED
examples/image2.jpg ADDED
examples/red-panda.mp4 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d921c07bb97224d65a37801541d246067f0d506f08723ffa1ad85c217907ccb8
3
+ size 1867237
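
The three added lines for `examples/red-panda.mp4` are a Git LFS pointer, not the video itself: a spec version, an object ID (hash algorithm and digest), and the size in bytes. As a rough sketch, such a pointer can be read as space-separated key/value lines. The `parse_lfs_pointer` helper below is hypothetical and is not part of this repository or of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(' ')
        fields[key] = value
    # The oid has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields['oid'].partition(':')
    return {
        'version': fields['version'],
        'hash_algo': algo,
        'digest': digest,
        'size': int(fields['size']),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d921c07bb97224d65a37801541d246067f0d506f08723ffa1ad85c217907ccb8
size 1867237
"""

info = parse_lfs_pointer(pointer)
print(info['hash_algo'], info['size'])
```

The `size` field gives the byte length of the real file stored in LFS, so the video referenced here is about 1.9 MB.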
generation_config.json CHANGED
@@ -1,4 +1,4 @@
1
  {
2
  "_from_model_config": true,
3
- "transformers_version": "4.36.2"
4
  }
 
1
  {
2
  "_from_model_config": true,
3
+ "transformers_version": "4.37.2"
4
  }
modeling_internvl_chat.py CHANGED
@@ -1,13 +1,13 @@
 # --------------------------------------------------------
 # InternVL
-# Copyright (c) 2023 OpenGVLab
+# Copyright (c) 2024 OpenGVLab
 # Licensed under The MIT License [see LICENSE for details]
 # --------------------------------------------------------
 import warnings
 from typing import Any, List, Optional, Tuple, Union
 
 import torch.utils.checkpoint
-from peft import LoraConfig, get_peft_model
+import transformers
 from torch import nn
 from torch.nn import CrossEntropyLoss
 from transformers import (AutoModel, GenerationConfig, LlamaForCausalLM,
@@ -17,20 +17,30 @@ from transformers.modeling_utils import PreTrainedModel
 from transformers.utils import ModelOutput, logging
 
 from .configuration_internvl_chat import InternVLChatConfig
+from .conversation import get_conv_template
 from .modeling_intern_vit import InternVisionModel
 from .modeling_phi3 import Phi3ForCausalLM
 
 logger = logging.get_logger(__name__)
 
 
+def version_cmp(v1, v2, op='eq'):
+    import operator
+
+    from packaging import version
+    op_func = getattr(operator, op)
+    return op_func(version.parse(v1), version.parse(v2))
+
+
 class InternVLChatModel(PreTrainedModel):
     config_class = InternVLChatConfig
     main_input_name = 'pixel_values'
-    _no_split_modules = ['InternVisionEncoderLayer', 'LlamaDecoderLayer', 'Phi3DecoderLayer']
+    _no_split_modules = ['InternVisionModel', 'LlamaDecoderLayer', 'Phi3DecoderLayer']
 
     def __init__(self, config: InternVLChatConfig, vision_model=None, language_model=None):
         super().__init__(config)
 
+        assert version_cmp(transformers.__version__, '4.36.2', 'ge')
         image_size = config.force_image_size or config.vision_config.image_size
         patch_size = config.vision_config.patch_size
         self.patch_size = patch_size
@@ -66,44 +76,7 @@ class InternVLChatModel(PreTrainedModel):
             nn.Linear(llm_hidden_size, llm_hidden_size)
         )
 
-        # if config.force_image_size != config.vision_config.image_size:
-        #     self.vision_model.resize_pos_embeddings(
-        #         old_size=config.vision_config.image_size,
-        #         new_size=config.force_image_size,
-        #         patch_size=config.vision_config.patch_size
-        #     )
-
         self.img_context_token_id = None
-        self.neftune_alpha = None
-
-        if config.use_backbone_lora:
-            self.wrap_backbone_lora(r=config.use_backbone_lora, lora_alpha=2 * config.use_backbone_lora)
-
-        if config.use_llm_lora:
-            self.wrap_llm_lora(r=config.use_llm_lora, lora_alpha=2 * config.use_llm_lora)
-
-    def wrap_backbone_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
-        lora_config = LoraConfig(
-            r=r,
-            target_modules=['attn.qkv', 'attn.proj', 'mlp.fc1', 'mlp.fc2'],
-            lora_alpha=lora_alpha,
-            lora_dropout=lora_dropout,
-        )
-        self.vision_model = get_peft_model(self.vision_model, lora_config)
-        self.vision_model.print_trainable_parameters()
-
-    def wrap_llm_lora(self, r=128, lora_alpha=256, lora_dropout=0.05):
-        lora_config = LoraConfig(
-            r=r,
-            target_modules=['self_attn.q_proj', 'self_attn.k_proj', 'self_attn.v_proj', 'self_attn.o_proj',
-                            'mlp.gate_proj', 'mlp.down_proj', 'mlp.up_proj'],
-            lora_alpha=lora_alpha,
-            lora_dropout=lora_dropout,
-            task_type='CAUSAL_LM'
-        )
-        self.language_model = get_peft_model(self.language_model, lora_config)
-        self.language_model.enable_input_require_grads()
-        self.language_model.print_trainable_parameters()
 
     def forward(
             self,
@@ -200,12 +173,6 @@ class InternVLChatModel(PreTrainedModel):
         x = x.permute(0, 2, 1, 3).contiguous()
         return x
 
-    def noised_embed(self, vit_embeds, noise_alpha=5):
-        dims = torch.tensor(vit_embeds.size(1) * vit_embeds.size(2))
-        mag_norm = noise_alpha / torch.sqrt(dims)
-        noise = torch.zeros_like(vit_embeds).uniform_(-mag_norm, mag_norm)
-        return vit_embeds + noise
-
     def extract_feature(self, pixel_values):
         if self.select_layer == -1:
             vit_embeds = self.vision_model(
@@ -219,9 +186,6 @@ class InternVLChatModel(PreTrainedModel):
                 return_dict=True).hidden_states[self.select_layer]
         vit_embeds = vit_embeds[:, 1:, :]
 
-        if self.training and self.neftune_alpha is not None:
-            vit_embeds = self.noised_embed(vit_embeds, self.neftune_alpha)
-
         h = w = int(vit_embeds.shape[1] ** 0.5)
         vit_embeds = vit_embeds.reshape(vit_embeds.shape[0], h, w, -1)
         vit_embeds = self.pixel_shuffle(vit_embeds, scale_factor=self.downsample_ratio)
@@ -229,35 +193,44 @@ class InternVLChatModel(PreTrainedModel):
         vit_embeds = self.mlp1(vit_embeds)
         return vit_embeds
 
-    def batch_chat(self, tokenizer, pixel_values, image_counts, questions, generation_config, history=None,
-                   return_history=False, IMG_START_TOKEN='<img>', IMG_END_TOKEN='</img>',
-                   IMG_CONTEXT_TOKEN='<IMG_CONTEXT>'):
+    def batch_chat(self, tokenizer, pixel_values, questions, generation_config, num_patches_list=None,
+                   history=None, return_history=False, IMG_START_TOKEN='<img>', IMG_END_TOKEN='</img>',
+                   IMG_CONTEXT_TOKEN='<IMG_CONTEXT>', verbose=False, image_counts=None):
         if history is not None or return_history:
             print('Now multi-turn chat is not supported in batch_chat.')
             raise NotImplementedError
+
+        if image_counts is not None:
+            num_patches_list = image_counts
+            print('Warning: `image_counts` is deprecated. Please use `num_patches_list` instead.')
+
         img_context_token_id = tokenizer.convert_tokens_to_ids(IMG_CONTEXT_TOKEN)
         self.img_context_token_id = img_context_token_id
 
-        from .conversation import get_conv_template
+        if verbose and pixel_values is not None:
+            image_bs = pixel_values.shape[0]
+            print(f'dynamic ViT batch size: {image_bs}')
 
         queries = []
-        image_bs = pixel_values.shape[0]
-        # print(f'dynamic ViT batch size: {image_bs}, image_counts: {image_counts}')
-        for idx, image_count in enumerate(image_counts):
-            image_token = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * image_count + IMG_END_TOKEN
-            question = image_token + '\n' + questions[idx]
+        for idx, num_patches in enumerate(num_patches_list):
+            question = questions[idx]
+            if pixel_values is not None and '<image>' not in question:
+                question = '<image>\n' + question
             template = get_conv_template(self.template)
             template.append_message(template.roles[0], question)
             template.append_message(template.roles[1], None)
             query = template.get_prompt()
+
+            image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * num_patches + IMG_END_TOKEN
+            query = query.replace('<image>', image_tokens, 1)
             queries.append(query)
+
         tokenizer.padding_side = 'left'
         model_inputs = tokenizer(queries, return_tensors='pt', padding=True)
         input_ids = model_inputs['input_ids'].cuda()
         attention_mask = model_inputs['attention_mask'].cuda()
         eos_token_id = tokenizer.convert_tokens_to_ids(template.sep)
         generation_config['eos_token_id'] = eos_token_id
-
         generation_output = self.generate(
             pixel_values=pixel_values,
             input_ids=input_ids,
@@ -269,33 +242,42 @@ class InternVLChatModel(PreTrainedModel):
         return responses
 
     def chat(self, tokenizer, pixel_values, question, generation_config, history=None, return_history=False,
-             IMG_START_TOKEN='<img>', IMG_END_TOKEN='</img>', IMG_CONTEXT_TOKEN='<IMG_CONTEXT>'):
+             num_patches_list=None, IMG_START_TOKEN='<img>', IMG_END_TOKEN='</img>', IMG_CONTEXT_TOKEN='<IMG_CONTEXT>',
+             verbose=False):
+
+        if history is None and pixel_values is not None and '<image>' not in question:
+            question = '<image>\n' + question
+
+        if num_patches_list is None:
+            num_patches_list = [pixel_values.shape[0]] if pixel_values is not None else []
+        assert pixel_values is None or len(pixel_values) == sum(num_patches_list)
 
         img_context_token_id = tokenizer.convert_tokens_to_ids(IMG_CONTEXT_TOKEN)
         self.img_context_token_id = img_context_token_id
 
-        from .conversation import get_conv_template
-
         template = get_conv_template(self.template)
-        image_bs = pixel_values.shape[0]
-        print(f'dynamic ViT batch size: {image_bs}')
-        if history is None:
-            history = []
-            image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * image_bs + IMG_END_TOKEN
-            question = image_tokens + '\n' + question
-        else:
-            for (old_question, old_answer) in history:
-                template.append_message(template.roles[0], old_question)
-                template.append_message(template.roles[1], old_answer)
+        eos_token_id = tokenizer.convert_tokens_to_ids(template.sep)
+
+        history = [] if history is None else history
+        for (old_question, old_answer) in history:
+            template.append_message(template.roles[0], old_question)
+            template.append_message(template.roles[1], old_answer)
         template.append_message(template.roles[0], question)
         template.append_message(template.roles[1], None)
         query = template.get_prompt()
+
+        if verbose and pixel_values is not None:
+            image_bs = pixel_values.shape[0]
+            print(f'dynamic ViT batch size: {image_bs}')
+
+        for num_patches in num_patches_list:
+            image_tokens = IMG_START_TOKEN + IMG_CONTEXT_TOKEN * self.num_image_token * num_patches + IMG_END_TOKEN
+            query = query.replace('<image>', image_tokens, 1)
+
         model_inputs = tokenizer(query, return_tensors='pt')
         input_ids = model_inputs['input_ids'].cuda()
         attention_mask = model_inputs['attention_mask'].cuda()
-        eos_token_id = tokenizer.convert_tokens_to_ids(template.sep)
         generation_config['eos_token_id'] = eos_token_id
-
         generation_output = self.generate(
             pixel_values=pixel_values,
             input_ids=input_ids,
@@ -308,10 +290,11 @@ class InternVLChatModel(PreTrainedModel):
         if return_history:
             return response, history
         else:
-            # query_to_print = query.replace(image_tokens, '<image>')
-            # print(query_to_print, response)
+            query_to_print = query.replace(IMG_CONTEXT_TOKEN, '')
+            query_to_print = query_to_print.replace(f'{IMG_START_TOKEN}{IMG_END_TOKEN}', '<image>')
+            if verbose:
+                print(query_to_print, response)
             return response
-        return response
 
     @torch.no_grad()
     def generate(
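The revised `chat` method no longer prepends the image tokens to the question. It keeps `<image>` placeholders in the prompt and expands them one at a time, using one entry of `num_patches_list` per placeholder. The following is a minimal standalone sketch of that expansion step, not code from the repository: `expand_image_placeholders` is a hypothetical helper name, and `num_image_token=4` is an arbitrary illustrative value (in the model it comes from the configuration).

```python
IMG_START_TOKEN = '<img>'
IMG_END_TOKEN = '</img>'
IMG_CONTEXT_TOKEN = '<IMG_CONTEXT>'


def expand_image_placeholders(query, num_patches_list, num_image_token):
    """Replace successive '<image>' placeholders with runs of image tokens.

    Hypothetical helper that mirrors the loop added to `chat` in this
    commit; it is an illustration, not part of the repository.
    """
    for num_patches in num_patches_list:
        image_tokens = (IMG_START_TOKEN
                        + IMG_CONTEXT_TOKEN * num_image_token * num_patches
                        + IMG_END_TOKEN)
        # count=1 replaces only the first remaining placeholder, so the
        # i-th placeholder receives the i-th patch count.
        query = query.replace('<image>', image_tokens, 1)
    return query


if __name__ == '__main__':
    prompt = 'Image-1: <image>\nImage-2: <image>\nCompare the two images.'
    expanded = expand_image_placeholders(prompt, [1, 2], num_image_token=4)
    # Number of context tokens inserted across both placeholders.
    print(expanded.count(IMG_CONTEXT_TOKEN))
```

Because `str.replace` does nothing when no placeholder is left, any entries in `num_patches_list` beyond the number of `<image>` placeholders are silently ignored in this sketch; callers would need to keep the two in sync themselves.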
special_tokens_map.json CHANGED
@@ -1,68 +1,14 @@
 {
   "additional_special_tokens": [
-    {
-      "content": "<img>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "</img>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "<IMG_CONTEXT>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "<quad>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "</quad>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "<ref>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "</ref>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "<box>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    },
-    {
-      "content": "</box>",
-      "lstrip": false,
-      "normalized": false,
-      "rstrip": false,
-      "single_word": false
-    }
+    "<img>",
+    "</img>",
+    "<IMG_CONTEXT>",
+    "<quad>",
+    "</quad>",
+    "<ref>",
+    "</ref>",
+    "<box>",
+    "</box>"
   ],
   "bos_token": {
     "content": "<s>",