Commit 24ba6fd (verified) by czczup · 1 Parent(s): ce1dbef

Update README.md

Files changed (1):
  1. README.md +50 -19
README.md CHANGED
@@ -65,7 +65,7 @@ To construct this dataset, we propose an efficient data construction pipeline. S
 
 - **For samples with clear ground truths:**
   the model is prompted to first provide the reasoning process and then give the final answer in the format like `Final Answer: ***`.
-  Responses matching the ground truth answer constitute the positive set \\(mathcal{Y}_p\\), while those that do not match make up the negative set \\(\mathcal{Y}_n\\). Additionally, responses that fail to provide a clear final answer are also merged into \\(\mathcal{Y}_n\\).
+  Responses matching the ground truth answer constitute the positive set \\(\mathcal{Y}_p\\), while those that do not match make up the negative set \\(\mathcal{Y}_n\\). Additionally, responses that fail to provide a clear final answer are also merged into \\(\mathcal{Y}_n\\).
   Given these responses labeled as positive or negative, we build the preference pairs by selecting a chosen response \\(y_c\\) from \\(\mathcal{Y}_p\\) and a negative response \\(y_r\\) from \\(\mathcal{Y}_n\\).
 
 - **For samples without clear ground truths:**
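The pair-construction rule described in the hunk above can be illustrated with a short sketch. This is not code from the project; the `extract_final_answer` helper and the exhaustive pairing are hypothetical stand-ins for whatever the actual MPO data pipeline does:

```python
import itertools
import re

def extract_final_answer(response: str):
    # Hypothetical helper: grab whatever follows "Final Answer:" in the response.
    match = re.search(r'Final Answer:\s*(.+)', response)
    return match.group(1).strip() if match else None

def build_preference_pairs(responses, ground_truth):
    # Split sampled responses into the positive set Y_p and the negative set Y_n.
    positives, negatives = [], []
    for response in responses:
        answer = extract_final_answer(response)
        if answer is not None and answer == ground_truth:
            positives.append(response)
        else:
            # Mismatched answers and responses without a clear final answer both go to Y_n.
            negatives.append(response)
    # Each preference pair takes a chosen response y_c from Y_p and a rejected response y_r from Y_n.
    return [(y_c, y_r) for y_c, y_r in itertools.product(positives, negatives)]
```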
@@ -160,7 +160,7 @@ To comprehensively compare InternVL's performance before and after MPO, we emplo
 
 ## Quick Start
 
-We provide an example code to run `InternVL2_5-1B` using `transformers`.
+We provide an example code to run `InternVL2_5-38B-MPO` using `transformers`.
 
 > Please use transformers>=4.37.2 to ensure the model works normally.
 
@@ -171,7 +171,7 @@ We provide an example code to run `InternVL2_5-1B` using `transformers`.
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModel
-path = "OpenGVLab/InternVL2_5-1B"
+path = "OpenGVLab/InternVL2_5-38B-MPO"
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
@@ -185,7 +185,7 @@ model = AutoModel.from_pretrained(
 ```python
 import torch
 from transformers import AutoTokenizer, AutoModel
-path = "OpenGVLab/InternVL2_5-1B"
+path = "OpenGVLab/InternVL2_5-38B-MPO"
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
@@ -230,8 +230,8 @@ def split_model(model_name):
 
     return device_map
 
-path = "OpenGVLab/InternVL2_5-1B"
-device_map = split_model('InternVL2_5-1B')
+path = "OpenGVLab/InternVL2_5-38B-MPO"
+device_map = split_model('InternVL2_5-38B')
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
@@ -244,6 +244,7 @@ model = AutoModel.from_pretrained(
 ### Inference with Transformers
 
 ```python
+import math
 import numpy as np
 import torch
 import torchvision.transforms as T
@@ -326,14 +327,44 @@ def load_image(image_file, input_size=448, max_num=12):
     pixel_values = torch.stack(pixel_values)
     return pixel_values
 
-# If you want to load a model using multiple GPUs, please refer to the `Multiple GPUs` section.
-path = 'OpenGVLab/InternVL2_5-1B'
+def split_model(model_name):
+    device_map = {}
+    world_size = torch.cuda.device_count()
+    num_layers = {
+        'InternVL2_5-1B': 24, 'InternVL2_5-2B': 24, 'InternVL2_5-4B': 36, 'InternVL2_5-8B': 32,
+        'InternVL2_5-26B': 48, 'InternVL2_5-38B': 64, 'InternVL2_5-78B': 80}[model_name]
+    # Since the first GPU will be used for ViT, treat it as half a GPU.
+    num_layers_per_gpu = math.ceil(num_layers / (world_size - 0.5))
+    num_layers_per_gpu = [num_layers_per_gpu] * world_size
+    num_layers_per_gpu[0] = math.ceil(num_layers_per_gpu[0] * 0.5)
+    layer_cnt = 0
+    for i, num_layer in enumerate(num_layers_per_gpu):
+        for j in range(num_layer):
+            device_map[f'language_model.model.layers.{layer_cnt}'] = i
+            layer_cnt += 1
+    device_map['vision_model'] = 0
+    device_map['mlp1'] = 0
+    device_map['language_model.model.tok_embeddings'] = 0
+    device_map['language_model.model.embed_tokens'] = 0
+    device_map['language_model.output'] = 0
+    device_map['language_model.model.norm'] = 0
+    device_map['language_model.lm_head'] = 0
+    device_map[f'language_model.model.layers.{num_layers - 1}'] = 0
+
+    return device_map
+
+# If you set `load_in_8bit=True`, you will need one 80GB GPUs.
+# If you set `load_in_8bit=False`, you will need at least two 80GB GPUs.
+path = 'OpenGVLab/InternVL2_5-38B-MPO'
+device_map = split_model('InternVL2_5-38B')
 model = AutoModel.from_pretrained(
     path,
     torch_dtype=torch.bfloat16,
+    load_in_8bit=False,
     low_cpu_mem_usage=True,
     use_flash_attn=True,
-    trust_remote_code=True).eval().cuda()
+    trust_remote_code=True,
+    device_map=device_map).eval()
 tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
 
 # set the max number of tiles in `max_num`
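The hunk stops at the `max_num` comment, just before the model card's example runs a query. As a rough usage sketch only: it assumes the `load_image` helper defined earlier in the README, a local `./examples/image1.jpg` path, and the `chat()` method exposed by the checkpoint's remote code.

```python
# Sketch, not part of this diff: single-image inference once `model`, `tokenizer`,
# and `load_image` from the example above are available.
pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=True)

question = '<image>\nPlease describe the image shortly.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f'User: {question}\nAssistant: {response}')
```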
@@ -510,9 +541,9 @@ LMDeploy abstracts the complex inference process of multi-modal Vision-Language
 from lmdeploy import pipeline, TurbomindEngineConfig
 from lmdeploy.vl import load_image
 
-model = 'OpenGVLab/InternVL2_5-1B'
+model = 'OpenGVLab/InternVL2_5-38B-MPO'
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 response = pipe(('describe this image', image))
 print(response.text)
 ```
@@ -528,8 +559,8 @@ from lmdeploy import pipeline, TurbomindEngineConfig
 from lmdeploy.vl import load_image
 from lmdeploy.vl.constants import IMAGE_TOKEN
 
-model = 'OpenGVLab/InternVL2_5-1B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+model = 'OpenGVLab/InternVL2_5-38B-MPO'
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image_urls=[
     'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
@@ -550,8 +581,8 @@ Conducting inference with batch prompts is quite straightforward; just place the
 from lmdeploy import pipeline, TurbomindEngineConfig
 from lmdeploy.vl import load_image
 
-model = 'OpenGVLab/InternVL2_5-1B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+model = 'OpenGVLab/InternVL2_5-38B-MPO'
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image_urls=[
     "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
@@ -570,8 +601,8 @@ There are two ways to do the multi-turn conversations with the pipeline. One is
 from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
 from lmdeploy.vl import load_image
 
-model = 'OpenGVLab/InternVL2_5-1B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+model = 'OpenGVLab/InternVL2_5-38B-MPO'
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
 gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
@@ -586,7 +617,7 @@ print(sess.response.text)
 LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:
 
 ```shell
-lmdeploy serve api_server OpenGVLab/InternVL2_5-1B --server-port 23333
+lmdeploy serve api_server OpenGVLab/InternVL2_5-38B-MPO --server-port 23333 --tp 2
 ```
 
 To use the OpenAI-style interface, you need to install OpenAI:
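The client-side snippet that follows this line is not shown in the diff. After installing the `openai` Python package, a minimal OpenAI-style request against the server started above might look like the sketch below; the placeholder API key, the `23333` port from the command, and the reuse of the tiger image URL from the earlier LMDeploy example are assumptions, and the exact snippet in the model card may differ.

```python
from openai import OpenAI

# Talk to the local `lmdeploy serve api_server` instance started above (port 23333).
client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}},
        ],
    }],
    temperature=0.8,
    top_p=0.8)
print(response)
```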
@@ -625,7 +656,7 @@ print(response)
 
 ## License
 
-This project is released under the MIT License. This project uses the pre-trained Qwen2.5-0.5B-Instruct as a component, which is licensed under the Apache License 2.0.
+This project is released under the MIT License. This project uses the pre-trained Qwen2.5-32B-Instruct as a component, which is licensed under the Apache License 2.0.
 
 ## Citation
 