Update README.md

- **For samples with clear ground truths:**
  the model is prompted to first provide the reasoning process and then give the final answer in a format like `Final Answer: ***`.
  Responses matching the ground truth answer constitute the positive set \\(\mathcal{Y}_p\\), while those that do not match make up the negative set \\(\mathcal{Y}_n\\). Additionally, responses that fail to provide a clear final answer are also merged into \\(\mathcal{Y}_n\\).
  Given these responses labeled as positive or negative, we build the preference pairs by selecting a chosen response \\(y_c\\) from \\(\mathcal{Y}_p\\) and a rejected response \\(y_r\\) from \\(\mathcal{Y}_n\\), as sketched below.
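  A minimal sketch of this labeling-and-pairing step (the helpers `sample_responses` and `extract_final_answer` and the sampling count are illustrative placeholders, not the actual pipeline code):

  ```python
  import random

  def build_preference_pairs(question, ground_truth, sample_responses, extract_final_answer, n=8):
      """Label sampled responses against the ground truth and form (chosen, rejected) pairs."""
      positives, negatives = [], []
      for response in sample_responses(question, n=n):
          answer = extract_final_answer(response)  # parses the `Final Answer: ***` line; None if missing
          if answer is not None and answer == ground_truth:
              positives.append(response)  # matches the ground truth -> Y_p
          else:
              negatives.append(response)  # wrong answer or no clear final answer -> Y_n
      # Pair each chosen response y_c from Y_p with a rejected response y_r from Y_n.
      return [(y_c, random.choice(negatives)) for y_c in positives if negatives]
  ```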

- **For samples without clear ground truths:**

## Quick Start

We provide example code to run `InternVL2_5-38B-MPO` using `transformers`.

> Please use transformers>=4.37.2 to ensure the model works normally.
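
If needed, the requirement can be satisfied with pip:

```shell
pip install "transformers>=4.37.2"
```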

```python
import torch
from transformers import AutoTokenizer, AutoModel

path = "OpenGVLab/InternVL2_5-38B-MPO"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
```

```python
import torch
from transformers import AutoTokenizer, AutoModel

path = "OpenGVLab/InternVL2_5-38B-MPO"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
```

```python
    return device_map

path = "OpenGVLab/InternVL2_5-38B-MPO"
device_map = split_model('InternVL2_5-38B')
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
```

### Inference with Transformers

```python
import math
import numpy as np
import torch
import torchvision.transforms as T
```

```python
    pixel_values = torch.stack(pixel_values)
    return pixel_values

def split_model(model_name):
    device_map = {}
    world_size = torch.cuda.device_count()
    num_layers = {
        'InternVL2_5-1B': 24, 'InternVL2_5-2B': 24, 'InternVL2_5-4B': 36, 'InternVL2_5-8B': 32,
        'InternVL2_5-26B': 48, 'InternVL2_5-38B': 64, 'InternVL2_5-78B': 80}[model_name]
    # Since the first GPU will be used for ViT, treat it as half a GPU.
    num_layers_per_gpu = math.ceil(num_layers / (world_size - 0.5))
    num_layers_per_gpu = [num_layers_per_gpu] * world_size
    num_layers_per_gpu[0] = math.ceil(num_layers_per_gpu[0] * 0.5)
    layer_cnt = 0
    for i, num_layer in enumerate(num_layers_per_gpu):
        for j in range(num_layer):
            device_map[f'language_model.model.layers.{layer_cnt}'] = i
            layer_cnt += 1
    device_map['vision_model'] = 0
    device_map['mlp1'] = 0
    device_map['language_model.model.tok_embeddings'] = 0
    device_map['language_model.model.embed_tokens'] = 0
    device_map['language_model.output'] = 0
    device_map['language_model.model.norm'] = 0
    device_map['language_model.lm_head'] = 0
    device_map[f'language_model.model.layers.{num_layers - 1}'] = 0

    return device_map

# If you set `load_in_8bit=True`, you will need one 80GB GPU.
# If you set `load_in_8bit=False`, you will need at least two 80GB GPUs.
path = 'OpenGVLab/InternVL2_5-38B-MPO'
device_map = split_model('InternVL2_5-38B')
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    load_in_8bit=False,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
    device_map=device_map).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# set the max number of tiles in `max_num`
```
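
Once the model and tokenizer are loaded, a single-image query follows the pattern below. This is a minimal sketch: the image path, question, and generation settings are illustrative, and it assumes the `load_image` helper defined above together with InternVL's `model.chat` interface.

```python
# Illustrative single-image query; the path, question, and generation settings are placeholders.
pixel_values = load_image('./examples/image1.jpg', max_num=12).to(torch.bfloat16).cuda()
generation_config = dict(max_new_tokens=1024, do_sample=True)
question = '<image>\nPlease describe the image in detail.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f'User: {question}\nAssistant: {response}')
```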

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2_5-38B-MPO'
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
response = pipe(('describe this image', image))
print(response.text)
```

```python
from lmdeploy.vl import load_image
from lmdeploy.vl.constants import IMAGE_TOKEN

model = 'OpenGVLab/InternVL2_5-38B-MPO'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))

image_urls=[
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
```
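
The `IMAGE_TOKEN` imported above is used to refer to each image explicitly when several images share one prompt. A minimal sketch of that usage, assuming `image_urls` lists two image URLs (the prompt wording is illustrative):

```python
images = [load_image(url) for url in image_urls]
# One IMAGE_TOKEN placeholder per image lets the prompt reference the images individually.
response = pipe((f'Image-1: {IMAGE_TOKEN}\nImage-2: {IMAGE_TOKEN}\ndescribe these two images', images))
print(response.text)
```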

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2_5-38B-MPO'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))

image_urls=[
    "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
```
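
For batch inference, each prompt is paired with its image and the whole list is passed to the pipeline in one call. A minimal sketch, assuming `image_urls` holds the full list of URLs (the prompt text is illustrative):

```python
# One (prompt, image) tuple per request; the pipeline returns one response per tuple.
prompts = [('describe this image', load_image(url)) for url in image_urls]
responses = pipe(prompts)
print([r.text for r in responses])
```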

```python
from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
from lmdeploy.vl import load_image

model = 'OpenGVLab/InternVL2_5-38B-MPO'
pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
```
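
A multi-turn exchange then continues from the objects above via `pipe.chat`, which returns a session object carrying the conversation state. A minimal sketch (the follow-up question is illustrative):

```python
# First turn asks about the image; later turns reuse the returned session.
sess = pipe.chat(('describe this image', image), gen_config=gen_config)
print(sess.response.text)
sess = pipe.chat('What is the person in the image doing?', session=sess, gen_config=gen_config)
print(sess.response.text)
```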

LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:

```shell
lmdeploy serve api_server OpenGVLab/InternVL2_5-38B-MPO --server-port 23333 --tp 2
```

To use the OpenAI-style interface, you need to install OpenAI:
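
The OpenAI Python client can be installed with pip:

```shell
pip install openai
```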

## License

This project is released under the MIT License. It uses the pre-trained Qwen2.5-32B-Instruct as a component, which is licensed under the Apache License 2.0.

## Citation