update readme
README.md CHANGED
@@ -12,7 +12,7 @@ datasets:
 > ——*George R.R. Martin, A Clash of Kings*
 
 
-\[
 
 ## Introduction
 
@@ -28,7 +28,7 @@ We release our model weights and provide an example below to run our model. Det
 
 **Install dependencies**
 ```bash
-pip install transformers # latest version is ok, but we recommend v4.
 pip install -q pillow accelerate einops
 ```
 
@@ -50,7 +50,7 @@ model = AutoModelForCausalLM.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained("MILVLG/Imp-v1.5-2B-Qwen1.5", trust_remote_code=True)
 
 # Set inputs
-text = "
 image = Image.open("images/bus.jpg")
 
 input_ids = tokenizer(text, return_tensors='pt').input_ids
@@ -71,7 +71,7 @@ We conduct evaluation on 9 commonly-used benchmarks, including 5 academic VQA be
 | Models | Size | VQAv2 | GQA | SQA(IMG) | TextVQA | POPE | MME(P) | MMB | MMB-CN | MM-Vet |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | [LLaVA-v1.5-lora](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7B | 79.10 | 63.00 | 68.40 | 58.20 | 86.40 | 1476.9 | 66.10 | - | 30.2 |
-| **Imp-v1.5-2B-Qwen1.5** | 3B |
 
 ## License
 This project is licensed under the Apache License 2.0 - see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for details.
 > ——*George R.R. Martin, A Clash of Kings*
 
 
+\[[Paper](https://arxiv.org/abs/2405.12107)\] [[Demo](https://xmbot.net/imp/)\] [[Github](https://github.com/MILVLG/imp)\]
 
 ## Introduction
 
 
 **Install dependencies**
 ```bash
+pip install transformers # latest version is ok, but we recommend v4.36.0
 pip install -q pillow accelerate einops
 ```
 
 tokenizer = AutoTokenizer.from_pretrained("MILVLG/Imp-v1.5-2B-Qwen1.5", trust_remote_code=True)
 
 # Set inputs
+text = "<|im_start|>system\nA chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.<|im_end|>\n<|im_start|>user\n<image>\nWhat are the colors of the bus in the image?<|im_end|>\n<|im_start|>assistant"
 image = Image.open("images/bus.jpg")
 
 input_ids = tokenizer(text, return_tensors='pt').input_ids
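The prompt string added in this hunk follows a ChatML-style layout: `<|im_start|>role … <|im_end|>` blocks with an `<image>` placeholder in the user turn. As a minimal sketch, the same string can be assembled programmatically instead of hard-coded; `build_prompt` is a hypothetical helper, not part of the repository:

```python
def build_prompt(system: str, question: str) -> str:
    """Assemble a ChatML-style prompt with an <image> placeholder.

    Hypothetical helper; the README hard-codes the equivalent string.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n<image>\n{question}<|im_end|>\n"
        f"<|im_start|>assistant"
    )

system_msg = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
text = build_prompt(system_msg, "What are the colors of the bus in the image?")
```

Keeping the system message and question as separate arguments makes it easy to swap in new questions without touching the special-token scaffolding.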
 | Models | Size | VQAv2 | GQA | SQA(IMG) | TextVQA | POPE | MME(P) | MMB | MMB-CN | MM-Vet |
 |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
 | [LLaVA-v1.5-lora](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7B | 79.10 | 63.00 | 68.40 | 58.20 | 86.40 | 1476.9 | 66.10 | - | 30.2 |
+| **Imp-v1.5-2B-Qwen1.5** | 3B | 79.2 | 61.93 | 66.14 | 54.52 | 86.74 | 1304.8 | 63.83 | 61.34 | 33.5 |
 
 ## License
 This project is licensed under the Apache License 2.0 - see the [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) file for details.
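For context on why an `<image>` tag appears in the prompt above: LLaVA-style pipelines typically split the prompt at that placeholder and splice the projected image features between the two tokenized halves. A rough, hypothetical sketch of just the splitting step (the real logic ships in the model's `trust_remote_code` module; `split_prompt_at_image` is an assumed name, not the repository's API):

```python
def split_prompt_at_image(text: str, placeholder: str = "<image>") -> tuple[str, str]:
    # Hypothetical illustration: each half would be tokenized separately,
    # with the image embeddings inserted between the two halves.
    chunks = text.split(placeholder)
    if len(chunks) != 2:
        raise ValueError("expected exactly one <image> placeholder")
    return chunks[0], chunks[1]

before, after = split_prompt_at_image(
    "<|im_start|>user\n<image>\nWhat are the colors of the bus?<|im_end|>"
)
```

This is only a sketch of the general mechanism; the model's own remote code handles tokenization and feature projection.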